WorldWideScience

Sample records for accurate genome alignment

  1. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan;

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary...

  2. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary... heuristics. RESULTS: We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners...

  3. Faster and More Accurate Sequence Alignment with SNAP

    CERN Document Server

    Zaharia, Matei; Curtis, Kristal; Fox, Armando; Patterson, David; Shenker, Scott; Stoica, Ion; Karp, Richard M; Sittler, Taylor

    2011-01-01

    We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10-100x faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST's. However, SNAP greatly reduces the number and cost of local alignment checks performed through several measures: it uses longer seeds to reduce the false positive locations considered, leverages larger memory capacities to speed index lookup, and excludes most candidate locations without fully computing their edit distance to the read. The result is an algorithm that scales well for reads from one hundred to thousands of bases long and provides a rich error model that can match classes of mutations (e.g., longer indels) that today's fast aligners ignore. We calculate that SNAP can align a dataset with 30x coverage of a human genome in le...
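
The seed-and-verify idea described in the SNAP abstract can be illustrated with a minimal sketch. This is not SNAP's implementation: the function names are hypothetical, the seeds are taken at fixed non-overlapping offsets, and candidate locations are checked with a mismatch count that stops early rather than with a bounded edit distance, so indels are not handled.

```python
from collections import defaultdict


def build_seed_index(genome: str, seed_len: int) -> dict:
    """Map every seed_len-mer in the genome to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - seed_len + 1):
        index[genome[pos:pos + seed_len]].append(pos)
    return index


def mismatches_up_to(a: str, b: str, limit: int) -> int:
    """Count mismatches between equal-length strings, stopping once limit is exceeded."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            count += 1
            if count > limit:
                break  # candidate rejected without scanning the rest
    return count


def align_read(read, genome, index, seed_len, max_mismatches):
    """Return (position, mismatches) for the best candidate location, or None."""
    candidates = set()
    # Look up non-overlapping seeds; each hit proposes one read start position.
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        for pos in index.get(seed, ()):
            start = pos - offset
            if 0 <= start <= len(genome) - len(read):
                candidates.add(start)
    best = None
    for start in sorted(candidates):
        window = genome[start:start + len(read)]
        d = mismatches_up_to(read, window, max_mismatches)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (start, d)
    return best
```

Longer seeds shrink the candidate set at the cost of missing reads whose every seed contains an error, which is the trade-off the abstract alludes to.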

  4. Genome Update: alignment of bacterial chromosomes

    DEFF Research Database (Denmark)

    Ussery, David; Jensen, Mette; Poulsen, Tine Rugh

    2004-01-01

    There are four new microbial genomes listed in this month's Genome Update, three belonging to Gram-positive bacteria and one belonging to an archaeon that lives at pH 0; all of these genomes are listed in Table 1. The method of genome comparison this month is that of genome alignment and... as an example, an alignment of seven Staphylococcus aureus genomes and one Staphylococcus epidermidis genome is presented...

  5. Cactus: Algorithms for genome multiple sequence alignment

    OpenAIRE

    Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

    2011-01-01

    Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms...

  6. Strategies and tools for whole genome alignments

    Energy Technology Data Exchange (ETDEWEB)

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas; Ishkhanov, Tigran; Ryaboy, Dmitriy; Rubin, Edward; Pachter, Lior; Dubchak, Inna

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.

  7. Accurate genome relative abundance estimation based on shotgun metagenomic reads.

    Directory of Open Access Journals (Sweden)

    Li C Xia

    Full Text Available Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or variants thereof; they often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
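
The general idea of a maximum-likelihood mixture model with genome-size correction can be sketched as a small expectation-maximization loop. This is an illustration under stated assumptions, not the GRAMMy code: `estimate_abundance` and its inputs are hypothetical, each read is given as a list of per-genome match probabilities, and the final abundance is taken to be the mixing proportion divided by genome length and renormalized.

```python
def estimate_abundance(read_probs, genome_lengths, iterations=200):
    """EM estimate of relative genome abundance from ambiguous read assignments.

    read_probs[i][j] is the (assumed) probability of read i given genome j;
    genome_lengths[j] is the length of genome j in bases.
    """
    n_genomes = len(genome_lengths)
    mix = [1.0 / n_genomes] * n_genomes  # initial mixing proportions
    for _ in range(iterations):
        totals = [0.0] * n_genomes
        for probs in read_probs:
            # E-step: split each read across genomes by posterior responsibility.
            weights = [m * p for m, p in zip(mix, probs)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # read is compatible with no reference genome
            for j, w in enumerate(weights):
                totals[j] += w / norm
        assigned = sum(totals)
        if assigned == 0.0:
            raise ValueError("no read could be assigned to any genome")
        # M-step: mixing proportions are the expected read fractions.
        mix = [t / assigned for t in totals]
    # Longer genomes yield more reads per cell, so correct by genome length.
    per_base = [m / length for m, length in zip(mix, genome_lengths)]
    scale = sum(per_base)
    return [x / scale for x in per_base]
```

With only uniquely assigned reads, a genome twice as long that receives twice as many reads comes out equally abundant, which is the genome-size bias the abstract refers to.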

  8. CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding.

    Directory of Open Access Journals (Sweden)

    Yongchao Liu

    Full Text Available The majority of next-generation sequencing short-reads can be properly aligned by leading aligners at high speed. However, the alignment quality can still be further improved, since usually not all reads can be correctly aligned to large genomes, such as the human genome, even for simulated data. Moreover, even slight improvements in this area are important but challenging, and usually require significantly more computational endeavor. In this paper, we present CUSHAW3, an open-source parallelized, sensitive and accurate short-read aligner for both base-space and color-space sequences. In this aligner, we have investigated a hybrid seeding approach to improve alignment quality, which incorporates three different seed types, i.e. maximal exact match seeds, exact-match k-mer seeds and variable-length seeds, into the alignment pipeline. Furthermore, three techniques: weighted seed-pairing heuristic, paired-end alignment pair ranking and read mate rescuing have been conceived to facilitate accurate paired-end alignment. For base-space alignment, we have compared CUSHAW3 to Novoalign, CUSHAW2, BWA-MEM, Bowtie2 and GEM, by aligning both simulated and real reads to the human genome. The results show that CUSHAW3 consistently outperforms CUSHAW2, BWA-MEM, Bowtie2 and GEM in terms of single-end and paired-end alignment. Furthermore, our aligner has demonstrated better paired-end alignment performance than Novoalign for short-reads with high error rates. For color-space alignment, CUSHAW3 is consistently one of the best aligners compared to SHRiMP2 and BFAST. The source code of CUSHAW3 and all simulated data are available at http://cushaw3.sourceforge.net.

  9. BFAST: an alignment tool for large scale genome resequencing.

    Directory of Open Access Journals (Sweden)

    Nils Homer

    Full Text Available BACKGROUND: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. METHODOLOGY: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. CONCLUSIONS: We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
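
The gapped Smith-Waterman step mentioned in the BFAST abstract can be sketched as follows. This is the textbook local-alignment recurrence with a linear gap penalty, not BFAST's implementation; the scoring values are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]  # column 0 is always 0 for local alignment
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamp at 0 so an alignment may start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Keeping only two rows gives the score in memory linear in the length of `b`; recovering the alignment itself, including the position of an indel, would require storing the full matrix or traceback pointers.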

  10. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  11. Improving pan-genome annotation using whole genome multiple alignment

    Directory of Open Access Journals (Sweden)

    Salzberg Steven L

    2011-06-01

    Full Text Available Abstract Background: Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. Results: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. Conclusions: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.

  12. JAGuaR: junction alignments to genome for RNA-seq reads.

    Directory of Open Access Journals (Sweden)

    Yaron S Butterfield

    Full Text Available JAGuaR is an alignment protocol for RNA-seq reads that uses an extended reference to increase alignment sensitivity. It uses BWA to align reads to the genome and reference transcript models (including annotated exon-exon junctions), specifically allowing for the possibility of a single read spanning multiple exons. Reads aligned to the transcript models are then re-mapped on to genomic coordinates, transforming alignments that span multiple exons into large-gapped alignments on the genome. While JAGuaR does not detect novel junctions, we demonstrate how JAGuaR generates fast and accurate transcriptome alignments, which allows for both sensitive and specific SNV calling.

  13. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Arabi E. Keshk

    2014-05-01

    Full Text Available The merging of biology and computer science has created a new field, called computational biology or bioinformatics, that explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in the life sciences as well as in computer science, information science, and related technologies. The main problem in computational biology is sequence alignment, a way of arranging sequences of DNA, RNA or protein to identify regions of similarity and relationships between sequences. This paper introduces an enhancement of the dynamic programming algorithm for genome sequence alignment, called EDAGSA. It fills only the three main diagonals of the matrix rather than filling the entire matrix with unused data. It obtains the optimal solution with reduced execution time, and therefore performance is improved. To illustrate the effectiveness of the proposed algorithm, it is compared with traditional methods such as the Needleman-Wunsch, Smith-Waterman and longest common subsequence algorithms. In addition, a database is implemented so that the algorithm can be used in multi-sequence alignments to search for the optimal sequence that matches a given sequence.
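
Filling only a few diagonals of the dynamic programming matrix is commonly called banded alignment. The sketch below is a generic banded Needleman-Wunsch score, not the EDAGSA algorithm itself; the function name and scoring values are assumptions. With `band=1` it fills the main diagonal and its two neighbours, and the result equals the full-matrix optimum only when the optimal path stays inside that band.

```python
def banded_global_score(a, b, band=1, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only cells with |i - j| <= band."""
    if abs(len(a) - len(b)) > band:
        raise ValueError("length difference exceeds the band")
    neg_inf = float("-inf")
    # Row 0: only the in-band prefix of b, reached by leading gaps.
    prev = {j: j * gap for j in range(min(len(b), band) + 1)}
    for i in range(1, len(a) + 1):
        curr = {}
        for j in range(max(0, i - band), min(len(b), i + band) + 1):
            if j == 0:
                curr[j] = i * gap  # leading gaps in b
                continue
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev.get(j - 1, neg_inf) + sub
            up = prev.get(j, neg_inf) + gap      # out-of-band cells count as -inf
            left = curr.get(j - 1, neg_inf) + gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]
```

The work per row is proportional to the band width rather than to the length of `b`, so the total cost grows linearly with sequence length for a fixed band.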

  14. BBMap: A Fast, Accurate, Splice-Aware Aligner

    Energy Technology Data Exchange (ETDEWEB)

    Bushnell, Brian

    2014-03-17

    Alignment of reads is one of the primary computational tasks in bioinformatics. Of paramount importance to resequencing, alignment is also crucial to other areas - quality control, scaffolding, string-graph assembly, homology detection, assembly evaluation, error-correction, expression quantification, and even as a tool to evaluate other tools. An optimal aligner would greatly improve virtually any sequencing process, but optimal alignment is prohibitively expensive for gigabases of data. Here, we will present BBMap [1], a fast splice-aware aligner for short and long reads. We will demonstrate that BBMap has superior speed, sensitivity, and specificity to alternative high-throughput aligners bowtie2 [2], bwa [3], smalt [4], GSNAP [5], and BLASR [6].

  15. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Full Text Available Abstract Background: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only

  16. Optimizing cell arrays for accurate functional genomics

    Directory of Open Access Journals (Sweden)

    Fengler Sven

    2012-07-01

    Full Text Available Abstract Background: Cellular responses emerge from a complex network of dynamic biochemical reactions. In order to investigate them, it is necessary to develop methods that allow perturbing a high number of gene products in a flexible and fast way. Cell arrays (CA) enable such experiments on microscope slides via reverse transfection of cellular colonies growing on spotted genetic material. In contrast to multi-well plates, CA are susceptible to contamination among neighboring spots, hindering accurate quantification in cell-based screening projects. Here we have developed a quality control protocol for quantifying and minimizing contamination in CA. Results: We imaged checkered CA that express two distinct fluorescent proteins and segmented images into single cells to quantify the transfection efficiency and interspot contamination. Compared with standard procedures, we measured a 3-fold reduction of contaminants when arrays containing HeLa cells were washed shortly after cell seeding. We proved that nucleic acid uptake during cell seeding rather than migration among neighboring spots was the major source of contamination. Arrays of MCF7 cells developed without the washing step showed a 7-fold lower percentage of contaminant cells, demonstrating that contamination is dependent on specific cell properties. Conclusions: Previously published methodological works have focused on achieving a high transfection rate in densely packed CA. Here, we focused on an equally important parameter: the interspot contamination. The presented quality control is essential for estimating the rate of contamination, a major source of false positives and negatives in current microscopy based functional genomics screenings. We have demonstrated that a washing step after seeding enhances CA quality for HeLa but is not necessary for MCF7. The described method provides a way to find optimal seeding protocols for cell lines intended to be used for the first time in CA.

  17. A fast and accurate initial alignment method for strapdown inertial navigation system on stationary base

    Institute of Scientific and Technical Information of China (English)

    Xinlong WANG; Gongxun SHEN

    2005-01-01

    In this work, a fast and accurate stationary alignment method for strapdown inertial navigation system (SINS) is proposed. It has been demonstrated that the stationary alignment of SINS can be improved by employing the multiposition technique, but the alignment time of the azimuth error is relatively long. Here, the two-position alignment principle is presented. On the basis of this SINS error model, a fast estimation algorithm of the azimuth error for the initial alignment of SINS on stationary base is derived fully from the horizontal velocity outputs and the output rates, and the novel azimuth error estimation algorithm is used for the two-position alignment. Consequently, the speed and accuracy of the SINS's initial alignment are enhanced greatly. The computer simulation results illustrate the efficiency of this alignment method.

  18. Improved fingercode alignment for accurate and compact fingerprint recognition

    CSIR Research Space (South Africa)

    Brown, Dane

    2016-05-01

    Full Text Available The traditional texture-based fingerprint recognition system known as FingerCode is improved in this work. Texture-based fingerprint recognition methods are generally more accurate than other methods, but at the disadvantage of increased storage...

  19. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  20. Using ESTs for phylogenomics: Can one accurately infer a phylogenetic tree from a gappy alignment?

    Directory of Open Access Journals (Sweden)

    Hartmann Stefanie

    2008-03-01

    Full Text Available Abstract Background: While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets. Results: We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences. Conclusion: These results demonstrate that partial gene

  1. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Directory of Open Access Journals (Sweden)

    Farmerie William G

    2006-08-01

    Full Text Available Abstract Background: Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results: More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion: Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy

  2. Using a priori knowledge to align sequencing reads to their exact genomic position

    NARCIS (Netherlands)

    Böttcher, R.; Amberg, R.; Ruzius, F.P.; Guryev, V.; Verhaegh, W.F.J.; Beyerlein, P.; Van der Zaag, P.J.

    2011-01-01

    The use of a priori knowledge in aligning targeted sequencing data is investigated using computational experiments. With conventional aligners such as Bowtie, BWA or MAQ, alignment is performed against the whole genome. Using an alignment method in which the genomic position information from the

  3. Volume visualization of multiple alignment of genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shah, Nameeta; Weber, Gunther H.; Dillard, Scott E.; Hamann, Bernd

    2004-05-01

    Genomes of hundreds of species have been sequenced to date and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We provide results for aligned DNA sequence data and compare it with traditional 1D line plots. Our technique, coupled with 1D line plots, results in effective multiresolution visualization of very large aligned sequence data sets.

  4. SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner.

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    Full Text Available To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, through leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

  5. Is multiple-sequence alignment required for accurate inference of phylogeny?

    Science.gov (United States)

    Höhl, Michael; Ragan, Mark A

    2007-04-01

    the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.

  6. Alignment-free phylogeny of whole genomes using underlying subwords

    Directory of Open Access Journals (Sweden)

    Comin Matteo

    2012-12-01

    Full Text Available Abstract Background: With the progress of modern sequencing technologies a large number of complete genomes are now available. Traditionally the comparison of two related genomes is carried out by sequence alignment. There are cases where these techniques cannot be applied, for example if two genomes do not share the same set of genes, or if they are not alignable to each other due to low sequence similarity, rearrangements and inversions, or more specifically to their lengths when the organisms belong to different species. For these cases the comparison of complete genomes can be carried out only with ad hoc methods that are usually called alignment-free methods. Methods: In this paper we propose a distance function based on subword compositions called Underlying Approach (UA). We prove that the matching statistics, a popular concept in the field of string algorithms able to capture the statistics of common words between two sequences, can be derived from a small set of “independent” subwords, namely the irredundant common subwords. We define a distance-like measure based on these subwords, such that each region of genomes contributes only once, thus avoiding counting shared subwords multiple times. In a nutshell, this filter discards subwords occurring in regions covered by other more significant subwords. Results: The Underlying Approach (UA) builds a scoring function based on this set of patterns, called underlying. We prove that this set is by construction linear in the size of input, without overlaps, and can be efficiently constructed. Results show the validity of our method in the reconstruction of phylogenetic trees, where the Underlying Approach outperforms the current state of the art methods. Moreover, we show that the accuracy of UA is achieved with a very small number of subwords, which in some cases carry meaningful biological information. Availability: http://www.dei.unipd.it/~ciompin/main/underlying.html

  7. Genome comparison without alignment using shortest unique substrings

    Directory of Open Access Journals (Sweden)

    Möller Friedrich

    2005-05-01

    Full Text Available Abstract Background: Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees. Results: We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We derive an analytical expression for the null distribution of shortest unique substrings, given the GC-content of the query sequences. Furthermore, we apply our method to rapidly detect unique genomic regions in the genome of Staphylococcus aureus strain MSSA476 compared to four other staphylococcal genomes. Conclusion: We combine a method to rapidly search for shortest unique substrings in DNA sequences and a derivation of their null distribution. We show that unique regions in an arbitrary sample of genomes can be efficiently detected with this method. The corresponding programs shustring (SHortest Unique subSTRING) and shulen are written in C and available at http://adenine.biz.fh-weihenstephan.de/shustring/.
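
    A minimal sketch of the idea of a shortest unique substring follows. It makes two assumptions: it counts words by brute force instead of using the generalized suffix trees named in the abstract, and it considers only the given strand. The function name is hypothetical and this is not the shustring program. It returns the globally shortest substrings that occur exactly once in a single sequence.

```python
from collections import Counter


def shortest_unique_substrings(seq: str) -> list[str]:
    """Return all substrings of minimal length that occur exactly once in seq.

    Brute-force illustration (hypothetical helper); overlapping occurrences
    are counted, and the reverse strand is ignored.
    """
    for k in range(1, len(seq) + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        unique = sorted(word for word, n in counts.items() if n == 1)
        if unique:
            # The first k with a unique word gives the minimal length.
            return unique
    return []


if __name__ == "__main__":
    print(shortest_unique_substrings("ACAC"))
```

    For "ACAC" both single letters occur twice and "AC" occurs twice, so the only unique word of minimal length is "CA".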

  8. Alignment of capillary electrophoresis-mass spectrometry datasets using accurate mass information.

    Science.gov (United States)

    Nevedomskaya, Ekaterina; Derks, Rico; Deelder, André M; Mayboroda, Oleg A; Palmblad, Magnus

    2009-12-01

    Capillary electrophoresis-mass spectrometry (CE-MS) is a powerful technique for the analysis of small soluble compounds in biological fluids. A major drawback of CE is the poor migration time reproducibility, which makes it difficult to combine data from different experiments and correctly assign compounds. A number of alignment algorithms have been developed but not all of them can cope with large and irregular time shifts between CE-MS runs. Here we present a genetic algorithm designed for alignment of CE-MS data using accurate mass information. The utility of the algorithm was demonstrated on real data, and the results were compared with one of the existing packages. The new algorithm showed a significant reduction of elution time variation in the aligned datasets. The importance of mass accuracy for the performance of the algorithm was also demonstrated by comparing alignments of datasets from a standard time-of-flight (TOF) instrument with those from the new ultrahigh resolution TOF maXis (Bruker Daltonics).

  9. IVA: accurate de novo assembly of RNA virus genomes.

    Science.gov (United States)

    Hunt, Martin; Gall, Astrid; Ong, Swee Hoe; Brener, Jacqui; Ferns, Bridget; Goulder, Philip; Nastouli, Eleni; Keane, Jacqueline A; Kellam, Paul; Otto, Thomas D

    2015-07-15

    An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods. We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers. The software runs under Linux, has the GPLv3 licence and is freely available from http://sanger-pathogens.github.io/iva © The Author 2015. Published by Oxford University Press.

  10. Prokaryotic Phylogeny Based on Complete Genomes Without Sequence Alignment

    Science.gov (United States)

    Hao, Bailin; Qi, Ji; Wang, Bin

    2003-04-01

    This is a brief review of a series of ongoing studies on bacterial phylogeny. We have proposed a new method to infer relatedness of prokaryotes from their complete genome data without using sequence alignment. It has led to results comparable with the bacteriologists' systematics as reflected in the latest 2001 edition of Bergey's Manual of Systematic Bacteriology. In what follows we only touch on the mathematical aspects of the method. The biological implications of our results will be published elsewhere.

  11. Volume visualization of multiple alignment of large genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shah, Nameeta; Dillard, Scott E.; Weber, Gunther H.; Hamann, Bernd

    2005-07-25

    Genomes of hundreds of species have been sequenced to date, and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. As a result, tools using 1D representations are incapable of providing an informative overview of extremely large data sets. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We demonstrate our technique using multi-million-basepair-long aligned DNA sequence data and compare it with traditional 1D line plots. The results show that our technique is superior in providing an overview of entire data sets. Our technique, coupled with 1D line plots, results in effective multi-resolution visualization of very large aligned sequence data sets.

  12. SMETANA: Accurate and Scalable Algorithm for Probabilistic Alignment of Large-Scale Biological Networks: e67995

    National Research Council Canada - National Science Library

    Sayed Mohammad Ebrahim Sahraeian; Byung-Jun Yoon

    2013-01-01

    .... We demonstrate that the proposed algorithm, called SMETANA, outperforms many state-of-the-art network alignment techniques, in terms of computational efficiency, alignment accuracy, and scalability...

  13. SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks

    National Research Council Canada - National Science Library

    Sahraeian, Sayed Mohammad Ebrahim; Yoon, Byung-Jun

    2013-01-01

    .... We demonstrate that the proposed algorithm, called SMETANA, outperforms many state-of-the-art network alignment techniques, in terms of computational efficiency, alignment accuracy, and scalability...

  14. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  15. FAMSA: Fast and accurate multiple sequence alignment of huge protein families

    Science.gov (United States)

    Deorowicz, Sebastian; Debudaj-Grabysz, Agnieszka; Gudyś, Adam

    2016-01-01

    Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein family databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimized and parallelized to make the most of modern computer platforms. As a result, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT, for datasets exceeding a few thousand sequences. Quality does not come at the expense of time or memory requirements, which are an order of magnitude lower than those of the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa. PMID:27670777
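
    The longest common subsequence (LCS) measure named in this abstract can be sketched with the textbook dynamic programme below. This is an illustration under assumptions: the function name is hypothetical, and the abstract does not say how FAMSA normalizes the LCS length into a similarity or how it accelerates the computation, so neither is shown.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard O(len(a) * len(b)) dynamic programme keeping one row at a time.
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))
```

    A larger LCS length relative to the sequence lengths indicates a more similar pair, which is the kind of pairwise signal a progressive aligner needs to build its guide tree.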

  16. Analysis of chimpanzee history based on genome sequence alignments.

    Directory of Open Access Journals (Sweden)

    Jennifer L Caswell

    2008-04-01

    Full Text Available Population geneticists often study small numbers of carefully chosen loci, but it has become possible to obtain orders of magnitude more data from overlaps of genome sequences. Here, we generate tens of millions of base pairs of multiple sequence alignments from combinations of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human, an orangutan, and a macaque. Analysis provides a more precise understanding of demographic history than was previously available. We show that bonobos and common chimpanzees were separated approximately 1,290,000 years ago, western and other common chimpanzees approximately 510,000 years ago, and eastern and central chimpanzees at least 50,000 years ago. We infer that the central chimpanzee population size increased by at least a factor of 4 since its separation from western chimpanzees, while the western chimpanzee effective population size decreased. Surprisingly, in about one percent of the genome, the genetic relationships between humans, chimpanzees, and bonobos appear to be different from the species relationships. We used PCR-based resequencing to confirm 11 regions where chimpanzees and bonobos are not most closely related. Study of such loci should provide information about the period of time 5-7 million years ago when the ancestors of humans separated from those of the chimpanzees.

  17. Implicit Hitting Set Problems and Multi-genome Alignment

    Science.gov (United States)

    Karp, Richard M.

    Let U be a finite set and S a family of subsets of U. Define a hitting set as a subset of U that intersects every element of S. The optimal hitting set problem is: given a positive weight for each element of U, find a hitting set of minimum total weight. This problem is equivalent to the classic weighted set cover problem. We consider the optimal hitting set problem in the case where the set system S is not explicitly given, but there is an oracle that will supply members of S satisfying certain conditions; for example, we might ask the oracle for a minimum-cardinality set in S that is disjoint from a given set Q. The problems of finding a minimum feedback arc set or minimum feedback vertex set in a digraph are examples of implicit hitting set problems. Our interest is in the number of oracle queries required to find an optimal hitting set. After presenting some generic algorithms for this problem we focus on our computational experience with an implicit hitting set problem related to multi-genome alignment in genomics. This is joint work with Erick Moreno Centeno.
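
    The definition of an optimal hitting set given above can be made concrete with a small exhaustive search. This sketch assumes the family S is given explicitly and is tiny. It is not one of the generic oracle-based algorithms the abstract refers to, and the function name is hypothetical.

```python
from itertools import combinations


def min_weight_hitting_set(universe, family, weight):
    """Exhaustively find a minimum-weight subset of universe hitting every set.

    universe: iterable of sortable elements; family: list of sets;
    weight: dict mapping each element to a positive weight.
    Returns (best_set, best_weight), or (None, inf) if no hitting set exists.
    """
    elems = sorted(universe)
    best, best_w = None, float("inf")
    for r in range(len(elems) + 1):
        for cand in combinations(elems, r):
            chosen = set(cand)
            # A hitting set must intersect every member of the family.
            if all(chosen & s for s in family):
                w = sum(weight[e] for e in chosen)
                if w < best_w:
                    best, best_w = chosen, w
    return best, best_w


if __name__ == "__main__":
    fam = [{1, 2}, {2, 3}, {3, 4}]
    print(min_weight_hitting_set({1, 2, 3, 4}, fam, {1: 5, 2: 1, 3: 5, 4: 1}))
```

    The search is exponential in the size of U, which is why the implicit, oracle-driven formulation matters when S is too large to enumerate.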

  18. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Full Text Available Abstract Background: Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10^-9 change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10^-9 change/site/year) was approximately half of the overall rate (1.9–2.0 × 10^-9 change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

  19. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    Science.gov (United States)

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  20. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    Directory of Open Access Journals (Sweden)

    Uchiyama Ikuo

    2008-10-01

    Full Text Available Abstract Background: Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results: The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. Conclusion: The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  1. Constrained-DFT method for accurate energy-level alignment of metal/molecule interfaces

    KAUST Repository

    Souza, A. M.

    2013-10-07

    We present a computational scheme for extracting the energy-level alignment of a metal/molecule interface, based on constrained density functional theory and local exchange and correlation functionals. The method, applied here to benzene on Li(100), allows us to evaluate charge-transfer energies, as well as the spatial distribution of the image charge induced on the metal surface. We systematically study the energies for charge transfer from the molecule to the substrate as function of the molecule-substrate distance, and investigate the effects arising from image-charge confinement and local charge neutrality violation. For benzene on Li(100) we find that the image-charge plane is located at about 1.8 Å above the Li surface, and that our calculated charge-transfer energies compare perfectly with those obtained with a classical electrostatic model having the image plane located at the same position. The methodology outlined here can be applied to study any metal/organic interface in the weak coupling limit at the computational cost of a total energy calculation. Most importantly, as the scheme is based on total energies and not on correcting the Kohn-Sham quasiparticle spectrum, accurate results can be obtained with local/semilocal exchange and correlation functionals. This enables a systematic approach to convergence.

  2. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    Science.gov (United States)

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups, a tightly clustered high-light (HL)-adapted and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approaches. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency in the physiological and genetic phylogeny based on whole genome sequence is established. These observations improve our understanding of the adaptation and diversification of these organisms under evolutionary pressure.

  3. OCPAT: an online codon-preserved alignment tool for evolutionary genomic analysis of protein coding sequences

    Directory of Open Access Journals (Sweden)

    Grossman Lawrence I

    2007-09-01

    Full Text Available Abstract Background: Rapidly accumulating genome sequence data from multiple species offer powerful opportunities for the detection of DNA sequence evolution. Phylogenetic tree construction and codon-based tests for natural selection are the prevailing tools used to detect functionally important evolutionary change in protein coding sequences. These analyses often require multiple DNA sequence alignments that maintain the correct reading frame for each collection of putative orthologous sequences. Since this feature is not available in most alignment tools, codon reading frames often must be checked manually before evolutionary analyses can commence. Results: Here we report an online codon-preserved alignment tool (OCPAT) that generates multiple sequence alignments automatically from the coding sequences of any list of human gene IDs and their putative orthologs from genomes of other vertebrate tetrapods. OCPAT is programmed to extract putative orthologous genes from genomes and to align the orthologs with the reading frame maintained in all species. OCPAT also optimizes the alignment by trimming the most variable alignment regions at the 5' and 3' ends of each gene. The resulting output of alignments is returned in several formats, which facilitates further molecular evolutionary analyses by appropriate available software. Alignments are generally robust and reliable, retaining the correct reading frame. The tool can serve as the first step for comparative genomic analyses of protein-coding gene sequences including phylogenetic tree reconstruction and detection of natural selection. We aligned 20,658 human RefSeq mRNAs using OCPAT. Most alignments are missing sequence(s) from at least one species; however, functional annotation clustering of the ~1700 transcripts that were alignable to all species shows that genes involved in multi-subunit protein complexes are highly conserved. Conclusion: The OCPAT program facilitates large-scale evolutionary and...

  4. Using a priori knowledge to align sequencing reads to their exact genomic position

    NARCIS (Netherlands)

    Böttcher, René; Amberg, Ronny; Ruzius, F P; Guryev, V; Verhaegh, Wim F J; Beyerlein, Peter; van der Zaag, P J

    2012-01-01

    The use of a priori knowledge in the alignment of targeted sequencing data is investigated using computational experiments. Adapting a Needleman-Wunsch algorithm to incorporate the genomic position information from the targeted capture, we demonstrate that alignment can be done to just the target re
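
    For orientation, the unmodified Needleman-Wunsch recurrence that this record builds on can be sketched as follows. How the authors incorporate genomic position information is not described in the visible part of the abstract, so it is not shown. The function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b (Needleman-Wunsch).

    Uses a linear gap penalty and keeps only one row of the score matrix.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GCATGCG"))
```

    Restricting the reference to a known target region shrinks the second sequence, and with it the quadratic cost of filling the score matrix.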

  5. Adenoviral vector DNA for accurate genome editing with engineered nucleases.

    Science.gov (United States)

    Holkers, Maarten; Maggio, Ignazio; Henriques, Sara F D; Janssen, Josephine M; Cathomen, Toni; Gonçalves, Manuel A F V

    2014-10-01

    Engineered sequence-specific nucleases and donor DNA templates can be customized to edit mammalian genomes via the homologous recombination (HR) pathway. Here we report that the nature of the donor DNA greatly affects the specificity and accuracy of the editing process following site-specific genomic cleavage by transcription activator-like effector nucleases (TALENs) and clustered, regularly interspaced, short palindromic repeats (CRISPR)-Cas9 nucleases. By applying these designer nucleases together with donor DNA delivered as protein-capped adenoviral vector (AdV), free-ended integrase-defective lentiviral vector or nonviral vector templates, we found that the vast majority of AdV-modified human cells underwent scarless homology-directed genome editing. In contrast, a significant proportion of cells exposed to free-ended or to covalently closed HR substrates were subjected to random and illegitimate recombination events. These findings are particularly relevant for genome engineering approaches aiming at high-fidelity genetic modification of human cells.

  6. Multiple Whole Genome Alignments and Novel Biomedical Applications at the VISTA Portal

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Minovitsky, Simon; Ratnere, Igor; Dubchak, Inna

    2007-02-01

    The VISTA portal for comparative genomics is designed to give biomedical scientists a unified set of tools to lead them from the raw DNA sequences through the alignment and annotation to the visualization of the results. The VISTA portal also hosts alignments of a number of genomes computed by our group, allowing users to study regions of their interest without having to manually download the individual sequences. Here we describe various algorithmic and functional improvements implemented in the VISTA portal over the last two years. The VISTA Portal is accessible at http://genome.lbl.gov/vista.

  7. Abundance of ultramicro inversions within local alignments between human and chimpanzee genomes

    Directory of Open Access Journals (Sweden)

    Hara Yuichiro

    2011-10-01

    Full Text Available Abstract Background: Chromosomal inversion is one of the most important mechanisms of evolution. Recent studies of comparative genomics have revealed that chromosomal inversions are abundant in the human genome. While such previously characterized inversions are large enough to be identified as a single alignment or a string of local alignments, the impact on evolution of ultramicro inversions, which are so short that the local alignments completely cover them, is still uncertain. Results: In this study, we developed a method for identifying ultramicro inversions by scanning local alignments. This technique achieved a high sensitivity and a very low rate of false positives. We identified 2,377 ultramicro inversions ranging from five to 125 bp within the orthologous alignments between the human and chimpanzee genomes. The false positive rate was estimated to be around 4%. Based on phylogenetic profiles using the primate outgroups, 479 ultramicro inversions were inferred to have specifically inverted in the human lineage. Ultramicro inversions exclusively involving adenine and thymine were the most frequent: 461 inversions (19.4% of the total). Furthermore, the density of ultramicro inversions in chromosome Y and the neighborhoods of transposable elements was higher than average. Sixty-five ultramicro inversions were identified within the exons of human protein-coding genes. Conclusions: We defined ultramicro inversions as the inverted regions equal to or smaller than 125 bp buried within local alignments. Our observations suggest that ultramicro inversions are abundant among the human and chimpanzee genomes, and that the location of the inversions correlated with genome structural instability. Some of the ultramicro inversions may contribute to gene evolution. Our inversion-identification method is also applicable in the fine-tuning of genome alignments by distinguishing ultramicro inversions from nucleotide substitutions and indels.

  8. Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values.

    Science.gov (United States)

    Comin, Matteo; Schimd, Michele

    2016-08-12

    Sequencing technologies are generating enormous amounts of read data; however, assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics called alignment-free, thus not requiring reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads but with an error rate around 15%. In this context it will be fundamental to exploit quality value information within the framework of alignment-free measures. In this paper we present a family of alignment-free measures, called d^q-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationship of genomes can be reconstructed based on the direct comparison of their read sets. The use of quality values on average improves the classification accuracy, and its contribution increases when the reads are noisier. Also, the comparison of metagenomic microbial communities can be performed efficiently. Similar metagenomes are quickly detected, just by processing their read data, without the need for costly alignments.
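
    One way k-mer counts and quality values might be combined is sketched below. The abstract does not give the definition of the measures, so everything here is an assumption made for illustration: each k-mer occurrence is weighted by the product of its per-base probabilities of being correct (from Phred scores), and two read sets are compared with a plain inner product of the weighted counts. All function names are hypothetical.

```python
from collections import defaultdict


def phred_to_prob_correct(q: int) -> float:
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def weighted_kmer_counts(reads, k: int) -> dict:
    """Sum, per k-mer, the probability that each occurrence is error-free.

    reads: iterable of (sequence, list_of_phred_scores) pairs.
    """
    counts = defaultdict(float)
    for seq, quals in reads:
        probs = [phred_to_prob_correct(q) for q in quals]
        for i in range(len(seq) - k + 1):
            w = 1.0
            for p in probs[i:i + k]:
                w *= p  # all k bases must be correct
            counts[seq[i:i + k]] += w
    return dict(counts)


def inner_product_score(counts_a: dict, counts_b: dict) -> float:
    """D2-style similarity: inner product of two weighted count vectors."""
    return sum(v * counts_b.get(word, 0.0) for word, v in counts_a.items())


if __name__ == "__main__":
    sample = [("ACGT", [10, 10, 10, 10])]
    c = weighted_kmer_counts(sample, 2)
    print(c, inner_product_score(c, c))
```

    Low-quality occurrences contribute less to the counts, so noisy reads pull the comparison around less than they would with raw k-mer counting.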

  9. Measurement of word frequencies in genomic DNA sequences based on partial alignment and fuzzy set.

    Science.gov (United States)

    Shida, Fumiya; Mizuta, Satoshi

    2014-08-01

    With the rapid increase in the amount of data registered in biological sequence databases, the need for a fast method of sequence comparison applicable to large sequences is also increasing. In general, alignment is used for sequence comparison. However, alignment may not be appropriate for comparing large sequences such as whole genome sequences because of its high time complexity. In this article, we propose a semi-alignment-free method of sequence comparison based on word frequency distributions, in which we partially use alignment to measure word frequencies along with the idea of fuzzy set theory. Experiments with ten bacterial genome sequences demonstrated that the fuzzy measurements facilitate discrimination between close relatives and distant relatives.

  10. Accurate alignment of functional EPI data to anatomical MRI using a physics-based distortion model.

    Science.gov (United States)

    Studholme, C; Constable, R T; Duncan, J S

    2000-11-01

    Mapping of functional magnetic resonance imaging (fMRI) to conventional anatomical MRI is a valuable step in the interpretation of fMRI activations. One of the main limits on the accuracy of this alignment arises from differences in the geometric distortion induced by magnetic field inhomogeneity. This paper describes an approach to the registration of echo planar image (EPI) data to conventional anatomical images which takes into account this difference in geometric distortion. We make use of an additional spin echo EPI image and use the known signal conservation in spin echo distortion to derive a specialized multimodality nonrigid registration algorithm. We also examine a plausible modification using log-intensity evaluation of the criterion to provide increased sensitivity in areas of low EPI signal. A phantom-based imaging experiment is used to evaluate the behavior of the different criteria, comparing nonrigid displacement estimates to those provided by a magnetic field mapping acquisition. The algorithm is then applied to a range of nine brain imaging studies illustrating global and local improvement in the anatomical alignment and localization of fMRI activations.

  11. Evaluating alignment and variant-calling software for mutation identification in C. elegans by whole-genome sequencing.

    Science.gov (United States)

    Smith, Harold E; Yun, Sijung

    2017-01-01

    Whole-genome sequencing is a powerful tool for analyzing genetic variation on a global scale. One particularly useful application is the identification of mutations obtained by classical phenotypic screens in model species. Sequence data from the mutant strain is aligned to the reference genome, and then variants are called to generate a list of candidate alleles. A number of software pipelines for mutation identification have been targeted to C. elegans, with particular emphasis on ease of use, incorporation of mapping strain data, subtraction of background variants, and similar criteria. Although success is predicated upon the sensitive and accurate detection of candidate alleles, relatively little effort has been invested in evaluating the underlying software components that are required for mutation identification. Therefore, we have benchmarked a number of commonly used tools for sequence alignment and variant calling, in all pair-wise combinations, against both simulated and actual datasets. We compared the accuracy of those pipelines for mutation identification in C. elegans, and found that the combination of BBMap for alignment plus FreeBayes for variant calling offers the most robust performance.

  12. Base-By-Base: Single nucleotide-level analysis of whole viral genome alignments

    Directory of Open Access Journals (Sweden)

    Tcherepanov Vasily

    2004-07-01

    Full Text Available Abstract Background With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. Results A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to (1) rapidly identify and correct alignment errors in large, multiple genome alignments; and (2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Conclusion Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

  13. TMO: time and memory optimized algorithm applicable for more accurate alignment of trinucleotide repeat disorders associated genes

    Directory of Open Access Journals (Sweden)

    Done Stojanov

    2016-03-01

    Full Text Available In this study, a time and memory optimized (TMO) algorithm is presented. Compared with the Smith–Waterman algorithm, TMO is applicable for a more accurate detection of continuous insertions/deletions (indels) in gene fragments associated with disorders caused by over-repetition of a certain codon. The improvement comes from the tendency to pinpoint indels in the least preserved nucleotide pairs. All nucleotide pairs that occur less frequently are classified as less preserved and they are considered as mutated codons whose mid-nucleotides were deleted. Another benefit of the proposed algorithm is its general tendency to maximize the number of matching nucleotides included per alignment, regardless of any specific alignment metrics. With Smith–Waterman, the structure of the solution depends on the adjustment of the alignment parameters, and therefore an incomplete (shortened) solution may be derived; our algorithm, in contrast, does not reject any of the consistent matching nucleotides that can be included in the final solution. In terms of computational aspects, our algorithm runs faster than Smith–Waterman for very similar DNA and requires less memory than the most memory-efficient dynamic programming algorithms. The speed-up comes from the reduced number of nucleotide comparisons that have to be performed, without imperiling the completeness of the solution. Because four integers (16 bytes) are required for tracking a matching fragment, regardless of its length, our algorithm requires less memory than Huang's algorithm.

  14. Accurate determination of DNA yield from individual mosquitoes for population genomic applications

    Institute of Scientific and Technical Information of China (English)

    Craig S. Wilding; D. Weetman; K. Steen; M. J. Donnelly

    2009-01-01

    Accurate estimates of DNA quantity are likely to become increasingly important for successful genomic screening of insect populations via recently developed, highly multiplexed genotyping assays and high-throughput sequencing methods. Here we show that genomic DNA extractions from single Anopheles gambiae Giles using a standard commercial kit-based methodology yield extracts with concentrations below the linear range of spectrophotometric absorbance at 260 nm. Concentrations determined by spectrophotometry were not reproducible, and are therefore neither accurate nor reliable. However, DNA quantification using a fluorescent nucleic acid stain (PicoGreen®) gave highly reproducible concentration estimates, and indicated that, on average, single mosquitoes yielded approximately 300 ng of DNA. Such a total yield is currently insufficient for many high-throughput genome screening applications, necessitating whole genome amplification of all or most individuals in a population prior to genotyping.

  15. De Novo DNA Assembly with a Genetic Algorithm Finds Accurate Genomes Even with Suboptimal Fitness

    NARCIS (Netherlands)

    Bucur, Doina; Squillero, Giovanni; Sim, Kevin

    We design an evolutionary heuristic for the combinatorial problem of de-novo DNA assembly with short, overlapping, accurately sequenced single DNA reads of uniform length, from both strands of a genome without long repeated sequences. The representation of a candidate solution is a novel segmented

  16. Tools for Accurate and Efficient Analysis of Complex Evolutionary Mechanisms in Microbial Genomes. Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Nakhleh, Luay

    2014-03-12

    I proposed to develop computationally efficient tools for accurate detection and reconstruction of microbes' complex evolutionary mechanisms, thus enabling rapid and accurate annotation, analysis and understanding of their genomes. To achieve this goal, I proposed to address three aspects. (1) Mathematical modeling. A major challenge facing the accurate detection of HGT is that of distinguishing between these two events on the one hand and other events that have similar "effects." I proposed to develop a novel mathematical approach for distinguishing among these events. Further, I proposed to develop a set of novel optimization criteria for the evolutionary analysis of microbial genomes in the presence of these complex evolutionary events. (2) Algorithm design. In this aspect of the project, I proposed to develop an array of efficient and accurate algorithms for analyzing microbial genomes based on the formulated optimization criteria. Further, I proposed to test the viability of the criteria and the accuracy of the algorithms in an experimental setting using both synthetic as well as biological data. (3) Software development. I proposed the final outcome to be a suite of software tools which implements the mathematical models as well as the algorithms developed.

  17. A powerful test of independent assortment that determines genome-wide significance quickly and accurately.

    Science.gov (United States)

    Stewart, W C L; Hager, V R

    2016-08-01

    In the analysis of DNA sequences on related individuals, most methods strive to incorporate as much information as possible, with little or no attention paid to the issue of statistical significance. For example, a modern workstation can easily handle the computations needed to perform a large-scale genome-wide inheritance-by-descent (IBD) scan, but accurate assessment of the significance of that scan is often hindered by inaccurate approximations and computationally intensive simulation. To address these issues, we developed gLOD, a test of co-segregation that, for large samples, models chromosome-specific IBD statistics as a collection of stationary Gaussian processes. With this simple model, the parametric bootstrap yields an accurate and rapid assessment of significance: the genome-wide corrected P-value. Furthermore, we show that (i) under the null hypothesis, the limiting distribution of the gLOD is the standard Gumbel distribution; (ii) our parametric bootstrap simulator is approximately 40 000 times faster than gene-dropping methods, and it is more powerful than methods that approximate the adjusted P-value; and (iii) the gLOD has the same statistical power as the widely used maximum Kong and Cox LOD. Thus, our approach gives researchers the ability to determine quickly and accurately the significance of most large-scale IBD scans, which may contain multiple traits, thousands of families and tens of thousands of DNA sequences.

  18. Genomic Signal Processing Methods for Computation of Alignment-Free Distances from DNA Sequences

    Science.gov (United States)

    Borrayo, Ernesto; Mendizabal-Ruiz, E. Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P.; Morales, J. Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409

  19. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    Science.gov (United States)

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.

  20. READSCAN: A fast and scalable pathogen discovery program with accurate genome relative abundance estimation

    KAUST Repository

    Naeem, Raeece

    2012-11-28

    Summary: READSCAN is a highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences on a 20.1 million reads simulated dataset in <27 min using a small Beowulf compute cluster with 16 nodes (Supplementary Material). Availability: http://cbrc.kaust.edu.sa/readscan Contact: or raeece.naeem@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. 2012 The Author(s).

  1. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

    Directory of Open Access Journals (Sweden)

    Dewey Colin N

    2011-08-01

    Full Text Available Abstract Background RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost

  2. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    Science.gov (United States)

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-08

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Ultrasensitive single-genome sequencing: accurate, targeted, next generation sequencing of HIV-1 RNA.

    Science.gov (United States)

    Boltz, Valerie F; Rausch, Jason; Shao, Wei; Hattori, Junko; Luke, Brian; Maldarelli, Frank; Mellors, John W; Kearney, Mary F; Coffin, John M

    2016-12-20

    Although next generation sequencing (NGS) offers the potential for studying virus populations in unprecedented depth, PCR error, amplification bias and recombination during library construction have limited its use to population sequencing and measurements of unlinked allele frequencies. Here we report a method, termed ultrasensitive Single-Genome Sequencing (uSGS), for NGS library construction and analysis that eliminates PCR errors and recombinants, and generates single-genome sequences of the same quality as the "gold-standard" of HIV-1 single-genome sequencing assay but with more than 100-fold greater depth. Primer ID tagged cDNA was synthesized from mixtures of cloned BH10 wild-type and mutant HIV-1 transcripts containing ten drug resistance mutations. First, the resultant cDNA was divided and NGS libraries were generated in parallel using two methods: uSGS and a method applying long PCR primers to attach the NGS adaptors (LP-PCR-1). Second, cDNA was divided and NGS libraries were generated in parallel comparing 3 methods: uSGS and 2 methods adapted from more recent reports using variations of the long PCR primers to attach the adaptors (LP-PCR-2 and LP-PCR-3). Consistently, the uSGS method amplified a greater proportion of cDNAs, averaging 30% compared to 13% for LP-PCR-1, 21% for LP-PCR-2 and 14% for LP-PCR-3. Most importantly, when the uSGS sequences were binned according to their primer IDs, 94% of the bins did not contain PCR recombinant sequences versus only 55, 75 and 65% for LP-PCR-1, 2 and 3, respectively. Finally, when uSGS was applied to plasma samples from HIV-1 infected donors, both frequent and rare variants were detected in each sample and neighbor-joining trees revealed clusters of genomes driven by the linkage of these mutations, showing the lack of PCR recombinants in the datasets. The uSGS assay can be used for accurate detection of rare variants and for identifying linkage of rare alleles associated with HIV-1 drug resistance. 

  4. Construction, alignment and analysis of twelve framework physical maps that represent the ten genome types of the genus Oryza.

    Science.gov (United States)

    Kim, HyeRan; Hurwitz, Bonnie; Yu, Yeisoo; Collura, Kristi; Gill, Navdeep; SanMiguel, Phillip; Mullikin, James C; Maher, Christopher; Nelson, William; Wissotski, Marina; Braidotti, Michele; Kudrna, David; Goicoechea, José Luis; Stein, Lincoln; Ware, Doreen; Jackson, Scott A; Soderlund, Carol; Wing, Rod A

    2008-01-01

    We describe the establishment and analysis of a genus-wide comparative framework composed of 12 bacterial artificial chromosome fingerprint and end-sequenced physical maps representing the 10 genome types of Oryza aligned to the O. sativa ssp. japonica reference genome sequence. Over 932 Mb of end sequence was analyzed for repeats, simple sequence repeats, miRNA and single nucleotide variations, providing the most extensive analysis of Oryza sequence to date.

  5. A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis

    Directory of Open Access Journals (Sweden)

    Mezey Jason G

    2010-01-01

    Full Text Available Abstract Background The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability. Results V-Bay provides a novel solution to the computational scaling constraints of most multiple locus methods and can complete a simultaneous analysis of a million genetic markers in a few hours, when using a desktop. Using a range of simulated genetic and GWA experimental scenarios, we demonstrate that V-Bay is highly accurate, and reliably identifies associations that are too weak to be discovered by single-marker testing approaches. V-Bay can also outperform a multiple locus analysis method based on the lasso, which has similar scaling properties for large numbers of genetic markers. For demonstration purposes, we also use V-Bay to confirm associations with gene expression in cell lines derived from the Phase II individuals of HapMap. Conclusions V-Bay is a versatile, fast, and accurate multiple locus GWA analysis tool for the practitioner interested in identifying weaker associations without high false positive rates.

  6. CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform.

    Science.gov (United States)

    Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L

    2012-07-15

    New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed. We present CUSHAW, a parallelized short read aligner based on the compute unified device architecture (CUDA) parallel programming model. We exploit CUDA-compatible graphics hardware as accelerators to achieve fast speed. Our algorithm uses a quality-aware bounded search approach based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini index to reduce the search space and achieve high alignment quality. Performance evaluation, using simulated as well as real short read datasets, reveals that our algorithm running on one or two graphics processing units achieves significant speedups in terms of execution time, while yielding comparable or even better alignment quality for paired-end alignments compared with three popular BWT-based aligners: Bowtie, BWA and SOAP2. CUSHAW also delivers competitive performance in terms of single-nucleotide polymorphism calling for an Escherichia coli test dataset. http://cushaw.sourceforge.net
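
    The BWT and Ferragina-Manzini (FM) index search mentioned above can be sketched in a few lines. This is a toy, CPU-only illustration of exact-match backward search, not CUSHAW's quality-aware bounded search or its CUDA implementation; the function names are hypothetical, the suffix array is built naively, and the reference is assumed not to contain the "$" sentinel.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, occurrence table)."""
    text += "$"  # sentinel, assumed absent from the input and smallest in order
    # Naive suffix array: fine for a demo, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def find_exact(pattern, sa, C, occ):
    """Return sorted start positions of exact matches via backward search."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, C, occ = build_fm_index("GATTACAGATTACA")
print(find_exact("ATTA", sa, C, occ))  # [1, 8]
```

    Each pattern character narrows the interval of candidate suffixes with two table lookups, so the search cost depends on the read length rather than the genome length. Real aligners extend this with bounded mismatch handling and compressed tables.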

  7. Alignment of leading-edge and peak-picking time of arrival methods to obtain accurate source locations

    Energy Technology Data Exchange (ETDEWEB)

    Roussel-Dupre, R.; Symbalisty, E.; Fox, C.; and Vanderlinde, O.

    2009-08-01

    The location of a radiating source can be determined by time-tagging the arrival of the radiated signal at a network of spatially distributed sensors. The accuracy of this approach depends strongly on the particular time-tagging algorithm employed at each of the sensors. If different techniques are used across the network, then the time tags must be referenced to a common fiducial for maximum location accuracy. In this report we derive the time corrections needed to temporally align leading-edge, time-tagging techniques with peak-picking algorithms. We focus on broadband radio frequency (RF) sources, an ionospheric propagation channel, and narrowband receivers, but the final results can be generalized to apply to any source, propagation environment, and sensor. Our analytic results are checked against numerical simulations for a number of representative cases and agree with the specific leading-edge algorithm studied independently by Kim and Eng (1995) and Pongratz (2005 and 2007).

  8. Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome.

    Science.gov (United States)

    Hastie, Alex R; Dong, Lingli; Smith, Alexis; Finklestein, Jeff; Lam, Ernest T; Huo, Naxin; Cao, Han; Kwok, Pui-Yan; Deal, Karin R; Dvorak, Jan; Luo, Ming-Cheng; Gu, Yong; Xiao, Ming

    2013-01-01

    Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which combined with huge genome sizes, makes accurate assembly of these large and complex genomes intractable thus far. Using two-color genome mapping of tiling bacterial artificial chromosomes (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.

  9. Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome.

    Directory of Open Access Journals (Sweden)

    Alex R Hastie

    Full Text Available Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which combined with huge genome sizes, makes accurate assembly of these large and complex genomes intractable thus far. Using two-color genome mapping of tiling bacterial artificial chromosomes (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.

  10. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA

    Directory of Open Access Journals (Sweden)

    Chan Alan

    2006-06-01

    Full Text Available Abstract Background Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH), we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. Results In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A). Conclusion Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  11. When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes.

    Science.gov (United States)

    Gardner, Shea N; Hall, Barry G

    2013-01-01

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from Genbank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of H104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.
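
    A rough sketch of how k-mer analysis can surface SNP candidates without any alignment or reference genome is given below. It is an illustration under stated assumptions, not the kSNP v2 implementation: k is assumed odd, a candidate SNP is taken to be a set of k-mers that share both flanks but differ at the central base, and reverse complements, sequencing errors and repeated flanks within one genome are ignored. The function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_snps(genomes, k):
    """Find k-mer loci whose flanks agree but whose central base varies.

    genomes: dict mapping genome name -> sequence string.
    Returns {(left_flank, right_flank): {central_base: [genome names]}}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    mid = k // 2
    # (left flank, right flank) -> central base -> genomes carrying it
    loci = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            flank = (kmer[:mid], kmer[mid + 1:])
            loci[flank][kmer[mid]].add(name)
    # Keep only loci where more than one central base was observed.
    return {
        flank: {base: sorted(names) for base, names in alleles.items()}
        for flank, alleles in loci.items()
        if len(alleles) > 1
    }


genomes = {"g1": "AACGTTA", "g2": "AACCTTA"}
print(candidate_snps(genomes, 5))
```

    For the two toy sequences, the only locus reported is the flank pair ("AC", "TT"), with G in g1 and C in g2. Because every genome is reduced to a k-mer set first, no genome has to be chosen as the reference.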

  12. Genome Alignment Spanning Major Poaceae Lineages Reveals Heterogeneous Evolutionary Rates and Alters Inferred Dates for Key Evolutionary Events.

    Science.gov (United States)

    Wang, Xiyin; Wang, Jingpeng; Jin, Dianchuan; Guo, Hui; Lee, Tae-Ho; Liu, Tao; Paterson, Andrew H

    2015-06-01

    Multiple comparisons among genomes can clarify their evolution, speciation, and functional innovations. To date, the genome sequences of eight grasses representing the most economically important Poaceae (grass) clades have been published, and their genomic-level comparison is an essential foundation for evolutionary, functional, and translational research. Using a formal and conservative approach, we aligned these genomes. Direct comparison of paralogous gene pairs all duplicated simultaneously reveals striking variation in evolutionary rates among whole genomes, with nucleotide substitution slowest in rice and up to 48% faster in other grasses, adding a new dimension to the value of rice as a grass model. We reconstructed ancestral genome contents for major evolutionary nodes, potentially contributing to understanding the divergence and speciation of grasses. Recent fossil evidence suggests revisions of the estimated dates of key evolutionary events, implying that the pan-grass polyploidization occurred ∼96 million years ago and could not be related to the Cretaceous-Tertiary mass extinction as previously inferred. Adjusted dating to reflect both updated fossil evidence and lineage-specific evolutionary rates suggested that maize subgenome divergence and maize-sorghum divergence were virtually simultaneous, a coincidence that would be explained if polyploidization directly contributed to speciation. This work lays a solid foundation for Poaceae translational genomics. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.

  13. C-Sibelia: an easy-to-use and highly accurate tool for bacterial genome comparison [v1; ref status: indexed, http://f1000r.es/27n]

    Directory of Open Access Journals (Sweden)

    Ilya Minkin

    2013-11-01

    Full Text Available We present C-Sibelia, a highly accurate and easy-to-use software tool for comparing two closely related bacterial genomes, which can be presented as either finished sequences or fragmented assemblies. C-Sibelia takes as input two FASTA files and produces: (1) a VCF file containing all identified single nucleotide variations and indels; (2) an XMFA file containing alignment information. The software also produces Circos diagrams visualizing high level genomic architecture for rearrangement analyses. C-Sibelia is a part of the Sibelia comparative genomics suite, which is freely available under the GNU GPL v.2 license at http://sourceforge.net/projects/sibelia-bio. C-Sibelia is compatible with Unix-like operating systems. A web-based version of the software is available at http://etool.me/software/csibelia.

  14. Accurate Prediction of the Statistics of Repetitions in Random Sequences: A Case Study in Archaea Genomes.

    Science.gov (United States)

    Régnier, Mireille; Chassignet, Philippe

    2016-01-01

    Repetitive patterns in genomic sequences have great biological significance as well as algorithmic implications. Analytic combinatorics allows one to derive formulas for the expected length of repetitions in a random sequence. Asymptotic results, which generalize previous work on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences.
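
    A simulation of the kind mentioned above might look like the following Python sketch. It does not reproduce the paper's formulas: it measures the longest repeated substring in a uniform random DNA sequence and compares it with the well-known first-order heuristic 2·log_σ(n) for an alphabet of size σ. The function names and the sequence length are assumptions chosen for illustration.

```python
import math
import random

def has_repeat(seq, length):
    """True if some substring of the given length occurs at least twice (overlaps allowed)."""
    seen = set()
    for i in range(len(seq) - length + 1):
        word = seq[i:i + length]
        if word in seen:
            return True
        seen.add(word)
    return False

def longest_repeat_length(seq):
    """Binary search on the length: a repeat of length L implies one of length L - 1."""
    lo, hi = 0, len(seq) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(seq, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

rng = random.Random(1)
n = 20000
random_dna = "".join(rng.choice("ACGT") for _ in range(n))
print("observed longest repeat:", longest_repeat_length(random_dna))
print("heuristic 2*log4(n):    ", round(2 * math.log(n, 4), 1))
```

    Running the same measurement on a real genome and comparing it with the random baseline is one simple way to see how biological sequences depart from the random model.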

  15. Can a semi-automated surface matching and principal axis-based algorithm accurately quantify femoral shaft fracture alignment in six degrees of freedom?

    Science.gov (United States)

    Crookshank, Meghan C; Beek, Maarten; Singh, Devin; Schemitsch, Emil H; Whyne, Cari M

    2013-07-01

    Accurate alignment of femoral shaft fractures treated with intramedullary nailing remains a challenge for orthopaedic surgeons. The aim of this study is to develop and validate a cone-beam CT-based, semi-automated algorithm to quantify the malalignment in six degrees of freedom (6DOF) using a surface matching and principal axes-based approach. Complex comminuted diaphyseal fractures were created in nine cadaveric femora and cone-beam CT images were acquired (27 cases total). Scans were cropped and segmented using intensity-based thresholding, producing superior, inferior and comminution volumes. Cylinders were fit to estimate the long axes of the superior and inferior fragments. The angle and distance between the two cylindrical axes were calculated to determine flexion/extension and varus/valgus angulation and medial/lateral and anterior/posterior translations, respectively. Both surfaces were unwrapped about the cylindrical axes. Three methods of matching the unwrapped surface for determination of periaxial rotation were compared based on minimizing the distance between features. The calculated corrections were compared to the input malalignment conditions. All 6DOF were calculated to within current clinical tolerances for all but two cases. This algorithm yielded accurate quantification of malalignment of femoral shaft fractures for fracture gaps up to 60 mm, based on a single CBCT image of the fractured limb.
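
    The geometric step described above, computing the angle and the distance between the two fitted cylinder axes, can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: each axis is given as a point plus a direction vector, only the total angle is returned (the split into flexion/extension and varus/valgus components is omitted), and the function name is hypothetical.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)

def axis_angle_and_distance(p1, d1, p2, d2):
    """Angle in degrees and shortest distance between two 3D lines (point + direction)."""
    d1, d2 = _unit(d1), _unit(d2)
    # abs() treats the axes as undirected lines; min() guards against rounding above 1
    angle_deg = math.degrees(math.acos(min(1.0, abs(_dot(d1, d2)))))
    offset = tuple(b - a for a, b in zip(p1, p2))
    normal = _cross(d1, d2)
    normal_len = math.sqrt(_dot(normal, normal))
    if normal_len < 1e-12:
        # Parallel axes: distance from p2 to the first line
        along = _dot(offset, d1)
        perp = tuple(o - along * d for o, d in zip(offset, d1))
        return angle_deg, math.sqrt(_dot(perp, perp))
    return angle_deg, abs(_dot(offset, normal)) / normal_len

# Hypothetical fragment axes: one along z, one tilted by 45 degrees and shifted 3 mm in x
angle, distance = axis_angle_and_distance((0, 0, 0), (0, 0, 1), (3, 0, 0), (0, 1, 1))
print(f"angulation: {angle:.1f} deg, translation: {distance:.1f} mm")
```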

  16. A method for accurate detection of genomic microdeletions using real-time quantitative PCR

    Directory of Open Access Journals (Sweden)

    Bassett Anne S

    2005-12-01

    Full Text Available Abstract Background Quantitative Polymerase Chain Reaction (qPCR) is a well-established method for quantifying levels of gene expression, but has not been routinely applied to the detection of constitutional copy number alterations of human genomic DNA. Microdeletions or microduplications of the human genome are associated with a variety of genetic disorders. Although clinical laboratories routinely use fluorescence in situ hybridization (FISH) to identify such cryptic genomic alterations, there remains a significant number of individuals in which constitutional genomic imbalance is suspected, based on clinical parameters, but cannot be readily detected using current cytogenetic techniques. Results In this study, a novel application for real-time qPCR is presented that can be used to reproducibly detect chromosomal microdeletions and microduplications. This approach was applied to DNA from a series of patient samples and controls to validate genomic copy number alteration at cytoband 22q11. The study group comprised 12 patients with clinical symptoms of chromosome 22q11 deletion syndrome (22q11DS), 1 patient trisomic for 22q11 and 4 normal controls. Six of the patients (group 1) had known hemizygous deletions, as detected by standard diagnostic FISH, whilst the remaining 6 patients (group 2) were classified as 22q11DS negative using the clinical FISH assay. Screening of the patients and controls with a set of 10 real time qPCR primers, spanning the 22q11.2-deleted region and flanking sequence, confirmed the FISH assay results for all patients with 100% concordance. Moreover, this qPCR enabled a refinement of the region of deletion at 22q11. Analysis of DNA from the chromosome 22 trisomic sample demonstrated genomic duplication within 22q11. Conclusion In this paper we present a qPCR approach for the detection of chromosomal microdeletions and microduplications. The strategic use of in silico modelling for qPCR primer design to avoid regions of repetitive...
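
    The abstract does not state how copy number was derived from the qPCR readings. One common approach, shown here purely as a hedged Python sketch, is the comparative Ct (ΔΔCt) calculation, which assumes roughly 100% amplification efficiency and a diploid control. The function name and the Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control,
                         control_copies=2):
    """Estimate copy number with the comparative Ct method (assumes ~100% PCR efficiency)."""
    delta_sample = ct_target_sample - ct_ref_sample      # target vs reference locus, patient
    delta_control = ct_target_control - ct_ref_control   # target vs reference locus, control
    delta_delta = delta_sample - delta_control
    return control_copies * 2 ** (-delta_delta)

# Hypothetical readings: the target amplifies one cycle later in the patient,
# which corresponds to half the template, i.e. a hemizygous deletion.
print(relative_copy_number(26.0, 24.0, 25.0, 24.0))
```

    A value near 1 would suggest a hemizygous deletion, near 2 a normal dosage, and near 3 a duplication such as the trisomic sample mentioned above.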

  17. Bisulfite-based epityping on pooled genomic DNA provides an accurate estimate of average group DNA methylation

    Directory of Open Access Journals (Sweden)

    Docherty Sophia J

    2009-03-01

    Full Text Available Abstract Background DNA methylation plays a vital role in normal cellular function, with aberrant methylation signatures being implicated in a growing number of human pathologies and complex human traits. Methods based on the modification of genomic DNA with sodium bisulfite are considered the 'gold-standard' for DNA methylation profiling on genomic DNA; however, they require relatively large amounts of DNA and may be prohibitively expensive when used on the large sample sizes necessary to detect small effects. We propose that a high-throughput DNA pooling approach will facilitate the use of emerging methylomic profiling techniques in large samples. Results Compared with data generated from 89 individual samples, our analysis of 205 CpG sites spanning nine independent regions of the genome demonstrates that DNA pools can be used to provide an accurate and reliable quantitative estimate of average group DNA methylation. Comparison of data generated from the pooled DNA samples with results averaged across the individual samples comprising each pool revealed highly significant correlations for individual CpG sites across all nine regions, with an average overall correlation across all regions and pools of 0.95 (95% bootstrapped confidence intervals: 0.94 to 0.96). Conclusion In this study we demonstrate the validity of using pooled DNA samples to accurately assess group DNA methylation averages. Such an approach can be readily applied to the assessment of disease phenotypes reducing the time, cost and amount of DNA starting material required for large-scale epigenetic analyses.
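
    The statistic reported above, a correlation between pooled and individually averaged methylation with a bootstrapped confidence interval, can be sketched in Python as follows. The exact resampling scheme used in the study is not given in the abstract, so this assumes a Pearson correlation and a simple percentile bootstrap over CpG sites. The data values and function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Pearson correlation."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # degenerate resample with zero variance
        stats.append(pearson(xs, ys))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper

# Hypothetical methylation fractions at six CpG sites
pooled = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
averaged = [0.12, 0.22, 0.43, 0.52, 0.74, 0.83]
print(pearson(pooled, averaged), bootstrap_ci(pooled, averaged))
```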

  18. Creating and evaluating accurate CRISPR-Cas9 scalpels for genomic surgery.

    Science.gov (United States)

    Bolukbasi, Mehmet Fatih; Gupta, Ankit; Wolfe, Scot A

    2016-01-01

    The simplicity of site-specific genome targeting by type II clustered, regularly interspaced, short palindromic repeat (CRISPR)-Cas9 nucleases, along with their robust activity profile, has changed the landscape of genome editing. These favorable properties have made the CRISPR-Cas9 system the technology of choice for sequence-specific modifications in vertebrate systems. For many applications, whether the focus is on basic science investigations or therapeutic efficacy, activity and precision are important considerations when one is choosing a nuclease platform, target site and delivery method. Here we review recent methods for increasing the activity and accuracy of Cas9 and assessing the extent of off-target cleavage events.

  19. PipMaker—A Web Server for Aligning Two Genomic DNA Sequences

    OpenAIRE

    Schwartz, Scott; Zhang, Zheng; Frazer, Kelly A; Smit, Arian; Riemer, Cathy; Bouck, John; Gibbs, Richard; Hardison, Ross; Miller, Webb

    2000-01-01

    PipMaker (http://bio.cse.psu.edu) is a World-Wide Web site for comparing two long DNA sequences to identify conserved segments and for producing informative, high-resolution displays of the resulting alignments. One display is a percent identity plot (pip), which shows both the position in one sequence and the degree of similarity for each aligning segment between the two sequences in a compact and easily understandable form. Positions along the horizontal axis can be labeled with features su...

  20. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy.

    Science.gov (United States)

    Zuo, Guanghong; Hao, Bailin

    2015-10-01

    A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.

  1. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy

    Directory of Open Access Journals (Sweden)

    Guanghong Zuo

    2015-10-01

    Full Text Available A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.

  2. An optimized and low-cost FPGA-based DNA sequence alignment--a step towards personal genomics.

    Science.gov (United States)

    Shah, Hurmat Ali; Hasan, Laiq; Ahmad, Nasir

    2013-01-01

    DNA sequence alignment is a cardinal process in computational biology, but it is computationally very expensive when performed on traditional platforms such as CPUs. Of the many off-the-shelf platforms explored for speeding up the computation, the FPGA stands out as the best candidate because of its performance per dollar spent and performance per watt. These two advantages make the FPGA the most appropriate choice for realizing the aim of personal genomics. Previous implementations of DNA sequence alignment did not take into consideration the price of the device on which optimization was performed. This paper presents optimizations over a previous FPGA implementation that improve both the overall speed-up achieved and the price incurred by the platform that was optimized. The optimizations are: (1) the array of processing elements runs on a change in input value rather than on the clock, eliminating the need for tight clock synchronization; (2) the implementation is not restricted by the size of the sequences to be aligned; (3) the waiting time required for the sequences to load onto the FPGA is reduced to the minimum possible; and (4) an efficient method is devised to store the output matrix, which makes it possible to save the diagonal elements for use in the next pass in parallel with the computation of the output matrix. Implemented on a Spartan3 FPGA, this implementation achieved a 20-fold performance improvement in terms of CUPS over a GPP implementation.
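
    For orientation, the kind of dynamic-programming matrix such hardware fills, and the CUPS (cell updates per second) metric used to compare implementations, can be sketched in Python. The abstract does not name the alignment algorithm, so this assumes Smith-Waterman local alignment with a linear gap penalty; it runs sequentially on a CPU and does not model the FPGA processing-element array. The scoring values and function names are assumptions.

```python
import time

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (linear gap penalty) and the number of matrix cells updated."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr  # only the previous row is needed for the next pass
    return best, len(a) * len(b)

def cups(cells, seconds):
    """Cell updates per second, the throughput metric quoted for alignment hardware."""
    return cells / seconds

start = time.perf_counter()
score, cells = smith_waterman_score("ACGTTGACCA" * 20, "ACGTAGACCA" * 20)
elapsed = max(time.perf_counter() - start, 1e-9)
print(score, f"{cups(cells, elapsed):.3g} CUPS")
```

    Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal are independent of each other, which is what allows hardware to update them in parallel.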

  3. High-throughput automated microfluidic sample preparation for accurate microbial genomics

    Science.gov (United States)

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B.; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P.; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C.

    2017-01-01

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications. PMID:28128213

  4. PrimerDesign-M: a multiple-alignment based multiple-primer design tool for walking across variable genomes.

    Science.gov (United States)

    Yoon, Hyejin; Leitner, Thomas

    2015-05-01

    Analyses of entire viral genomes or mtDNA requires comprehensive design of many primers across their genomes. Furthermore, simultaneous optimization of several DNA primer design criteria may improve overall experimental efficiency and downstream bioinformatic processing. To achieve these goals, we developed PrimerDesign-M. It includes several options for multiple-primer design, allowing researchers to efficiently design walking primers that cover long DNA targets, such as entire HIV-1 genomes, and that optimizes primers simultaneously informed by genetic diversity in multiple alignments and experimental design constraints given by the user. PrimerDesign-M can also design primers that include DNA barcodes and minimize primer dimerization. PrimerDesign-M finds optimal primers for highly variable DNA targets and facilitates design flexibility by suggesting alternative designs to adapt to experimental conditions. PrimerDesign-M is available as a webtool at http://www.hiv.lanl.gov/content/sequence/PRIMER_DESIGN/primer_design.html tkl@lanl.gov or seq-info@lanl.gov. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  5. Sequencing and alignment of mitochondrial genomes of Tibetan chicken and two lowland chicken breeds

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Tibetan chicken lives in high-altitude area and has adapted well to hypoxia genetically. Shouguang chicken and Silky chicken are both lowland chicken breeds. In the present study, the complete mitochondrial genome sequences of the three chicken breeds were all sequenced. The results showed that the mitochondrial DNAs (mtDNAs) of Shouguang chicken and Silky chicken consist of 16784 bp and 16785 bp respectively, and Tibetan chicken mitochondrial genome varies from 16784 bp to 16786 bp. After sequence analysis, 120 mutations, including 4 single nucleotide polymorphisms (SNPs) in tRNA genes, 9 SNPs and 1 insertion in rRNA genes, 38 SNPs and 1 deletion in D-LOOP, 66 SNPs in protein-coding genes, were found. This work will provide clues for the future study on the association between mitochondrial genes and the adaptation to hypoxia. Keywords: Tibetan chicken, lowland chicken, mitochondrial genome, hypoxia.

  6. Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

    KAUST Repository

    Magana-Mora, Arturo

    2017-08-15

    Background Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. Results In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. Conclusions The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/.

  7. Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA.

    Science.gov (United States)

    Magana-Mora, Arturo; Kalkatawi, Manal; Bajic, Vladimir B

    2017-08-15

    Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3'-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/.

  8. Sequencing and alignment of mitochondrial genomes of Tibetan chicken and two lowland chicken breeds

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Tibetan chicken lives in high-altitude area and has adapted well to hypoxia genetically. Shouguang chicken and Silky chicken are both lowland chicken breeds. In the present study, the complete mitochondrial genome sequences of the three chicken breeds were all sequenced. The results showed that the mitochondrial DNAs (mtDNAs) of Shouguang chicken and Silky chicken consist of 16784 bp and 16785 bp respectively, and Tibetan chicken mitochondrial genome varies from 16784 bp to 16786 bp. After sequence analysis, 120 mutations, including 4 single nucleotide polymorphisms (SNPs) in tRNA genes, 9 SNPs and 1 insertion in rRNA genes, 38 SNPs and 1 deletion in D-LOOP, 66 SNPs in protein-coding genes, were found. This work will provide clues for the future study on the association between mitochondrial genes and the adaptation to hypoxia.

  9. Splign: algorithms for computing spliced alignments with identification of paralogs

    Directory of Open Access Journals (Sweden)

    Tatusova Tatiana

    2008-05-01

    Full Text Available Abstract Background The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors such as presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites pose recognized difficulties to existing spliced alignment algorithms. Results We describe a set of algorithms behind a tool called Splign for computing cDNA-to-Genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign has produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time. Conclusion Splign's ability to deal with various issues complicating the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation processes and alternative splicing studies. Its performance is enough to align the largest currently available pools of cDNA data such as the human EST set on a moderate-sized computing cluster in a matter of hours. The duplications identification (compartmentization) algorithm can be used independently in other areas such as the study of pseudogenes. Reviewers This article was reviewed by: Steven Salzberg, Arcady Mushegian and Andrey Mironov (nominated by Mikhail Gelfand).

  10. Alignment validation

    Energy Technology Data Exchange (ETDEWEB)

    ALICE; ATLAS; CMS; LHCb; Golling, Tobias

    2008-09-06

    The four experiments, ALICE, ATLAS, CMS and LHCb, are currently under construction at CERN. They will study the products of proton-proton collisions at the Large Hadron Collider. All experiments are equipped with sophisticated tracking systems, unprecedented in size and complexity. Full exploitation of both the inner detector and the muon system requires an accurate alignment of all detector elements. Alignment information is deduced from dedicated hardware alignment systems and the reconstruction of charged particles. However, the system is degenerate, which means the data are insufficient to constrain all alignment degrees of freedom, so the techniques are prone to converging on wrong geometries. This deficiency necessitates validation and monitoring of the alignment. An exhaustive discussion of means of validation is the subject of this document, including examples and plans from all four LHC experiments, as well as other high energy experiments.

  11. Input/Output Scalability of Genomic Alignment: How to Configure a Computational Biology Cluster

    Energy Technology Data Exchange (ETDEWEB)

    Vaidyanathan, P; Madhyastha, T M; Jones, T

    2001-10-03

    Many scientific applications are I/O-intensive, which makes optimization and scaling difficult, especially on parallel architectures. The I/O requirements of computational biology applications are different from other scientific applications. The main difference is that many computational biology applications are embarrassingly parallel and require repeated read-only access to a large global database. In this paper we examine the scalability of an embarrassingly parallel computational biology application: psLayout, which played a crucial role in the mapping of the human genome. This study was carried out on three architectures: the native UCSC Linux cluster, a Linux cluster at Lawrence Livermore National Labs with a faster interconnect and NFS server, and the ASCI Blue-Pacific supercomputer. We show that a cluster equipped with a fast network and parallel file system or a scalable NFS server has reasonable I/O scalability. We believe that replication is an important issue when scaling to larger numbers of processors, and we introduce the design of a library for automatic data replication to address this issue.

  12. The Oryza map alignment project: Construction, alignment and analysis of 12 BAC fingerprint/end sequence framework physical maps that represent the 10 genome types of genus Oryza

    Science.gov (United States)

    The Oryza Map Alignment Project (OMAP) provides the first comprehensive experimental system for understanding the evolution, physiology and biochemistry of a full genus in plants or animals. We have constructed twelve deep-coverage BAC libraries that are representative of both diploid and tetraploid...

  13. Long Read Alignment with Parallel MapReduce Cloud Platform.

    Science.gov (United States)

    Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing gene sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of the Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 for BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.

  14. Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing

    DEFF Research Database (Denmark)

    Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni;

    2013-01-01

    Dried blood spot samples (DBSS) have been collected and stored for decades as part of newborn screening programmes worldwide. Representing almost an entire population under a certain age and collected with virtually no bias, the Newborn Screening Biobanks are of immense value in medical studies......, for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2×3.2mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS...... can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived...

  15. Accurate Localization of the Integration Sites of Two Genomic Islands at Single-Nucleotide Resolution in the Genome of Bacillus cereus ATCC 10987

    Directory of Open Access Journals (Sweden)

    Ren Zhang

    2008-01-01

    Full Text Available We have identified two genomic islands, that is, BCEGI-1 and BCEGI-2, in the genome of Bacillus cereus ATCC 10987, based on comparative analysis with Bacillus cereus ATCC 14579. Furthermore, by using the cumulative GC profile and performing homology searches between the two genomes, the integration sites of the two genomic islands were determined at single-nucleotide resolution. BCEGI-1 is integrated between 159705 bp and 198000 bp, whereas BCEGI-2 is integrated between the end of ORF BCE4594 and the start of the intergenic sequence immediately following BCE4626, that is, from 4256803 bp to 4285534 bp. BCEGI-1 harbors two bacterial Tn7 transposons, which have two sets of genes encoding TnsA, B, C, and D. It is generally believed that unlike the TnsABC+E pathway, the TnsABC+D pathway would only promote vertical transmission to daughter cells. The evidence presented in this paper, however, suggests a role of the TnsABC+D pathway in the horizontal transfer of some genomic islands.

  16. Use of whole-genus genome sequence data to develop a multilocus sequence typing tool that accurately identifies Yersinia isolates to the species and subspecies levels.

    Science.gov (United States)

    Hall, Miquette; Chattaway, Marie A; Reuter, Sandra; Savin, Cyril; Strauch, Eckhard; Carniel, Elisabeth; Connor, Thomas; Van Damme, Inge; Rajakaruna, Lakshani; Rajendram, Dunstan; Jenkins, Claire; Thomson, Nicholas R; McNally, Alan

    2015-01-01

    The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level to a level of resolution equal to that for whole-genome sequencing. Our assay is designed to be able to accurately subtype the important human-pathogenic species Yersinia enterocolitica to whole-genome resolution levels. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system to allow accurate and reproducible identification of Yersinia isolates to the species level, a process often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  17. Long Read Alignment with Parallel MapReduce Cloud Platform

    Directory of Open Access Journals (Sweden)

    Ahmed Abdulhakim Al-Absi

    2015-01-01

    Full Text Available Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner’s Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.
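
    For readers unfamiliar with the Smith-Waterman step named in this abstract, the following Python sketch computes a local alignment score with the textbook dynamic program. It is only an illustration: it is neither the BWA-SW heuristic nor the optimized variant the authors report, and the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Only two rows of the matrix are kept, so memory grows with the shorter dimension rather than with the product of both lengths; recovering the alignment itself would need the full matrix or a traceback structure.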

  18. A context dependent pair hidden Markov model for statistical alignment

    CERN Document Server

    Arribas-Gil, Ana

    2011-01-01

    This article proposes a novel approach to statistical alignment of nucleotide sequences by introducing a context dependent structure on the substitution process in the underlying evolutionary model. We propose to estimate alignments and context dependent mutation rates relying on the observation of two homologous sequences. The procedure is based on a generalized pair-hidden Markov structure, where conditional on the alignment path, the nucleotide sequences follow a Markov distribution. We use a stochastic approximation expectation maximization (saem) algorithm to give accurate estimators of parameters and alignments. We provide results both on simulated data and vertebrate genomes, which are known to have a high mutation rate from CG dinucleotide. In particular, we establish that the method improves the accuracy of the alignment of a human pseudogene and its functional gene.

  19. Roles of the Y-family DNA polymerase Dbh in accurate replication of the Sulfolobus genome at high temperature.

    Science.gov (United States)

    Sakofsky, Cynthia J; Foster, Patricia L; Grogan, Dennis W

    2012-04-01

    The intrinsically thermostable Y-family DNA polymerases of Sulfolobus spp. have revealed detailed three-dimensional structure and catalytic mechanisms of trans-lesion DNA polymerases, yet their functions in maintaining their native genomes remain largely unexplored. To identify functions of the Y-family DNA polymerase Dbh in replicating the Sulfolobus genome under extreme conditions, we disrupted the dbh gene in Sulfolobus acidocaldarius and characterized the resulting mutant strains phenotypically. Disruption of dbh did not cause any obvious growth defect, sensitivity to any of several DNA-damaging agents, or change in overall rate of spontaneous mutation at a well-characterized target gene. Loss of dbh did, however, cause significant changes in the spectrum of spontaneous forward mutation in each of two orthologous target genes of different sequence. Relative to wild-type strains, dbh(-) constructs exhibited fewer frame-shift and other small insertion-deletion mutations, but exhibited more base-pair substitutions that converted G:C base pairs to T:A base pairs. These changes, which were confirmed to be statistically significant, indicate two distinct activities of the Dbh polymerase in Sulfolobus cells growing under nearly optimal culture conditions (78-80°C and pH 3). The first activity promotes slipped-strand events within simple repetitive motifs, such as mononucleotide runs or triplet repeats, and the second promotes insertion of C opposite a potentially miscoding form of G, thereby avoiding G:C to T:A transversions.

  20. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Bryan N Howie

    2009-06-01

    Full Text Available Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%-20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

  1. Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing

    Directory of Open Access Journals (Sweden)

    ElSharawy Abdou

    2012-09-01

    Full Text Available Background: Many hypothesis-driven genetic studies require the ability to comprehensively and efficiently target specific regions of the genome to detect sequence variations. Often, sample availability is limited, requiring the use of whole genome amplification (WGA). We evaluated a high-throughput microdroplet-based PCR approach in combination with next generation sequencing (NGS) to target 384 discrete exons from 373 genes involved in cancer. In our evaluation, we compared the performance of six non-amplified gDNA samples from two HapMap family trios. Three of these samples were also preamplified by WGA and evaluated. We tested sample pooling or multiplexing strategies at different stages of the tested targeted NGS (T-NGS) workflow. Results: The results demonstrated comparable sequence performance between non-amplified and preamplified samples and between different indexing strategies [sequence specificity of 66.0% ± 3.4%, uniformity (coverage at 0.2× of the mean) of 85.6% ± 0.6%]. The average genotype concordance maintained across all the samples was 99.5% ± 0.4%, regardless of sample type or pooling strategy. We did not detect any errors in the Mendelian patterns of inheritance of genotypes between the parents and offspring within each trio. We also demonstrated the ability to detect minor allele frequencies within the pooled samples that conform to predicted models. Conclusion: Our described PCR-based sample multiplex approach and the ability to use WGA material for NGS may enable researchers to perform deep resequencing studies and explore variants at very low frequencies and cost.

  2. Accurate Breakpoint Mapping in Apparently Balanced Translocation Families with Discordant Phenotypes Using Whole Genome Mate-Pair Sequencing

    Science.gov (United States)

    Aristidou, Constantia; Koufaris, Costas; Theodosiou, Athina; Bak, Mads; Mehrjouy, Mana M.; Behjati, Farkhondeh; Tanteles, George; Christophidou-Anastasiadou, Violetta; Tommerup, Niels

    2017-01-01

    Familial apparently balanced translocations (ABTs) segregating with discordant phenotypes are extremely challenging for interpretation and counseling due to the scarcity of publications and lack of routine techniques for quick investigation. Recently, next generation sequencing has emerged as an efficacious methodology for precise detection of translocation breakpoints. However, studies so far have mainly focused on de novo translocations. The present study focuses specifically on familial cases in order to shed some light on this diagnostic dilemma. Whole-genome mate-pair sequencing (WG-MPS) was applied to map the breakpoints in nine two-way ABT carriers from four families. Translocation breakpoints and patient-specific structural variants were validated by Sanger sequencing and quantitative Real Time PCR, respectively. Identical sequencing patterns and breakpoints were identified in affected and non-affected members carrying the same translocations. PTCD1, ATP5J2-PTCD1, CADPS2, and STPG1 were disrupted by the translocations in three families, rendering them initially as possible disease candidate genes. However, subsequent mutation screening and structural variant analysis did not reveal any pathogenic mutations or unique variants in the affected individuals that could explain the phenotypic differences between carriers of the same translocations. In conclusion, we suggest that NGS-based methods, such as WG-MPS, can be successfully used for detailed mapping of translocation breakpoints, which can also be used in routine clinical investigation of ABT cases. Unlike de novo translocations, no associations were determined here between familial two-way ABTs and the phenotype of the affected members, in which the presence of cryptic imbalances and complex chromosomal rearrangements has been excluded. Future whole-exome or whole-genome sequencing will potentially reveal unidentified mutations in the patients underlying the discordant phenotypes within each family. In

  3. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    Science.gov (United States)

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042

  4. Coval: improving alignment quality and variant calling accuracy for next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Shunichi Kosugi

    Full Text Available Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in 'targeted' alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.

  5. SPA: a probabilistic algorithm for spliced alignment.

    Directory of Open Access Journals (Sweden)

    Erik van Nimwegen

    2006-04-01

    Full Text Available Recent large-scale cDNA sequencing efforts show that elaborate patterns of splice variation are responsible for much of the proteome diversity in higher eukaryotes. To obtain an accurate account of the repertoire of splice variants, and to gain insight into the mechanisms of alternative splicing, it is essential that cDNAs are very accurately mapped to their respective genomes. Currently available algorithms for cDNA-to-genome alignment do not reach the necessary level of accuracy because they use ad hoc scoring models that cannot correctly trade off the likelihoods of various sequencing errors against the probabilities of different gene structures. Here we develop a Bayesian probabilistic approach to cDNA-to-genome alignment. Gene structures are assigned prior probabilities based on the lengths of their introns and exons, and based on the sequences at their splice boundaries. A likelihood model for sequencing errors takes into account the rates at which misincorporation, as well as insertions and deletions of different lengths, occurs during sequencing. The parameters of both the prior and likelihood model can be automatically estimated from a set of cDNAs, thus enabling our method to adapt itself to different organisms and experimental procedures. We implemented our method in a fast cDNA-to-genome alignment program, SPA, and applied it to the FANTOM3 dataset of over 100,000 full-length mouse cDNAs and a dataset of over 20,000 full-length human cDNAs. Comparison with the results of four other mapping programs shows that SPA produces alignments of significantly higher quality. In particular, the quality of the SPA alignments near splice boundaries and SPA's mapping of the 5' and 3' ends of the cDNAs are highly improved, allowing for more accurate identification of transcript starts and ends, and accurate identification of subtle splice variations. Finally, our splice boundary analysis on the human dataset suggests the existence of a novel non

  6. A rank-based sequence aligner with applications in phylogenetic analysis.

    Directory of Open Access Journals (Sweden)

    Liviu P Dinu

    Full Text Available Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing k-mer positions in a hash table for each read. Another improvement, that produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state of the art alignment tools in several experiments. A set of experiments are conducted to determine the precision and the recall of the proposed aligner, in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered as a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows.
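
    The indexing idea mentioned in this abstract, a per-read hash table of short-substring positions used to narrow down candidate reference positions, can be sketched roughly as follows in Python. This is a hedged illustration only: it does not compute Local Rank Distance, the function names `kmer_positions` and `candidate_starts` are hypothetical, and the vote-counting heuristic is an assumption rather than the authors' procedure.

```python
from collections import defaultdict


def kmer_positions(sequence, k):
    """Map every length-k substring of sequence to the offsets where it occurs."""
    table = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        table[sequence[pos:pos + k]].append(pos)
    return dict(table)


def candidate_starts(reference, read, k):
    """Count, per reference start offset, how many read k-mers agree with it.

    Hypothetical helper: offsets with many votes are plausible placements
    of the read and could be passed on to a slower, exact distance.
    """
    table = kmer_positions(read, k)  # hash table built for this read
    votes = {}
    for ref_pos in range(len(reference) - k + 1):
        for read_pos in table.get(reference[ref_pos:ref_pos + k], []):
            start = ref_pos - read_pos  # implied placement of the read
            if start >= 0:
                votes[start] = votes.get(start, 0) + 1
    return votes
```

    For example, with reference "TTTACGTACCC", read "ACGTA" and k = 3, all three k-mers of the read vote for offset 3, so only that position would need a full distance computation.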

  7. Carbohydrate catabolic flexibility in the mammalian intestinal commensal Lactobacillus ruminis revealed by fermentation studies aligned to genome annotations

    LENUS (Irish Health Repository)

    2011-08-30

    Background: Lactobacillus ruminis is a poorly characterized member of the Lactobacillus salivarius clade that is part of the intestinal microbiota of pigs, humans and other mammals. Its variable abundance in humans and animals may be linked to historical changes over time and geographical differences in dietary intake of complex carbohydrates. Results: In this study, we investigated the ability of nine L. ruminis strains of human and bovine origin to utilize fifty carbohydrates including simple sugars, oligosaccharides, and prebiotic polysaccharides. The growth patterns were compared with metabolic pathways predicted by annotation of a high quality draft genome sequence of ATCC 25644 (human isolate) and the complete genome of ATCC 27782 (bovine isolate). All of the strains tested utilized prebiotics including fructooligosaccharides (FOS), soybean-oligosaccharides (SOS) and 1,3:1,4-β-D-gluco-oligosaccharides to varying degrees. Six strains isolated from humans utilized FOS-enriched inulin, as well as FOS. In contrast, three strains isolated from cows grew poorly in FOS-supplemented medium. In general, carbohydrate utilisation patterns were strain-dependent and also varied depending on the degree of polymerisation or complexity of structure. Six putative operons were identified in the genome of the human isolate ATCC 25644 for the transport and utilisation of the prebiotics FOS, galacto-oligosaccharides (GOS), SOS, and 1,3:1,4-β-D-Gluco-oligosaccharides. One of these comprised a novel FOS utilisation operon with predicted capacity to degrade chicory-derived FOS. However, only three of these operons were identified in the ATCC 27782 genome that might account for the utilisation of only SOS and 1,3:1,4-β-D-Gluco-oligosaccharides. Conclusions: This study has provided definitive genome-based evidence to support the fermentation patterns of nine strains of Lactobacillus ruminis, and has linked it to gene distribution patterns in strains from different sources

  8. A fast and accurate method to detect allelic genomic imbalances underlying mosaic rearrangements using SNP array data

    Directory of Open Access Journals (Sweden)

    Pique-Regi Roger

    2011-05-01

    Full Text Available Background: Mosaicism for copy number and copy neutral chromosomal rearrangements has been recently identified as a relatively common source of genetic variation in the normal population. However, its prevalence is poorly defined since it has been only studied systematically in one large-scale study and by using non optimal ad-hoc SNP array data analysis tools, uncovering rather large alterations (> 1 Mb) and affecting a high proportion of cells. Here we propose a novel methodology, Mosaic Alteration Detection-MAD, by providing a software tool that is effective for capturing previously described alterations as well as new variants that are smaller in size and/or affecting a low percentage of cells. Results: The developed method identified all previously known mosaic abnormalities reported in SNP array data obtained from controls, bladder cancer and HapMap individuals. In addition, the MAD tool was able to detect new mosaic variants not reported before that were smaller in size and with lower percentage of cells affected. The performance of the tool was analysed by studying simulated data for different scenarios. Our method showed high sensitivity and specificity for all assessed scenarios. Conclusions: The tool presented here has the ability to identify mosaic abnormalities with high sensitivity and specificity. Our results confirm the lack of sensitivity of former methods by identifying new mosaic variants not reported in previously utilised datasets. Our work suggests that the prevalence of mosaic alterations could be higher than initially thought. The use of appropriate SNP array data analysis methods would help in defining the human genome mosaic map.

  9. Whole genome sequences of the USMARC beef cattle diversity panel v2.9 aligned to the bovine reference genome assembly

    Science.gov (United States)

    A searchable and publicly viewable set of mapped genomes from 96 beef sires from 19 popular breeds of U.S. cattle was created. These sires with minimal pedigree relationships, represent >99% of the germplasm used in the US beef industry circa 2000. The group is estimated to contain more than 187 u...

  10. Horizontally Transferred Genetic Elements in the Tsetse Fly Genome: An Alignment-Free Clustering Approach Using Batch Learning Self-Organising Map (BLSOM)

    Science.gov (United States)

    Nakao, Ryo; Funayama, Shunsuke

    2016-01-01

    Tsetse flies (Glossina spp.) are the primary vectors of trypanosomes, which can cause human and animal African trypanosomiasis in Sub-Saharan African countries. The objective of this study was to explore the genome of Glossina morsitans morsitans for evidence of horizontal gene transfer (HGT) from microorganisms. We employed an alignment-free clustering method, that is, batch learning self-organising map (BLSOM), in which sequence fragments are clustered based on the similarity of oligonucleotide frequencies independently of sequence homology. After an initial scan of HGT events using BLSOM, we identified 3.8% of the tsetse fly genome as HGT candidates. The predicted donors of these HGT candidates included known symbionts, such as Wolbachia, as well as bacteria that have not previously been associated with the tsetse fly. We detected HGT candidates from diverse bacteria such as Bacillus and Flavobacteria, suggesting a past association between these taxa. Functional annotation revealed that the HGT candidates encoded loci in various functional pathways, such as metabolic and antibiotic biosynthesis pathways. These findings provide a basis for understanding the coevolutionary history of the tsetse fly and its microbes and establish the effectiveness of BLSOM for the detection of HGT events. PMID:28074180
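
    The feature on which this kind of alignment-free clustering rests, the oligonucleotide frequency vector of a sequence fragment, can be illustrated with a short Python sketch. The self-organising map itself is not implemented here; the function name `oligo_frequencies`, the default oligonucleotide length of 4, and the handling of non-ACGT characters are all assumptions made for the example, since the abstract does not state them.

```python
from itertools import product


def oligo_frequencies(fragment, k=4):
    """Relative frequency of each length-k oligonucleotide in a fragment.

    Returns a vector with 4**k entries in lexicographic A, C, G, T order.
    Windows containing characters other than A, C, G, T (for example N)
    are skipped; this handling is an assumption of the sketch.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

    Fragments with similar vectors would then be grouped by the clustering method, independently of any sequence alignment between them.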

  11. AGORA: Assembly Guided by Optical Restriction Alignment

    Directory of Open Access Journals (Sweden)

    Lin Henry C

    2012-08-01

    Full Text Available Background: Genome assembly is difficult due to repeated sequences within the genome, which create ambiguities and cause the final assembly to be broken up into many separate sequences (contigs). Long range linking information, such as mate-pairs or mapping data, is necessary to help assembly software resolve repeats, thereby leading to a more complete reconstruction of genomes. Prior work has used optical maps for validating assemblies and scaffolding contigs, after an initial assembly has been produced. However, optical maps have not previously been used within the genome assembly process. Here, we use optical map information within the popular de Bruijn graph assembly paradigm to eliminate paths in the de Bruijn graph which are not consistent with the optical map and help determine the correct reconstruction of the genome. Results: We developed a new algorithm called AGORA: Assembly Guided by Optical Restriction Alignment. AGORA is the first algorithm to use optical map information directly within the de Bruijn graph framework to help produce an accurate assembly of a genome that is consistent with the optical map information provided. Our simulations on bacterial genomes show that AGORA is effective at producing assemblies closely matching the reference sequences. Additionally, we show that noise in the optical map can have a strong impact on the final assembly quality for some complex genomes, and we also measure how various characteristics of the starting de Bruijn graph may impact the quality of the final assembly. Lastly, we show that a proper choice of restriction enzyme for the optical map may substantially improve the quality of the final assembly. Conclusions: Our work shows that optical maps can be used effectively to assemble genomes within the de Bruijn graph assembly framework. Our experiments also provide insights into the characteristics of the mapping data that most affect the performance of our algorithm, indicating the
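
    As background for the de Bruijn graph paradigm referred to in this abstract, a minimal Python sketch of graph construction from reads is given below. It covers only the generic construction step: the optical-map filtering of paths that AGORA adds is not shown, and the function name `de_bruijn_graph` and the adjacency-list representation are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers.

    Each k-mer found in a read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases). Repeated k-mers
    yield repeated edges, so the result is a multigraph stored as
    adjacency lists.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

    A repeat in the genome shows up as a node with several outgoing edges, which is the ambiguity that long-range information such as an optical map is meant to resolve.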

  12. Tidal alignment of galaxies

    Energy Technology Data Exchange (ETDEWEB)

    Blazek, Jonathan; Vlah, Zvonimir; Seljak, Uroš

    2015-08-01

    We develop an analytic model for galaxy intrinsic alignments (IA) based on the theory of tidal alignment. We calculate all relevant nonlinear corrections at one-loop order, including effects from nonlinear density evolution, galaxy biasing, and source density weighting. Contributions from density weighting are found to be particularly important and lead to bias dependence of the IA amplitude, even on large scales. This effect may be responsible for much of the luminosity dependence in IA observations. The increase in IA amplitude for more highly biased galaxies reflects their locations in regions with large tidal fields. We also consider the impact of smoothing the tidal field on halo scales. We compare the performance of this consistent nonlinear model in describing the observed alignment of luminous red galaxies with the linear model as well as the frequently used "nonlinear alignment model," finding a significant improvement on small and intermediate scales. We also show that the cross-correlation between density and IA (the "GI" term) can be effectively separated into source alignment and source clustering, and we accurately model the observed alignment down to the one-halo regime using the tidal field from the fully nonlinear halo-matter cross correlation. Inside the one-halo regime, the average alignment of galaxies with density tracers no longer follows the tidal alignment prediction, likely reflecting nonlinear processes that must be considered when modeling IA on these scales. Finally, we discuss tidal alignment in the context of cosmic shear measurements.

  13. Asexual populations of the human malaria parasite, Plasmodium falciparum, use a two-step genomic strategy to acquire accurate, beneficial DNA amplifications.

    Directory of Open Access Journals (Sweden)

    Jennifer L Guler

    Full Text Available Malaria drug resistance contributes to up to a million annual deaths. Judicious deployment of new antimalarials and vaccines could benefit from an understanding of early molecular events that promote the evolution of parasites. Continuous in vitro challenge of Plasmodium falciparum parasites with a novel dihydroorotate dehydrogenase (DHODH) inhibitor reproducibly selected for resistant parasites. Genome-wide analysis of independently-derived resistant clones revealed a two-step strategy to evolutionary success. Some haploid blood-stage parasites first survive antimalarial pressure through fortuitous DNA duplications that always included the DHODH gene. Independently-selected parasites had different sized amplification units but they were always flanked by distant A/T tracks. Higher level amplification and resistance was attained using a second, more efficient and more accurate, mechanism for head-to-tail expansion of the founder unit. This second homology-based process could faithfully tune DNA copy numbers in either direction, always retaining the unique DNA amplification sequence from the original A/T-mediated duplication for that parasite line. Pseudo-polyploidy at relevant genomic loci sets the stage for gaining additional mutations at the locus of interest. Overall, we reveal a population-based genomic strategy for mutagenesis that operates in human stages of P. falciparum to efficiently yield resistance-causing genetic changes at the correct locus in a successful parasite. Importantly, these founding events arise with precision; no other new amplifications are seen in the resistant haploid blood stage parasite. This minimizes the need for meiotic genetic cleansing that can only occur in sexual stage development of the parasite in mosquitoes.

  14. General Alignment Concept of the CMS experiment

    CERN Document Server

    Lampen, T

    2006-01-01

    Efficient and accurate track reconstruction requires proper alignment of the tracking devices used. Here we describe the general alignment strategy envisaged for the CMS experiment. The hardware alignment devices of CMS are presented as well as the different track based alignment approaches.

  15. Simulation of beamline alignment operations

    Energy Technology Data Exchange (ETDEWEB)

    Annese, C; Miller, M G

    1999-02-02

    distributions rather than static values. The only way to accurately understand resource utilization and time requirements for a complex industrial application such as alignment is to use simulation tools such as Simprocess to model the system.

  16. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

    Science.gov (United States)

    Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan

    2016-05-01

    An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. 

  17. AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework.

    Directory of Open Access Journals (Sweden)

    Qi Zheng

    2016-10-01

    Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical, albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
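
    The idea of a Bayesian mapping quality can be illustrated with a minimal sketch. This is not AlignerBoost's actual algorithm or API; it assumes a uniform prior over candidate locations, takes hypothetical per-hit log-likelihoods as input, and converts each hit's posterior probability into a Phred-scaled score capped at an assumed maximum of 60. The function name `mapping_qualities` is made up for illustration.

```python
import math


def mapping_qualities(log_likelihoods, max_q=60):
    """Phred-scaled mapping quality for each candidate hit of one read.

    log_likelihoods: natural-log likelihood of the read at each candidate
    location (hypothetical inputs). A uniform prior over hits is assumed.
    """
    # Shift by the maximum before exponentiating for numerical stability.
    top = max(log_likelihoods)
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(weights)
    qualities = []
    for w in weights:
        # Probability that this hit is NOT the true origin of the read.
        error = (total - w) / total
        if error <= 0.0:
            qualities.append(max_q)  # unique hit: cap the score
        else:
            qualities.append(min(max_q, round(-10.0 * math.log10(error))))
    return qualities
```

    Under these assumptions a read with a single candidate location receives the capped score, while a read with two equally likely locations receives a score of 3 for each, which is why filtering on mapping quality tends to remove reads in repetitive regions.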

  18. SAMMate: a GUI tool for processing short read alignments in SAM/BAM format

    Directory of Open Access Journals (Sweden)

    Flemington Erik

    2011-01-01

    Background: Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample. A key step in NGS data analysis is the short read alignment of the generated sequences to a reference genome. Although storing alignment information in the Sequence Alignment/Map (SAM) or Binary SAM (BAM) format is now standard, biomedical researchers still have difficulty accessing this information. Results: We have developed a Graphical User Interface (GUI) software tool named SAMMate. SAMMate allows biomedical researchers to quickly process SAM/BAM files and is compatible with both single-end and paired-end sequencing technologies. SAMMate also automates some standard procedures in DNA-seq and RNA-seq data analysis. Using either standard or customized annotation files, SAMMate allows users to accurately calculate the short read coverage of genomic intervals. In particular, for RNA-seq data SAMMate can accurately calculate the gene expression abundance scores for customized genomic intervals using short reads originating from both exons and exon-exon junctions. Furthermore, SAMMate can quickly calculate a whole-genome signal map at base-wise resolution, allowing researchers to solve an array of bioinformatics problems. Finally, SAMMate can export both a wiggle file for alignment visualization in the UCSC genome browser and an alignment statistics report. The biological impact of these features is demonstrated via several case studies that predict miRNA targets using short read alignment information files. Conclusions: With just a few mouse clicks, SAMMate will provide biomedical researchers easy access to important alignment information stored in SAM/BAM files. Our software is constantly updated and will greatly facilitate the downstream analysis of NGS data. Both the source code and the GUI executable are freely available under the GNU General Public License at http://sammate.sourceforge.net.
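
    Short read coverage of a genomic interval at base-wise resolution can be sketched as follows. This is an illustrative example, not SAMMate's implementation: reads are assumed to be already parsed into half-open (start, end) intervals on one chromosome, spliced reads spanning exon-exon junctions are not split into blocks, and the function names are hypothetical.

```python
def base_coverage(reads, region_start, region_end):
    """Per-base read depth over the half-open interval [region_start, region_end).

    reads: iterable of (start, end) half-open alignment intervals.
    """
    length = region_end - region_start
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo < hi:  # the read overlaps the region
            diff[lo - region_start] += 1
            diff[hi - region_start] -= 1
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def mean_coverage(reads, region_start, region_end):
    """Average depth across the region (0.0 for an empty region)."""
    depth = base_coverage(reads, region_start, region_end)
    return sum(depth) / len(depth) if depth else 0.0
```

    For example, `base_coverage([(0, 5), (3, 8)], 2, 7)` returns `[1, 2, 2, 1, 1]`. The difference-array approach costs time proportional to the number of reads plus the region length, rather than their product.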

  19. A cross-species alignment tool (CAT)

    DEFF Research Database (Denmark)

    Li, Heng; Guan, Liang; Liu, Tao;

    2007-01-01

    sensitive methods which are usually applied in aligning inter-species sequences. RESULTS: Here we present a new algorithm called CAT (for Cross-species Alignment Tool). It is designed to align mRNA sequences to mammalian-sized genomes. CAT is implemented using C scripts and is freely available on the web...

  20. Pyro-Align: Sample-Align based Multiple Alignment system for Pyrosequencing Reads of Large Number

    CERN Document Server

    Saeed, Fahad

    2009-01-01

    Pyro-Align is a multiple alignment program specifically designed for very large numbers of pyrosequencing reads. Multiple sequence alignment has been shown to be NP-hard, and heuristics are designed for approximate solutions. Multiple sequence alignment of pyrosequencing reads is complex mainly because of two factors. The first is the huge number of reads, which makes traditional heuristics, which scale very poorly to large numbers of sequences, unsuitable. The second is that the alignment cannot be performed arbitrarily, because the position of the reads with respect to the original genome is important and has to be taken into account. In this report we present a short description of the multiple alignment system for pyrosequencing reads.

  1. Overcoming low-alignment signal contrast induced alignment failure by alignment signal enhancement

    Science.gov (United States)

    Lee, Byeong Soo; Kim, Young Ha; Hwang, Hyunwoo; Lee, Jeongjin; Kong, Jeong Heung; Kang, Young Seog; Paarhuis, Bart; Kok, Haico; de Graaf, Roelof; Weichselbaum, Stefan; Droste, Richard; Mason, Christopher; Aarts, Igor; de Boeij, Wim P.

    2016-03-01

    Overlay is one of the key factors which enables optical lithography extension to 1X node DRAM manufacturing. It is natural that accurate wafer alignment is a prerequisite for good device overlay. However, alignment failures or misalignments are commonly observed in a fab. There are many factors which could induce alignment problems. Low alignment signal contrast is one of the main issues. Alignment signal contrast can be degraded by opaque stack materials or by alignment mark degradation due to processes like CMP. This issue can be compounded by mark sub-segmentation from design rules in combination with double or quadruple spacer process. Alignment signal contrast can be improved by applying new material or process optimization, which sometimes leads to the addition of another process step with higher costs. If we can amplify the signal components containing the position information and reduce other unwanted signal and background contributions, then we can improve alignment performance without process change. In this paper we use ASML's new alignment sensor (as was introduced and released on the NXT:1980Di) and sample wafers with special stacks which can induce poor alignment signal to demonstrate alignment and overlay improvement.

  2. Use of Alignment-Free Phylogenetics for Rapid Genome Sequence-Based Typing of Helicobacter pylori Virulence Markers and Antibiotic Susceptibility

    NARCIS (Netherlands)

    van Vliet, Arnoud H M; Kusters, Johannes G.

    2015-01-01

    Whole-genome sequencing is becoming a leading technology in the typing and epidemiology of microbial pathogens, but the increase in genomic information necessitates significant investment in bioinformatic resources and expertise, and currently used methodologies struggle with genetically heterogeneo

  3. Accurate Dna Assembly And Direct Genome Integration With Optimized Uracil Excision Cloning To Facilitate Engineering Of Escherichia Coli As A Cell Factory

    DEFF Research Database (Denmark)

    Cavaleiro, Mafalda; Kim, Se Hyeuk; Nørholm, Morten

    2015-01-01

    Plants produce a vast diversity of valuable compounds with medical properties, but these are often difficult to purify from the natural source or produce by organic synthesis. An alternative is to transfer the biosynthetic pathways to an efficient production host like the bacterium Escherichia co......-excision-based cloning and combining it with a genome-engineering approach to allow direct integration of whole metabolic pathways into the genome of E. coli, to facilitate the advanced engineering of cell factories....

  4. Beyond Alignment

    DEFF Research Database (Denmark)

    Beyond Alignment: Applying Systems Thinking to Architecting Enterprises is a comprehensive reader about how enterprises can apply systems thinking in their enterprise architecture practice, for business transformation and for strategic execution. The book's contributors find that systems thinking...... is a valuable way of thinking about the viable enterprise and how to architect it....

  5. Mulan: Multiple-Sequence Local Alignment and Visualization for Studying Function and Evolution

    Energy Technology Data Exchange (ETDEWEB)

    Ovcharenko, I; Loots, G; Giardine, B; Hou, M; Ma, J; Hardison, R; Stubbs, L; Miller, W

    2004-07-14

    Multiple sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the tba multi-aligner program for rapid identification of local sequence conservation and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short- and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multi-species comparisons of the GATA3 gene locus and the identification of elements that are conserved differently in avians than in other genomes, allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://bio.cse.psu.edu/.

  6. Accurate overlaying for mobile augmented reality

    NARCIS (Netherlands)

    Pasman, W; van der Schaaf, A; Lagendijk, RL; Jansen, F.W.

    1999-01-01

    Mobile augmented reality requires accurate alignment of virtual information with objects visible in the real world. We describe a system for mobile communications to be developed to meet these strict alignment criteria using a combination of computer vision, inertial tracking and low-latency renderi

  7. Accurate overlaying for mobile augmented reality

    NARCIS (Netherlands)

    Pasman, W; van der Schaaf, A; Lagendijk, RL; Jansen, F.W.

    1999-01-01

    Mobile augmented reality requires accurate alignment of virtual information with objects visible in the real world. We describe a system for mobile communications to be developed to meet these strict alignment criteria using a combination of computer vision, inertial tracking and low-latency

  8. Plant Genome Duplication Database.

    Science.gov (United States)

    Lee, Tae-Ho; Kim, Junah; Robertson, Jon S; Paterson, Andrew H

    2017-01-01

    Genome duplication, widespread in flowering plants, is a driving force in evolution. Genome alignments between/within genomes facilitate identification of homologous regions and individual genes to investigate evolutionary consequences of genome duplication. PGDD (the Plant Genome Duplication Database), a public web service database, provides intra- or interplant genome alignment information. At present, PGDD contains information for 47 plants whose genome sequences have been released. Here, we describe methods for identification and estimation of dates of genome duplication and speciation by functions of PGDD. The database is freely available at http://chibba.agtec.uga.edu/duplication/.

  9. Handling Permutation in Sequence Comparison: Genome-Wide Enhancer Prediction in Vertebrates by a Novel Non-Linear Alignment Scoring Principle.

    Directory of Open Access Journals (Sweden)

    Dirk Dolle

    Enhancers have been described to evolve by permutation without changing function. This has posed the problem of how to predict enhancer elements that are hidden from alignment-based approaches due to the loss of co-linearity. Alignment-free algorithms have been proposed as one possible solution. However, this approach is hampered by several problems inherent to its underlying working principle. Here we present a new approach, which combines the power of alignment and alignment-free techniques into one algorithm. It allows the prediction of enhancers based on the query and target sequence only, no matter whether the regulatory logic is co-linear or reshuffled. To test our novel approach, we employ it for the prediction of enhancers across the evolutionary distance of ~450 Myr between human and medaka. We demonstrate its efficacy by subsequent in vivo validation resulting in 82% (9/11) of the predicted medaka regions showing reporter activity. These include five candidates with partially co-linear and four with reshuffled motif patterns. Orthology in flanking genes and conservation of the detected co-linear motifs indicate that those candidates are likely functionally equivalent enhancers. In sum, our results demonstrate that the proposed principle successfully predicts mutated as well as permuted enhancer regions at an encouragingly high rate.

  10. AlignHUSH: Alignment of HMMs using structure and hydrophobicity information

    OpenAIRE

    Krishnadev Oruganty; Srinivasan Narayanaswamy

    2011-01-01

    Background: Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm, AlignHUSH, for HMM-HMM alignment has been developed which is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and hydrophobicity of amino acids to align HMMs of t...

  11. An improved Hough transform-based fingerprint alignment approach

    CSIR Research Space (South Africa)

    Mlambo, CS

    2014-11-01

    An improved Hough Transform-based fingerprint alignment approach is presented, which improves computing time and memory usage with accurate alignment parameter (rotation and translation) results. This is achieved by studying the strengths...

  12. Magnetic alignment and the Poisson alignment reference system

    Science.gov (United States)

    Griffith, L. V.; Schenz, R. F.; Sommargren, G. E.

    1990-08-01

    Three distinct metrological operations are necessary to align a free-electron laser (FEL): the magnetic axis must be located, a straight line reference (SLR) must be generated, and the magnetic axis must be related to the SLR. This article begins with a review of the motivation for developing an alignment system that will assure better than 100-μm accuracy in the alignment of the magnetic axis throughout an FEL. The 100-μm accuracy is an error circle about an ideal axis for 300 m or more. The article describes techniques for identifying the magnetic axes of solenoids, quadrupoles, and wiggler poles. Propagation of a laser beam is described to the extent of revealing sources of nonlinearity in the beam. Development of a straight-line reference based on the Poisson line, a diffraction effect, is described in detail. Spheres in a large-diameter laser beam create Poisson lines and thus provide a necessary mechanism for gauging between the magnetic axis and the SLR. Procedures for installing FEL components and calibrating alignment fiducials to the magnetic axes of the components are also described. The Poisson alignment reference system should be accurate to 25 μm over 300 m, which is believed to be a factor-of-4 improvement over earlier techniques. An error budget shows that only 25% of the total budgeted tolerance is used for the alignment reference system, so the remaining tolerances should fall within the allowable range for FEL alignment.

  13. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    Science.gov (United States)

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  14. DIDA: Distributed Indexing Dispatched Alignment.

    Directory of Open Access Journals (Sweden)

    Hamid Mohamadi

    One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use.

  15. Seeking the perfect alignment

    CERN Multimedia

    2002-01-01

    The first full-scale tests of the ATLAS Muon Spectrometer are about to begin in Prévessin. The set-up includes several layers of Monitored Drift Tubes Chambers (MDTs) and will allow tests of the performance of the detectors and of their highly accurate alignment system. [Photo: Monitored Drift Chambers in Building 887 in Prévessin, where they are just about to be tested.] Muon chambers are keeping the ATLAS Muon Spectrometer team quite busy this summer. Now that most people go on holiday, the beam and alignment tests for these chambers are just starting. These chambers will measure with high accuracy the momentum of high-energy muons, and this implies very demanding requirements for their alignment. The MDT chambers consist of drift tubes, which are gas-filled metal tubes, 3 cm in diameter, with wires running down their axes. With high voltage between the wire and the tube wall, the ionisation due to traversing muons is detected as electrical pulses. With careful timing of the pulses, the position of the muon t...

  16. Alignment method for solar collector arrays

    Science.gov (United States)

    Driver, Jr., Richard B

    2012-10-23

    The present invention is directed to an improved method for establishing camera fixture location for aligning mirrors on a solar collector array (SCA) comprising multiple mirror modules. The method aligns the mirrors on a module by comparing the location of the receiver image in photographs with the predicted theoretical receiver image location. To accurately align an entire SCA, a common reference is used for all of the individual module images within the SCA. The improved method can use relative pixel location information in digital photographs along with alignment fixture inclinometer data to calculate relative locations of the fixture between modules. The absolute locations are determined by minimizing alignment asymmetry for the SCA. The method inherently aligns all of the mirrors in an SCA to the receiver, even with receiver position and module-to-module alignment errors.

  17. Review of alignment and SNP calling algorithms for next-generation sequencing data.

    Science.gov (United States)

    Mielczarek, M; Szyda, J

    2016-02-01

    Application of massively parallel sequencing technology has become one of the most important issues in life sciences. Therefore, it was crucial to develop bioinformatics tools for next-generation sequencing (NGS) data processing. Currently, two of the most significant tasks include alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithms (suffix tries and hash tables) have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower, but more sensitive. SNP and genotype callers may also be divided into two main approaches: heuristic and probabilistic methods. A variety of software has been developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
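
    The hash table approach mentioned above can be sketched in a few lines. This is a toy illustration, not the method of any particular aligner reviewed in the paper: it assumes exact k-mer seeds, a reference held in memory as a plain string, and simple vote counting over implied alignment offsets. The function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Rank reference offsets by the number of exact k-mer seeds that support them."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        # Each seed hit implies that the read starts at (hit position - offset).
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    # Most-supported offsets first; ties broken by position.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))
```

    With the reference `"ACGTACGGTCA"` and `k = 3`, the read `"CGGTC"` yields `[(5, 3)]`: all three of its seeds agree on offset 5. A real aligner would then verify each candidate with a gapped alignment; the index itself stores every k-mer position, which is one reason hash-based indexes tend to use more memory than compressed suffix-based ones.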

  18. STELLAR: fast and exact local alignments

    Directory of Open Access Journals (Sweden)

    Weese David

    2011-10-01

    Background: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results: We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.
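
    The acceptance criterion (minimal length, maximal error rate under the edit distance model) can be illustrated with a small check on one candidate pair of segments. This sketch does not reproduce STELLAR's SWIFT filter or its exact verification strategy. As a simplifying assumption, the error rate is approximated as the edit distance divided by the length of the longer segment, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def is_epsilon_match(a, b, min_length, max_error_rate):
    """True if the pair is long enough and its approximate error rate is within bounds."""
    length = max(len(a), len(b))
    if length < min_length:
        return False
    return edit_distance(a, b) / length <= max_error_rate
```

    For instance, `is_epsilon_match("ACGTACGTAC", "ACGTTCGTAC", 10, 0.1)` is `True` (one substitution over ten bases), while tightening the error rate to 0.05 rejects the same pair.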

  19. Considerations for clinical read alignment and mutational profiling using next-generation sequencing

    Directory of Open Access Journals (Sweden)

    Gavin R Oliver

    2012-07-01

    Next-generation sequencing technologies are increasingly being applied in clinical settings; however, the data are characterized by a range of platform-specific artifacts, making downstream analysis problematic and error prone. One major application of NGS is in the profiling of clinically relevant mutations, whereby sequences are aligned to a reference genome and potential mutations assessed and scored. Accurate sequence alignment is pivotal in reliable assessment of potential mutations; however, selection of appropriate alignment tools is a non-trivial task, complicated by the availability of multiple solutions, each with its own performance characteristics. Using BRCA1 as an example, we have simulated and mutated a test dataset based on Illumina sequencing technology. Our findings reveal key differences in the performances of a range of common commercial and open source tools and will be of importance to anyone using NGS to profile mutations in clinical or basic research.

  20. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  1. Fast and sensitive alignment of microbial whole genome sequencing reads to large sequence datasets on a desktop PC: application to metagenomic datasets and pathogen identification.

    Directory of Open Access Journals (Sweden)

    Lőrinc S Pongor

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and, as a result, species or strain level identification is often inaccurate and low abundance pathogens can sometimes be missed. We have developed Taxoner, an open source taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than, BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.

  2. R3D Align: global pairwise alignment of RNA 3D structures using local superpositions

    Science.gov (United States)

    Rahrig, Ryan R.; Leontis, Neocles B.; Zirbel, Craig L.

    2010-01-01

    Motivation: Comparing 3D structures of homologous RNA molecules yields information about sequence and structural variability. To compare large RNA 3D structures, accurate automatic comparison tools are needed. In this article, we introduce a new algorithm and web server to align large homologous RNA structures nucleotide by nucleotide using local superpositions that accommodate the flexibility of RNA molecules. Local alignments are merged to form a global alignment by employing a maximum clique algorithm on a specially defined graph that we call the ‘local alignment’ graph. Results: The algorithm is implemented in a program suite and web server called ‘R3D Align’. The R3D Align alignment of homologous 3D structures of 5S, 16S and 23S rRNA was compared to a high-quality hand alignment. A full comparison of the 16S alignment with the other state-of-the-art methods is also provided. The R3D Align program suite includes new diagnostic tools for the structural evaluation of RNA alignments. The R3D Align alignments were compared to those produced by other programs and were found to be the most accurate, in comparison with a high quality hand-crafted alignment and in conjunction with a series of other diagnostics presented. The number of aligned base pairs as well as measures of geometric similarity are used to evaluate the accuracy of the alignments. Availability: R3D Align is freely available through a web server http://rna.bgsu.edu/R3DAlign. The MATLAB source code of the program suite is also freely available for download at that location. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: r-rahrig@onu.edu PMID:20929913

  3. Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission.

    Science.gov (United States)

    Giongo, Adriana; Tyler, Heather L; Zipperer, Ursula N; Triplett, Eric W

    2010-06-15

    Gluconacetobacter diazotrophicus PAl 5 is of agricultural significance due to its ability to provide fixed nitrogen to plants. Consequently, its genome sequence has been eagerly anticipated to enhance understanding of endophytic nitrogen fixation. Two groups have sequenced the PAl 5 genome from the same source (ATCC 49037), though the resulting sequences contain a surprisingly high number of differences. Therefore, an optical map of PAl 5 was constructed in order to determine which genome assembly more closely resembles the chromosomal DNA by aligning each sequence against a physical map of the genome. While one sequence aligned very well, over 98% of the second sequence contained numerous rearrangements. The many differences observed between these two genome sequences could be owing to either assembly errors or rapid evolutionary divergence. The extent of the differences derived from sequence assembly errors could be assessed if the raw sequencing reads were provided by both genome centers at the time of genome sequence submission. Hence, a new genome sequence standard is proposed whereby the investigator supplies the raw reads along with the closed sequence so that the community can make more accurate judgments on whether differences observed in a single strain may be of biological origin or are simply caused by differences in genome assembly procedures.

  4. Bacterial genome reengineering.

    Science.gov (United States)

    Zhou, Jindan; Rudd, Kenneth E

    2011-01-01

    The web application PrimerPair at ecogene.org generates large sets of paired DNA sequences surrounding all protein and RNA genes of Escherichia coli K-12. Many DNA fragments, which these primers amplify, can be used to implement a genome reengineering strategy using complementary in vitro cloning and in vivo recombineering. The integration of a primer design tool with a model organism database increases the level of quality control. Computer-assisted design of gene primer pairs relies upon having highly accurate genomic DNA sequence information that exactly matches the DNA of the cells being used in the laboratory to ensure predictable DNA hybridizations. It is equally crucial to have confidence that the predicted start codons define the locations of genes accurately. Annotations in the EcoGene database are queried by PrimerPair to eliminate pseudogenes, IS elements, and other problematic genes before the design process starts. These projects progressively familiarize users with the EcoGene content, scope, and application interfaces that are useful for genome reengineering projects. The first protocol leads to the design of a pair of primer sequences that were used to clone and express a single gene. The N-terminal protein sequence was experimentally verified and the protein was detected in the periplasm. This is followed by instructions to design PCR primer pairs for cloning gene fragments encoding 50 periplasmic proteins without their signal peptides. The design process begins with the user simply designating one pair of forward and reverse primer endpoint positions relative to all start and stop codon positions. The gene name, genomic coordinates, and primer DNA sequences are reported to the user. When making chromosomal deletions, the integrity of the provisional primer design is checked to see whether it will generate any unwanted double deletions with adjacent genes. The bad designs are recalculated and replacement primers are provided alongside the

  5. Shuttle onboard IMU alignment methods

    Science.gov (United States)

    Henderson, D. M.

    1976-01-01

    The current approach to the shuttle IMU alignment is based solely on the Apollo Deterministic Method. This method is simple, fast, and reliable, and provides an accurate estimate for the present-cluster-to-mean-of-1950 transformation matrix. If four or more star sightings are available, least squares analysis can be applied. The least squares method offers the next level of sophistication to the IMU alignment solution. The least squares method studied computes a more accurate estimate of the misalignment angles, and the IMU drift rates are a free by-product of the analysis. Core storage requirements are considerably greater, estimated at 20 to 30 times the core required for the Apollo Deterministic Method. The least squares method offers an intermediate solution that uses as much data as is available without a complete statistical analysis as in Kalman filtering.

  6. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  7. FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads.

    Science.gov (United States)

    Zhang, Gong; Fedyunin, Ivan; Kirchner, Sebastian; Xiao, Chuanle; Valleriani, Angelo; Ignatova, Zoya

    2012-06-01

    The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and the ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping; high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of highly possible matches and implements a modified Smith-Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs, which is important for quantitative processing of RNA-Seq datasets.
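
    The refinement step mentioned above builds on Smith-Waterman local alignment. The following is a minimal sketch of the standard Smith-Waterman score computation with a linear gap penalty, not FANSe's modified version; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Standard Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not FANSe's reduced matrix.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: never let a cell drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A read with one inserted base relative to the reference fragment.
    print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```

    Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would require the full table and a traceback.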

  8. The twilight zone of cis element alignments.

    Science.gov (United States)

    Sebastian, Alvaro; Contreras-Moreira, Bruno

    2013-02-01

    Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.

  9. Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.

    Science.gov (United States)

    Hernandez, Troy; Yang, Jie

    2016-10-01

    The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code are available upon request.
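
    To make the idea of combining k-mer frequencies with positional statistics concrete, here is a minimal sketch. It is not the authors' generalized vector: the function name, the choice of the mean position as the only positional statistic, and the default parameters are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from typing import List


def kmer_features(seq: str, k: int = 2, alphabet: str = "ACGT") -> List[float]:
    """Return [frequency, mean position] for every possible k-mer.

    Hypothetical illustration: real alignment-free vectors may use
    other statistics (e.g. variance of positions) and normalizations.
    """
    # Record the start position of every k-mer occurrence.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    n_windows = max(len(seq) - k + 1, 1)
    features: List[float] = []
    # Enumerate k-mers in a fixed order so vectors are comparable.
    for kmer in map("".join, product(alphabet, repeat=k)):
        pos = positions.get(kmer, [])
        freq = len(pos) / n_windows
        mean_pos = sum(pos) / len(pos) if pos else 0.0
        features.extend([freq, mean_pos])
    return features


if __name__ == "__main__":
    print(kmer_features("ACGT", k=1))
```

    Vectors built this way have a fixed length of 2 * len(alphabet) ** k regardless of genome length, so they could be passed directly to a distance-based classifier such as k-nearest neighbor.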

  10. Accelerated large-scale multiple sequence alignment

    Directory of Open Access Journals (Sweden)

    Lloyd Scott

    2011-12-01

    Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware. Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor. Conclusions: Our parallel algorithm and architecture accelerate large-scale MSA with reconfigurable computing and allow researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.
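
    The Amdahl's Law limitation referred to above can be illustrated with a short calculation. This is a sketch of the general formula only; the function name and the fractions in the example are hypothetical and are not taken from the paper.

```python
def amdahl_speedup(accelerated_fraction: float, stage_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the
        accelerated stage (between 0 and 1).
    stage_speedup: factor by which that stage alone becomes faster.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    # The untouched part of the runtime stays; the rest shrinks.
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / stage_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical numbers: a stage taking 90% of the runtime, made
    # 1000x faster, still yields an overall speedup below 10.
    print(amdahl_speedup(0.9, 1000.0))
```

    The example shows why accelerating a single stage of a multi-stage pipeline gives bounded gains: the overall speedup can never exceed 1 / (1 - accelerated_fraction).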

  11. Alignment-free phylogenetics and population genetics.

    Science.gov (United States)

    Haubold, Bernhard

    2014-05-01

    Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on comparative data, today usually DNA sequences. These have become so plentiful that alignment-free sequence comparison is of growing importance in the race between scientists and sequencing machines. In phylogenetics, efficient distance computation is the major contribution of alignment-free methods. A distance measure should reflect the number of substitutions per site, which underlies classical alignment-based phylogeny reconstruction. Alignment-free distance measures are either based on word counts or on match lengths, and I apply examples of both approaches to simulated and real data to assess their accuracy and efficiency. While phylogeny reconstruction is based on the number of substitutions, in population genetics, the distribution of mutations along a sequence is also considered. This distribution can be explored by match lengths, thus opening the prospect of alignment-free population genomics.

  12. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes.

    Directory of Open Access Journals (Sweden)

    Kris Popendorf

    BACKGROUND: With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. METHODOLOGY/PRINCIPAL FINDINGS: Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy. CONCLUSIONS/SIGNIFICANCE: Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with

  13. Murasaki: a fast, parallelizable algorithm to find anchors from multiple genomes.

    Science.gov (United States)

    Popendorf, Kris; Tsuyoshi, Hachiya; Osana, Yasunori; Sakakibara, Yasubumi

    2010-09-24

    With the number of available genome sequences increasing rapidly, the magnitude of sequence data required for multiple-genome analyses is a challenging problem. When large-scale rearrangements break the collinearity of gene orders among genomes, genome comparison algorithms must first identify sets of short well-conserved sequences present in each genome, termed anchors. Previously, anchor identification among multiple genomes has been achieved using pairwise alignment tools like BLASTZ through progressive alignment tools like TBA, but the computational requirements for sequence comparisons of multiple genomes quickly become a limiting factor as the number and scale of genomes grow. Our algorithm, named Murasaki, makes it possible to identify anchors within multiple large sequences on the scale of several hundred megabases in a few minutes using a single CPU. Two advanced features of Murasaki are (1) adaptive hash function generation, which enables efficient use of arbitrary mismatch patterns (spaced seeds) and therefore the comparison of multiple mammalian genomes in a practical amount of computation time, and (2) parallelizable execution that decreases the required wall-clock and CPU times. Murasaki can perform a sensitive anchoring of eight mammalian genomes (human, chimp, rhesus, orangutan, mouse, rat, dog, and cow) in 21 hours CPU time (42 minutes wall time). This is the first single-pass in-core anchoring of multiple mammalian genomes. We evaluated Murasaki by comparing it with the genome alignment programs BLASTZ and TBA. We show that Murasaki can anchor multiple genomes in near linear time, compared to the quadratic time requirements of BLASTZ and TBA, while improving overall accuracy. Murasaki provides an open source platform to take advantage of long patterns, cluster computing, and novel hash algorithms to produce accurate anchors across multiple genomes with computational efficiency significantly greater than existing methods. Murasaki is available

  14. Vibrating wire alignment technique

    CERN Document Server

    Xiao-Long, Wang; Lei, Wu; Chun-Hua, Li

    2013-01-01

    The vibrating wire alignment technique performs alignment by measuring the spatial distribution of a magnetic field, and it can achieve very high alignment accuracy. It can be applied to magnet fiducialization and to the alignment of accelerator straight section components, and it is a necessary supplement to conventional alignment methods. This article systematically reviews international research achievements in the vibrating wire alignment technique, including vibrating wire model analysis, system frequency calculation, wire sag calculation, and the relation between wire amplitude and magnetic induction intensity. On the basis of the model analysis, the article introduces the alignment method based on magnetic field measurement and the alignment method based on amplitude and phase measurement. Finally, some basic questions are discussed and solutions are given.

  15. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G.Gomez.

    Since June of 2009, the muon alignment group has focused on providing new alignment constants and on finalizing the hardware alignment reconstruction. Alignment constants for DTs and CSCs were provided for CRAFT09 data reprocessing. For DT chambers, the track-based alignment was repeated using CRAFT09 cosmic ray muons and validated using segment extrapolation and split cosmic tools. One difference with respect to the previous alignment is that only five degrees of freedom were aligned, leaving the rotation around the local x-axis to be better determined by the hardware system. Similarly, DT chambers poorly aligned by tracks (due to limited statistics) were aligned by a combination of photogrammetry and hardware-based alignment. For the CSC chambers, the hardware system provided alignment in global z and rotations about local x. Entire muon endcap rings were further corrected in the transverse plane (global x and y) by the track-based alignment. Single chamber track-based alignment suffers from poor statistic...

  16. nGASP - the nematode genome annotation assessment project

    Energy Technology Data Exchange (ETDEWEB)

    Coghlan, A; Fiedler, T J; McKay, S J; Flicek, P; Harris, T W; Blasiar, D; Allen, J; Stein, L D

    2008-12-19

    While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.

  17. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G. Gomez and J. Pivarski

    2011-01-01

    Alignment efforts in the first few months of 2011 have shifted away from providing alignment constants (now a well established procedure) and focussed on some critical remaining issues. The single most important task left was to understand the systematic differences observed between the track-based (TB) and hardware-based (HW) barrel alignments: a systematic difference in r-φ and in z, which grew as a function of z, and which amounted to ~4-5 mm differences going from one end of the barrel to the other. This difference is now understood to be caused by the tracker alignment. The systematic differences disappear when the track-based barrel alignment is performed using the new “twist-free” tracker alignment. This removes the largest remaining source of systematic uncertainty. Since the barrel alignment is based on hardware, it does not suffer from the tracker twist. However, untwisting the tracker causes endcap disks (which are aligned ...

  18. SinicView: A visualization environment for comparisons of multiple nucleotide sequence alignment tools

    OpenAIRE

    Wong Chun-Yi; Wu Yu-Wei; Chen Shiang-Heng; Peng Chin-Lin; Lin Laurent; Lee DT; Shih Arthur; Chou Meng-Yuan; Shiao Tze-Chang; Hsieh Mu-Fen

    2006-01-01

    Abstract Background Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indee...

  19. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    Directory of Open Access Journals (Sweden)

    Che-Lun Hung

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.

  20. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    Science.gov (United States)

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  1. Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2009-09-01

    Background: The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it should be considered desirable to discriminate between coding and non-coding conserved sequences without recourse to the use of sequence similarity searches of protein databases, as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences which are erroneously annotated as coding. Results: Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine which designates the alignment coding or non-coding with an associated probability score. Conclusion: We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.

  2. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G.Gomez

    2010-01-01

    The main developments in muon alignment since March 2010 have been the production, approval and deployment of alignment constants for the ICHEP data reprocessing. In the barrel, a new geometry, combining information from both hardware and track-based alignment systems, has been developed for the first time. The hardware alignment provides an initial DT geometry, which is then anchored as a rigid solid, using the link alignment system, to a reference frame common to the tracker. The “GlobalPositionRecords” for both the Tracker and Muon systems are being used for the first time, and the initial tracker-muon relative positioning, based on the link alignment, yields good results within the photogrammetry uncertainties of the Tracker and alignment ring positions. For the first time, the optical and track-based alignments show good agreement between them; the optical alignment being refined by the track-based alignment. The resulting geometry is the most complete to date, aligning all 250 DTs, ...

  3. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    Z. Szillasi and G. Gomez.

    2013-01-01

    When CMS is opened up, major components of the Link and Barrel Alignment systems will be removed. This operation, besides allowing for maintenance of the detector underneath, is needed for making interventions that will reinforce the alignment measurements and make the operation of the alignment system more reliable. For that purpose and also for their general maintenance and recalibration, the alignment components will be transferred to the Alignment Lab situated in the ISR area. For the track-based alignment, attention is focused on the determination of systematic uncertainties, which have become dominant, since now there is a large statistics of muon tracks. This will allow for an improved Monte Carlo misalignment scenario and updated alignment position errors, crucial for high-momentum muon analysis such as Z′ searches.

  4. CSA: An efficient algorithm to improve circular DNA multiple alignment

    Directory of Open Access Journals (Sweden)

    Pereira Luísa

    2009-07-01

    Background: The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis, and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process, before they can use multiple sequence alignment tools. Results: In this paper we propose an efficient algorithm that identifies the most interesting region to cut circular genomes in order to improve phylogenetic analysis when using standard multiple sequence alignment algorithms. This algorithm identifies the largest chain of non-repeated longest subsequences common to a set of circular mitochondrial DNA sequences. All the sequences are then rotated and made linear for multiple alignment purposes. To evaluate the effectiveness of this new tool, three different sets of mitochondrial DNA sequences were considered. Other tests considering randomly rotated sequences were also performed. The software package Arlequin was used to evaluate the standard genetic measures of the alignments obtained with and without the use of the CSA algorithm with two well-known multiple alignment algorithms, the CLUSTALW and the MAVID tools, and also the visualization tool SinicView. Conclusion: The results show that a circularization and rotation pre-processing step significantly improves the efficiency of publicly available multiple sequence alignment
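
    The rotate-then-linearize step described in this abstract can be illustrated with a minimal sketch. It is not the CSA algorithm: CSA chooses the cut point itself from common subsequences, whereas this sketch assumes a shared anchor substring is already known. The function name `rotate_to_anchor` and the `anchor` parameter are hypothetical.

```python
def rotate_to_anchor(circular_seq, anchor):
    """Rotate a circular sequence so that it starts at `anchor`.

    Hypothetical helper: the anchor is assumed to be given, which is
    the part the CSA algorithm itself is designed to work out.
    """
    # Append a short prefix so an anchor spanning the origin is found.
    doubled = circular_seq + circular_seq[:len(anchor) - 1]
    pos = doubled.find(anchor)
    if pos == -1:
        raise ValueError("anchor not found in circular sequence")
    # Cut at the anchor and make the sequence linear.
    return circular_seq[pos:] + circular_seq[:pos]


# Two arbitrary rotations of the same circular molecule.
rotated_a = rotate_to_anchor("GTACCA", "ACC")
rotated_b = rotate_to_anchor("CCAGTA", "ACC")
```

    After rotation both inputs begin at the same position, so a standard linear multiple sequence aligner can be applied to them.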

  5. A novel approach to multiple sequence alignment using Hadoop data grids.

    Science.gov (United States)

    Sudha Sadasivam, G; Baktavatchalam, G

    2010-01-01

    Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered while aligning multiple sequences are the speed and accuracy of alignment. Although dynamic programming algorithms produce accurate alignments, they are computationally intensive. In this paper we propose a time-efficient approach to sequence alignment that also produces quality alignments. The dynamic nature of the algorithm, coupled with the data and computational parallelism of Hadoop data grids, improves the accuracy and speed of sequence alignment. The principle of block splitting in Hadoop, coupled with its scalability, facilitates the alignment of very large sequences.
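
    To make the cost of dynamic programming concrete, here is a minimal sketch of a pairwise global alignment score (Needleman-Wunsch). It is not the Hadoop-based method of the paper; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (illustrative scoring).

    Time grows with len(a) * len(b), which is why dynamic programming
    becomes expensive for long sequences.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + step,  # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
        prev = curr
    return prev[len(b)]
```

    Only two rows of the table are kept, so memory is linear in the length of `b`, but every cell is still visited once.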

  6. Inference of homologous recombination in bacteria using whole-genome sequences.

    Science.gov (United States)

    Didelot, Xavier; Lawson, Daniel; Darling, Aaron; Falush, Daniel

    2010-12-01

    Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Monte Carlo Markov chain algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.

  7. New Attitude Sensor Alignment Calibration Algorithms

    Science.gov (United States)

    Hashmall, Joseph A.; Sedlak, Joseph E.; Harman, Richard (Technical Monitor)

    2002-01-01

    Accurate spacecraft attitudes may only be obtained if the primary attitude sensors are well calibrated. Launch shock, relaxation of gravitational stresses and similar effects often produce large enough alignment shifts so that on-orbit alignment calibration is necessary if attitude accuracy requirements are to be met. A variety of attitude sensor alignment algorithms have been developed to meet the need for on-orbit calibration. Two new algorithms are presented here: ALICAL and ALIQUEST. Each of these has advantages in particular circumstances. ALICAL is an attitude independent algorithm that uses near simultaneous measurements from two or more sensors to produce accurate sensor alignments. For each set of simultaneous observations the attitude is overdetermined. The information content of the extra degrees of freedom can be combined over numerous sets to provide the sensor alignments. ALIQUEST is an attitude dependent algorithm that combines sensor and attitude data into a loss function that has the same mathematical form as the Wahba problem. Alignments can then be determined using any of the algorithms (such as the QUEST quaternion estimator) that have been developed to solve the Wahba problem for attitude. Results from the use of these methods on active missions are presented.
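
    For reference, the Wahba problem mentioned above is usually written as the minimization of the loss function below. The symbols are the conventional ones and are an assumption here, since the abstract does not define them: b_i are unit vectors measured in the sensor or body frame, r_i are the corresponding reference-frame vectors, a_i are non-negative weights, and A is a proper orthogonal (rotation) matrix.

```latex
L(A) = \frac{1}{2} \sum_{i=1}^{n} a_i \,\bigl\lVert \mathbf{b}_i - A\,\mathbf{r}_i \bigr\rVert^{2},
\qquad A^{\mathsf{T}} A = I, \quad \det A = +1 .
```

    Any loss of this form can be handed to a Wahba solver such as QUEST, which is the property the abstract attributes to ALIQUEST.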

  8. Enhanced de novo assembly of high throughput pyrosequencing data using whole genome mapping.

    Science.gov (United States)

    Onmus-Leone, Fatma; Hang, Jun; Clifford, Robert J; Yang, Yu; Riley, Matthew C; Kuschner, Robert A; Waterman, Paige E; Lesho, Emil P

    2013-01-01

    Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel bla NDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.

  9. Shod wear and foot alignment in clinical gait analysis.

    Science.gov (United States)

    Louey, Melissa Gar Yee; Sangeux, Morgan

    2016-09-01

    Sagittal plane alignment of the foot presents challenges when the subject wears shoes during gait analysis. Typically, visual alignment is performed by positioning two markers, the heel and toe markers, aligned with the foot within the shoe. Alternatively, software alignment is possible when the sole of the shoe lies parallel to the ground, and the change in the shoe's sole thickness is measured and entered as a parameter. The aim of this technical note was to evaluate the accuracy of visual and software foot alignment during shod gait analysis. We calculated the static standing ankle angles of 8 participants (mean age: 8.7 years, SD: 2.9 years) wearing bilateral solid ankle foot orthoses (BSAFOs) with and without shoes using the visual and software alignment methods. All participants were able to stand with flat feet in both static trials, and the ankle angles obtained in BSAFOs without shoes were considered the reference. We showed that the current implementation of software alignment introduces a bias towards more ankle dorsiflexion, mean=3°, SD=3.4°, p=0.006, and proposed an adjusted software alignment method. We found no statistical differences using visual alignment and adjusted software alignment between the shoe and shoeless conditions, p=0.19 for both. Visual alignment or adjusted software alignment is advised to represent foot alignment accurately.

  10. Scaling statistical multiple sequence alignment to large datasets

    Directory of Open Access Journals (Sweden)

    Michael Nute

    2016-11-01

    Background: Multiple sequence alignment is an important task in bioinformatics, and alignments of large datasets containing hundreds or thousands of sequences are increasingly of interest. While many alignment methods exist, the most accurate alignments are likely to be based on stochastic models where sequences evolve down a tree with substitutions, insertions, and deletions. While some methods have been developed to estimate alignments under these stochastic models, only the Bayesian method BAli-Phy has been able to run on even moderately large datasets, containing 100 or so sequences. A technique to extend BAli-Phy to enable alignments of thousands of sequences could potentially improve alignment and phylogenetic tree accuracy on large-scale data beyond the best-known methods today. Results: We use simulated data with up to 10,000 sequences representing a variety of model conditions, including some that are significantly divergent from the statistical models used in BAli-Phy and elsewhere. We give a method for incorporating BAli-Phy into PASTA and UPP, two strategies for enabling alignment methods to scale to large datasets, and give alignment and tree accuracy results measured against the ground truth from simulations. Comparable results are also given for other methods capable of aligning this many sequences. Conclusions: Extensions of BAli-Phy using PASTA and UPP produce significantly more accurate alignments and phylogenetic trees than the current leading methods.

  11. Using local alignments for relation recognition

    NARCIS (Netherlands)

    S. Katrenko; P. Adriaans; M. van Someren

    2010-01-01

    This paper discusses the problem of marrying structural similarity with semantic relatedness for Information Extraction from text. Aiming at accurate recognition of relations, we introduce local alignment kernels and explore various possibilities of using them for this task. We give a definition of

  12. Alignment modification for pencil eye shields

    Energy Technology Data Exchange (ETDEWEB)

    Evans, M.D.; Pla, M.; Podgorsak, E.B. (McGill Univ., Quebec (Canada))

    1989-01-01

    Accurate alignment of pencil beam eye shields to protect the lens of the eye may be made easier by means of a simple modification of existing apparatus. This involves drilling a small hole through the center of the shield to isolate the rayline directed to the lens and fabricating a suitable plug for this hole.

  13. Ontology alignment with OLA

    OpenAIRE

    Euzenat, Jérôme; Loup, David; Touzani, Mohamed; Valtchev, Petko

    2004-01-01

    Using ontologies is the standard way to achieve interoperability of heterogeneous systems within the Semantic web. However, as the ontologies underlying two systems are not necessarily compatible, they may in turn need to be aligned. Similarity-based approaches to alignment seem to be both powerful and flexible enough to match the expressive power of languages like OWL. We present an alignment tool that follows the similarity-based paradigm, called OLA. ...

  14. Erasing errors due to alignment ambiguity when estimating positive selection.

    Science.gov (United States)

    Redelings, Benjamin

    2014-08-01

    Current estimates of diversifying positive selection rely on first having an accurate multiple sequence alignment. Simulation studies have shown that under biologically plausible conditions, relying on a single estimate of the alignment from commonly used alignment software can lead to unacceptably high false-positive rates in detecting diversifying positive selection. We present a novel statistical method that eliminates excess false positives resulting from alignment error by jointly estimating the degree of positive selection and the alignment under an evolutionary model. Our model treats both substitutions and insertions/deletions as sequence changes on a tree and allows site heterogeneity in the substitution process. We conduct inference starting from unaligned sequence data by integrating over all alignments. This approach naturally accounts for ambiguous alignments without requiring ambiguously aligned sites to be identified and removed prior to analysis. We take a Bayesian approach and conduct inference using Markov chain Monte Carlo to integrate over all alignments on a fixed evolutionary tree topology. We introduce a Bayesian version of the branch-site test and assess the evidence for positive selection using Bayes factors. We compare two models of differing dimensionality using a simple alternative to reversible-jump methods. We also describe a more accurate method of estimating the Bayes factor using Rao-Blackwellization. We then show using simulated data that jointly estimating the alignment and the presence of positive selection solves the problem with excessive false positives from erroneous alignments and has nearly the same power to detect positive selection as when the true alignment is known. We also show that samples taken from the posterior alignment distribution using the software BAli-Phy have substantially lower alignment error compared with MUSCLE, MAFFT, PRANK, and FSA alignments.

  15. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G. Gomez

    2010-01-01

    Most of the work in muon alignment since December 2009 has focused on the geometry reconstruction from the optical systems and improvements in the internal alignment of the DT chambers. The barrel optical alignment system has progressively evolved from reconstruction of single active planes to super-planes (December 09) to a new, full barrel reconstruction. Initial validation studies comparing this full barrel alignment at 0T with photogrammetry provide promising results. In addition, the method has been applied to CRAFT09 data, and the resulting alignment at 3.8T yields residuals from tracks (extrapolated from the tracker) which look smooth, suggesting a good internal barrel alignment with a small overall offset with respect to the tracker. This is a significant improvement, which should allow the optical system to provide a start-up alignment for 2010. The end-cap optical alignment has made considerable progress in the analysis of transfer line data. The next set of alignment constants for CSCs will there...

  16. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between
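
    The vector pipeline described in this abstract (peptide frequency vectors, summed per species, compared by angle) can be sketched roughly as follows. This is a simplified illustration, not the authors' method: the SVD dimensionality-reduction step is deliberately omitted, the peptide length k=2 is an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_vector(protein, k=2):
    """Count every length-k peptide in one protein (k=2 is an assumption)."""
    counts = Counter(protein[i:i + k] for i in range(len(protein) - k + 1))
    return [counts["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]


def species_vector(proteins, k=2):
    """Sum the per-protein vectors; the paper sums SVD-reduced vectors instead."""
    total = [0] * (len(AMINO_ACIDS) ** k)
    for protein in proteins:
        for idx, value in enumerate(peptide_vector(protein, k)):
            total[idx] += value
    return total


def angle_between(u, v):
    """Angle in radians between two non-zero vectors, used as a distance."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp to guard against floating-point values just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))
```

    A matrix of pairwise angles between species vectors could then be passed to a distance-based tree-building method.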

  17. Universal seeds for cDNA-to-genome comparison

    Directory of Open Access Journals (Sweden)

    Florea Liliana

    2008-01-01

    Background: To meet the needs of gene annotation for newly sequenced organisms, optimized spaced seeds can be implemented into cross-species sequence alignment programs to accurately align gene sequences to the genome of a related species. So far, seed performance has been tested for comparisons between closely related species, such as human and mouse, or on simulated data. As the number and variety of genomes increases, it becomes desirable to identify a small set of universal seeds that perform optimally or near-optimally on a large range of comparisons. Results: Using statistical regression methods, we investigate the sensitivity of seeds, in particular good seeds, between four cDNA-to-genome comparisons at different evolutionary distances (human-dog, human-mouse, human-chicken and human-zebrafish), and identify classes of comparisons that show similar seed behavior and therefore can employ the same seed. In addition, we find that with high confidence good seeds for more distant comparisons perform well on closer comparisons, within 98–99% of the optimal seeds, and thus represent universal good seeds. Conclusion: We show for the first time that optimal and near-optimal seeds for distant species-to-species comparisons are more generally applicable to a wide range of comparisons. This finding will be instrumental in developing practical and user-friendly cDNA-to-genome alignment applications, to aid in the annotation of new model organisms.
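
    As background on the term, a spaced seed is a pattern of "must match" and "don't care" positions used to find candidate alignment locations. The sketch below shows the general idea only; the pattern "1101" and the function names are illustrative assumptions and are not seeds or code from the paper.

```python
from collections import defaultdict


def seed_key(seq, start, pattern):
    """Keep only the bases under '1' positions of the seed pattern."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def spaced_seed_hits(query, target, pattern="1101"):
    """Return (query_pos, target_pos) pairs that agree at every '1' position.

    Positions marked '0' are ignored, so a hit tolerates a mismatch there.
    """
    span = len(pattern)
    # Index every window of the target by its spaced key.
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[seed_key(target, t, pattern)].append(t)
    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q, pattern), []):
            hits.append((q, t))
    return hits
```

    With the pattern "1101", the windows ACGT and ACTT produce a hit despite the mismatch at the third base, whereas the contiguous pattern "1111" finds none.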

  18. ChimeRScope: a novel alignment-free algorithm for fusion transcript prediction using paired-end RNA-Seq data.

    Science.gov (United States)

    Li, You; Heavican, Tayla B; Vellichirammal, Neetha N; Iqbal, Javeed; Guda, Chittibabu

    2017-07-27

    The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by the identification of novel transcripts like chimeric fusion transcripts. The 'fusion' or 'chimeric' transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimen. The fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. The alignment of sequencing reads from fusion transcript loci in cancer genomes can be highly challenging due to the incorrect mapping induced by genomic alterations, thereby limiting the performance of alignment-based fusion transcript detection methods. Here, we developed a novel alignment-free method, ChimeRScope that accurately predicts fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets followed by experimental validations demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of the read lengths and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is a better tool that is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as a standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web-interface at (https://galaxy.unmc.edu/). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Backup Alignment Devices on Shuttle: Heads-Up Display or Crew Optical Alignment Sight

    Science.gov (United States)

    Chavez, Melissa A.

    2011-01-01

    NASA's Space Shuttle was built to withstand multiple failures while still keeping the crew and vehicle safe. Although the design of the Space Shuttle had a great deal of redundancy built into each system, there were often additional ways to keep systems in the best configuration if a failure were to occur. One such method was to use select pieces of hardware in a way for which they were not primarily intended. The primary function of the Heads-Up Display (HUD) was to provide the crew with a display of flight critical information during the entry phase. The primary function of the Crew Optical Alignment Sight (COAS) was to provide the crew an optical alignment capability for rendezvous and docking phases. An alignment device was required to keep the Inertial Measurement Units (IMUs) well aligned for a safe Entry; nominally this alignment device would be the two on-board Star Trackers. However, in the event of a Star Tracker failure, the HUD or COAS could also be used as a backup alignment device, but only if the device had been calibrated beforehand. Once the HUD or COAS was calibrated and verified then it was considered an adequate backup to the Star Trackers for entry IMU alignment. There were procedures in place and the astronauts were trained on how to accurately calibrate the HUD or COAS and how to use them as an alignment device. The calibration procedure for the HUD and COAS had been performed on many Shuttle missions. Many of the first calibrations performed were for data gathering purposes to determine which device was more accurate as a backup alignment device, HUD or COAS. Once this was determined, the following missions would frequently calibrate the HUD in order to be one step closer to having the device ready in case it was needed as a backup alignment device.

  20. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    Science.gov (United States)

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  1. An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

    Directory of Open Access Journals (Sweden)

    Taneda Akito

    2008-12-01

    Background: Aligning RNA sequences with low sequence identity has been a challenging problem because such a computation requires a high-complexity algorithm to take structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate large-scale applications, including non-coding RNA (ncRNA) discovery. Results: We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining it with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery, where we compared the S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search on the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that > 92% of the candidates are novel. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e. their genomic positions overlap those of the experimentally determined transcripts in the literature. By manual inspection of the results, moreover, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion: The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.

  2. Phonetic alignment for speech synthesis in under-resourced languages

    CSIR Research Space (South Africa)

    Van Niekerk, DR

    2009-09-01

    The rapid development of concatenative speech synthesis systems in resource-scarce languages requires an efficient and accurate solution with regard to automated phonetic alignment. However, in this context corpora are often minimally designed due...

  3. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Directory of Open Access Journals (Sweden)

    Kevin R Ramkissoon

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  4. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Science.gov (United States)

    Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  5. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    Science.gov (United States)

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  6. Design of practical alignment device in KSTAR Thomson diagnostic

    Energy Technology Data Exchange (ETDEWEB)

    Lee, J. H., E-mail: jhlee@nfri.re.kr [National Fusion Research Institute, Daejeon (Korea, Republic of); University of Science and Technology (UST), Daejeon (Korea, Republic of); Lee, S. H. [National Fusion Research Institute, Daejeon (Korea, Republic of); Yamada, I. [National Institute for Fusion Science, Toki (Japan)

    2016-11-15

    The precise alignment of the laser path and collection optics in Thomson scattering measurements is essential for accurately determining electron temperature and density in tokamak experiments. For the last five years, during the development stage, the KSTAR tokamak’s Thomson diagnostic system has had alignment fibers installed in its optical collection modules, but these lacked a proper alignment detection system. To address this, an alignment-verifying detection device between the lasers and the object field of the collection optics has been developed. The alignment detection device uses two types of filters: a narrow-band filter at the laser wavelength for the laser light, and a broadband filter for the Thomson scattering signal. Four such alignment detection devices have been successfully developed for the KSTAR Thomson scattering system this year, and these will be tested in KSTAR experiments in 2016. In this paper, we present the newly developed alignment detection device for KSTAR’s Thomson scattering diagnostics.

  7. Efficient Word Alignment with Markov Chain Monte Carlo

    Directory of Open Access Journals (Sweden)

    Östling Robert

    2016-10-01

    We present EFMARAL, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture we are able to surpass the fast_align system, commonly used for performance-critical word alignment, both in computational efficiency and alignment accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from EFMARAL than from fast_align, and that translation quality is on par with what is obtained using GIZA++, a tool requiring orders of magnitude more processing time. More generally we hope to convince the reader that Monte Carlo sampling, rather than being viewed as a slow method of last resort, should actually be the method of choice for the SMT practitioner and others interested in word alignment.

  8. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G. Gomez

    Since December, the muon alignment community has focused on analyzing the data recorded so far in order to produce new DT and CSC Alignment Records for the second reprocessing of CRAFT data. Two independent algorithms were developed which align the DT chambers using global tracks, thus providing, for the first time, a relative alignment of the barrel with respect to the tracker. These results are an important ingredient for the second CRAFT reprocessing and allow, for example, a more detailed study of any possible mis-modelling of the magnetic field in the muon spectrometer. Both algorithms are constructed in such a way that the resulting alignment constants are not affected, to first order, by any such mis-modelling. The CSC chambers have not yet been included in this global track-based alignment due to a lack of statistics, since only a few cosmics go through the tracker and the CSCs. A strategy exists to align the CSCs using the barrel as a reference until collision tracks become available. Aligning the ...

  9. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    Gervasio Gomez

    The main progress of the muon alignment group since March has been in the refinement of both the track-based alignment for the DTs and the hardware-based alignment for the CSCs. For DT track-based alignment, there has been significant improvement in the internal alignment of the superlayers inside the DTs. In particular, the distance between superlayers is now corrected, eliminating the residual dependence on track impact angles, and good agreement is found between survey and track-based corrections. The new internal geometry has been approved to be included in the forthcoming reprocessing of CRAFT samples. The alignment of DTs with respect to the tracker using global tracks has also improved significantly, since the algorithms use the latest B-field mapping, better run selection criteria, optimized momentum cuts, and an alignment is now obtained for all six degrees of freedom (three spatial coordinates and three rotations) of the aligned DTs. This work is ongoing and at a stage where we are trying to unders...

  10. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G.Gomez

    2011-01-01

    The Muon Alignment work now focuses on producing a new track-based alignment with higher track statistics, making systematic studies between the results of the hardware and track-based alignment methods and aligning the barrel using standalone muon tracks. Currently, the muon track reconstruction software uses a hardware-based alignment in the barrel (DT) and a track-based alignment in the endcaps (CSC). An important task is to assess the muon momentum resolution that can be achieved using the current muon alignment, especially for highly energetic muons. For this purpose, cosmic ray muons are used, since the rate of high-energy muons from collisions is very low and the event statistics are still limited. Cosmics have the advantage of higher statistics in the pT region above 100 GeV/c, but they have the disadvantage of having a mostly vertical topology, resulting in a very few global endcap muons. Only the barrel alignment has therefore been tested so far. Cosmic muons traversing CMS from top to bottom are s...

  11. Physics of Grain Alignment

    CERN Document Server

    Lazarian, A

    2000-01-01

    Aligned grains provide one of the easiest ways to study magnetic fields in diffuse gas and molecular clouds. How reliable our conclusions about the inferred magnetic field are depends critically on our understanding of the physics of grain alignment. Although grain alignment is a problem of half a century's standing, recent progress in the field makes us believe that we are approaching the solution of this mystery. I review the basic physical processes involved in grain alignment and show why mechanisms that were favored for decades do not look so promising right now. I also discuss why the radiative torque mechanism, ignored for more than 20 years, now looks like the most powerful means of grain alignment.

  12. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G. Gomez

    2011-01-01

    A new set of muon alignment constants was approved in August. The relative position between muon chambers is essentially unchanged, indicating good detector stability. The main changes concern the global positioning of the barrel and of the endcap rings to match the new Tracker geometry. Detailed studies of the differences between track-based and optical alignment of DTs have proven to be a valuable tool for constraining Tracker alignment weak modes, and this information is now being used as part of the alignment procedure. In addition to the “split-cosmic” analysis used to investigate the muon momentum resolution at high momentum, a new procedure based on reconstructing the invariant mass of di-muons from boosted Zs is under development. Both procedures show an improvement in the momentum precision of Global Muons with respect to Tracker-only Muons. Recent developments in track-based alignment include a better treatment of the tails of residual distributions and accounting for correla...

  13. SPEAR3 Construction Alignment

    Energy Technology Data Exchange (ETDEWEB)

    LeCocq, Catherine; Banuelos, Cristobal; Fuss, Brian; Gaudreault, Francis; Gaydosh, Michael; Griffin, Levirt; Imfeld, Hans; McDougal, John; Perry, Michael; Rogers,; /SLAC

    2005-08-17

    An ambitious seven-month shutdown of the existing SPEAR2 synchrotron radiation facility was successfully completed in March 2004 when the first synchrotron light was observed in the new SPEAR3 ring. SPEAR3 completely replaced SPEAR2 with new components aligned on a new, highly flat concrete floor. Devices such as magnets and vacuum chambers had to be fiducialized and later aligned on girder rafts that were then placed into the ring over pre-aligned support plates. Key to the success of aligning this new ring was ensuring that the new beam orbit matched the old SPEAR2 orbit so that existing experimental beamlines would not have to be reoriented. In this presentation a pictorial summary of the Alignment Engineering Group's surveying tasks for the construction of the SPEAR3 ring is provided. Details on the networking and analysis of various surveys throughout the project can be found in the accompanying paper.

  14. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, C(α) only models, Alternative alignments, and Non-sequential alignments.

    Science.gov (United States)

    Minami, Shintaro; Sawada, Kengo; Chikenji, George

    2013-01-18

    Protein pairs that have the same secondary structure packing arrangement but have different topologies have attracted much attention in terms of both evolution and physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence-order-independent structure comparison methods are needed. We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSEs). One of the key features of the algorithm is its use of a multiple vector representation for each SSE, which enables us to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with 9 other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets which include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods for reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that the MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here. MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool for automatically detecting non

  15. MICAN : a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, Cα only models, Alternative alignments, and Non-sequential alignments

    Science.gov (United States)

    2013-01-01

    Background: Protein pairs that have the same secondary structure packing arrangement but have different topologies have attracted much attention in terms of both evolution and physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence-order-independent structure comparison methods are needed. Results: We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSEs). One of the key features of the algorithm is its use of a multiple vector representation for each SSE, which enables us to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with 9 other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets which include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods for reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that the MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here. Conclusions: MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool

  16. Rapid protein alignment in the cloud: HAMOND combines fast DIAMOND alignments with Hadoop parallelism.

    Science.gov (United States)

    Yu, Jia; Blom, Jochen; Sczyrba, Alexander; Goesmann, Alexander

    2017-02-21

    The introduction of next-generation sequencing has caused a steady increase in the amounts of data that have to be processed in modern life science. Sequence alignment plays a key role in the analysis of sequencing data, e.g., within whole genome sequencing or metagenome projects. BLAST is a commonly used alignment tool that was the standard approach for more than two decades, but in recent years faster alternatives have been proposed, including RapSearch, GHOSTX, and DIAMOND. Here we introduce HAMOND, an application that uses Apache Hadoop to parallelize DIAMOND computation in order to scale out the calculation of alignments. HAMOND is fault tolerant and scalable by utilizing large cloud computing infrastructures like Amazon Web Services. HAMOND has been tested in comparative genomics analyses and showed promising results both in efficiency and accuracy.

  17. MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information.

    Science.gov (United States)

    Balech, Bachir; Vicario, Saverio; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-08-01

    Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode.

  18. Multiple sequence alignment accuracy and evolutionary distance estimation.

    Science.gov (United States)

    Rosenberg, Michael S

    2005-11-23

    Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pairwise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy of adding a third sequence to a pairwise alignment, particularly concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance. The maximal gain in alignment accuracy was found not when the third sequence is directly intermediate between the initial two sequences, but rather when it perfectly subdivides the branch leading from the root of the tree to one of the original sequences (making it half as close to one sequence as the other). Evolutionary distance estimation in the multiple alignment framework, however, is largely unrelated to alignment accuracy and rather is dependent on the position of the third sequence; the closer the branch leading to the third sequence is to the root of the tree, the larger the estimated distance between the first two sequences. The bias in distance estimation appears to be a direct result of the standard greedy progressive algorithm used by many multiple alignment methods. These results have implications for choosing new taxa and genomes to sequence when resources are limited.

  19. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Directory of Open Access Journals (Sweden)

    Steven Kelly

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly used distance-based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  20. Galaxy alignments: An overview

    CERN Document Server

    Joachimi, Benjamin; Kitching, Thomas D; Leonard, Adrienne; Mandelbaum, Rachel; Schäfer, Björn Malte; Sifón, Cristóbal; Hoekstra, Henk; Kiessling, Alina; Kirk, Donnacha; Rassat, Anais

    2015-01-01

    The alignments between galaxies, their underlying matter structures, and the cosmic web constitute vital ingredients for a comprehensive understanding of gravity, the nature of matter, and structure formation in the Universe. We provide an overview on the state of the art in the study of these alignment processes and their observational signatures, aimed at a non-specialist audience. The development of the field over the past one hundred years is briefly reviewed. We also discuss the impact of galaxy alignments on measurements of weak gravitational lensing, and discuss avenues for making theoretical and observational progress over the coming decade.

  1. Discriminative Shape Alignment

    DEFF Research Database (Denmark)

    Loog, M.; de Bruijne, M.

    2009-01-01

    The alignment of shape data to a common mean before its subsequent processing is a ubiquitous step within the area of shape analysis. Current approaches to shape analysis or, as more specifically considered in this work, shape classification perform the alignment in a fully unsupervised way, not taking into account that eventually the shapes are to be assigned to two or more different classes. This work introduces a discriminative variation of the well-known Procrustes alignment and demonstrates its benefit over this classical method in shape classification tasks. The focus is on two-dimensional shapes from a two-class recognition problem.
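
    For orientation, the classical (unsupervised) Procrustes alignment that the abstract takes as its baseline can be sketched as follows for two-dimensional landmark shapes. This is a minimal illustration only, not the discriminative variant the record describes; the function names and the assumption that both shapes list corresponding landmarks in the same order are hypothetical choices made for the example.

```python
import math

def center_and_scale(points):
    """Remove translation and scale: centroid at origin, unit Frobenius norm."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered]

def procrustes_align_2d(shape, reference):
    """Rotate the normalized `shape` onto the normalized `reference`.

    Assumes both inputs are lists of (x, y) landmarks in corresponding order.
    """
    s = center_and_scale(shape)
    r = center_and_scale(reference)
    # Closed-form rotation angle that maximizes the sum of dot products
    # between rotated shape landmarks and reference landmarks.
    num = sum(sx * ry - sy * rx for (sx, sy), (rx, ry) in zip(s, r))
    den = sum(sx * rx + sy * ry for (sx, sy), (rx, ry) in zip(s, r))
    theta = math.atan2(num, den)
    c, si = math.cos(theta), math.sin(theta)
    return [(c * x - si * y, si * x + c * y) for x, y in s]
```

    Because translation, scale and rotation are all removed without reference to class labels, the procedure is unsupervised in exactly the sense the abstract criticizes.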

  2. Lateral pupil alignment tolerance in peripheral refractometry.

    Science.gov (United States)

    Fedtke, Cathleen; Ehrmann, Klaus; Ho, Arthur; Holden, Brien A

    2011-05-01

    To investigate the tolerance to lateral pupil misalignment in peripheral refraction compared with central refraction. A Shin-Nippon NVision-K5001 open-view auto-refractor was used to measure central and peripheral refraction (30° temporal and 30° nasal visual field) of the right eyes of 10 emmetropic and 10 myopic participants. At each of the three fixation angles, five readings were recorded for each of the following alignment positions relative to pupil center: centrally aligned, 1 and 2 mm temporally aligned, and 1 and 2 mm nasally aligned. For central fixation, increasing dealignment from pupil center produced a quadratic decrease (r ≥ 0.98, p < 0.04) in the refractive power vectors M and J180 which, when interpolated, reached clinical significance (i.e., ≥ 0.25 diopter for M and ≥ 0.125 diopter for J180 and J45) for an alignment error of 0.79 mm or greater. M and J180 as measured in the 30° temporal and 30° nasal visual field led to a significant linear correlation (r ≥ 0.94, p < 0.02) as pupil dealignment gradually changed from temporal to nasal. As determined from regression analysis, a pupil alignment error of 0.20 mm or greater would introduce errors in M and J180 that are clinically significant. Tolerance to lateral pupil alignment error decreases strongly in the periphery compared with the greater tolerance in central refraction. Thus, precise alignment of the entrance pupil with the instrument axis is critical for accurate and reliable peripheral refraction.
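
    The power-vector components M, J180 and J45 mentioned above are not defined in the abstract. The sketch below uses the standard power-vector conversion from a sphero-cylindrical refraction (sphere S, cylinder C, axis α): M = S + C/2, J180 = -(C/2)·cos 2α, J45 = -(C/2)·sin 2α. The function names are hypothetical, and the thresholds are the clinical-significance limits quoted in the abstract (0.25 diopter for M, 0.125 diopter for J180 and J45).

```python
import math

def power_vectors(sphere, cylinder, axis_deg):
    """Convert a sphero-cylindrical refraction (diopters, axis in degrees)
    to the power-vector components (M, J180, J45)."""
    a = math.radians(axis_deg)
    m = sphere + cylinder / 2.0
    j180 = -(cylinder / 2.0) * math.cos(2.0 * a)
    j45 = -(cylinder / 2.0) * math.sin(2.0 * a)
    return m, j180, j45

# Clinical-significance limits as quoted in the abstract (diopters).
M_LIMIT = 0.25
J_LIMIT = 0.125

def clinically_significant(aligned, misaligned):
    """True if any component differs by at least its quoted limit.

    Both arguments are (M, J180, J45) tuples, e.g. from power_vectors().
    """
    dm, dj180, dj45 = (abs(b - a) for a, b in zip(aligned, misaligned))
    return dm >= M_LIMIT or dj180 >= J_LIMIT or dj45 >= J_LIMIT
```

    Comparing a centrally aligned reading with one taken at a known pupil offset in this way mirrors the comparison the study reports, although the study's own analysis used regression across several offsets rather than a single pairwise check.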

  3. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G.Gomez

    Since September, the muon alignment system has shifted from a mode of hardware installation and commissioning to operation and data taking. All three optical subsystems (Barrel, Endcap and Link alignment) have recorded data before, during and after CRAFT, at different magnetic fields and during ramps of the magnet. This first data-taking experience has several interesting goals:
    • study detector deformations and movements under the influence of the huge magnetic forces;
    • study the stability of detector structures and of the alignment system over long periods;
    • study geometry reproducibility at equal fields (especially at 0T and 3.8T);
    • reconstruct B=0T geometry and compare to nominal/survey geometries;
    • reconstruct B=3.8T geometry and provide DT and CSC alignment records for CMSSW.
    However, the main goal is to recons...

  4. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    S. Szillasi

    2013-01-01

    The CMS detector has been gradually opened, and whenever a wheel became exposed the first operation was the removal of the MABs, the sensor structures of the Hardware Barrel Alignment System. By the last days of June all 36 MABs had arrived at the Alignment Lab at the ISR where, as part of the Alignment Upgrade Project, they are being refurbished with new Survey target holders. Their electronic checkout is under way and finally they will be recalibrated. During LS1 the alignment system will be upgraded in order to allow more precise reconstruction of the MB4 chambers in Sector 10 and Sector 4. This requires new sensor components, so-called MiniMABs, that have already been assembled and calibrated. [Image 6: Calibrated MiniMABs are ready for installation] For the track-based alignment, the systematic uncertainties of the algorithm are under scrutiny: this study will enable the production of an improved Monte Carlo misalignment scenario and to update alignment position errors eventually, crucial...

  5. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G. Gomez

    2012-01-01

    A new muon alignment has been produced for 2012 A+B data reconstruction. It uses the latest Tracker alignment and single-muon data samples to align both DTs and CSCs. Physics validation has been performed and shows a modest improvement in stand-alone muon momentum resolution in the barrel, where the alignment is essentially unchanged from the previous version. The reference-target track-based algorithm using only collision muons is employed for the first time to align the CSCs, and a substantial improvement in resolution is observed in the endcap and overlap regions for stand-alone muons. This new alignment is undergoing the approval process and is expected to be deployed as part of a new global tag in the beginning of December. The pT dependence of the φ-bias in curvature observed in Monte Carlo was traced to a relative vertical misalignment between the Tracker and barrel muon systems. Moving the barrel as a whole to match the Tracker cures this pT dependence, leaving only the φ...

  6. Incremental Alignment Manifold Learning

    Institute of Scientific and Technical Information of China (English)

    Zhi Han; De-Yu Meng; Zong-Sen Xu; Nan-Nan Gu

    2011-01-01

    A new manifold learning method, called the incremental alignment method (IAM), is proposed for nonlinear dimensionality reduction of high-dimensional data with intrinsic low dimensionality. The main idea is to incrementally align low-dimensional coordinates of input data patch by patch to iteratively generate the representation of the entire dataset. The method consists of two major steps, the incremental step and the alignment step. The incremental step incrementally searches for the neighborhood patch to be aligned in the next step, and the alignment step iteratively aligns the low-dimensional coordinates of the neighborhood patch found to generate the embeddings of the entire dataset. Compared with existing manifold learning methods, the proposed method has advantages in several aspects: high efficiency, easy out-of-sample extension, good metric preservation, and avoidance of the local minima issue. All these properties are supported by a series of experiments performed on synthetic and real-life datasets. In addition, the computational complexity of the proposed method is analyzed, and its efficiency is theoretically argued and experimentally demonstrated.

  7. Oculus: faster sequence alignment by streaming read compression

    Science.gov (United States)

    2012-01-01

    Background: Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves. Results: Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (> 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases. Conclusions: Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online, at http://code.google.com/p/oculus-bio. PMID:23148484
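
    The general idea of exploiting read redundancy (compress, align, decompress) can be illustrated with a small sketch. This is not Oculus's implementation: it handles only exact duplicate sequences, works on an in-memory batch rather than a stream, and the names `align_with_dedup` and `align` are hypothetical. The `align` argument stands in for any wrapped aligner that takes a list of sequences and returns one result per sequence, in order.

```python
def align_with_dedup(reads, align):
    """Align each distinct sequence once, then fan results back out.

    reads: iterable of (read_name, sequence) pairs.
    align: callable taking a list of sequences and returning a list of
           results in the same order (a stand-in for a real aligner).
    Returns (per-read results in input order, number of unique sequences).
    """
    order = []      # original (name, sequence) order, for decompression
    unique = {}     # sequence -> None; dicts preserve insertion order
    for name, seq in reads:
        order.append((name, seq))
        unique.setdefault(seq, None)
    sequences = list(unique)
    # "Compression": the aligner only sees each distinct sequence once.
    results = dict(zip(sequences, align(sequences)))
    # "Decompression": copy each result back to every read that shares it.
    return [(name, results[seq]) for name, seq in order], len(sequences)


# Usage with a toy exact-match "aligner" (an assumption for illustration).
reference = "TTACGTGGTA"
toy_align = lambda seqs: [reference.find(s) for s in seqs]
reads = [("r1", "ACGT"), ("r2", "GGTA"), ("r3", "ACGT")]
hits, n_unique = align_with_dedup(reads, toy_align)
```

    The saving grows with the fraction of repeated reads, which is consistent with the abstract's emphasis on RNA-Seq and ChIP-Seq data.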

  8. Multiple sequence alignment with user-defined anchor points

    Directory of Open Access Journals (Sweden)

    Pöhler Dirk

    2006-04-01

    Background: Automated software tools for multiple alignment often fail to produce biologically meaningful results. In such situations, expert knowledge can help to improve the quality of alignments. Results: Herein, we describe a semi-automatic version of the alignment program DIALIGN that can take pre-defined constraints into account. It is possible for the user to specify parts of the sequences that are assumed to be homologous and should therefore be aligned to each other. Our software program can use these sites as anchor points by creating a multiple alignment respecting these constraints. This way, our alignment method can produce alignments that are biologically more meaningful than alignments produced by fully automated procedures. As a demonstration of how our method works, we apply our approach to genomic sequences around the Hox gene cluster and to a set of DNA-binding proteins. As a by-product, we obtain insights about the performance of the greedy algorithm that our program uses for multiple alignment and about the underlying objective function. This information will be useful for the further development of DIALIGN. The described alignment approach has been integrated into the TRACKER software system.

  9. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

    Directory of Open Access Journals (Sweden)

    Shi Weisong

    2011-06-01

    Background: Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. Results: To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http

  10. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.

    Science.gov (United States)

    Nguyen, Tung; Shi, Weisong; Ruden, Douglas

    2011-06-06

    Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http

  11. Evaluation of alignment marks using ASML ATHENA alignment system in 90nm BEOL process

    CERN Document Server

    Tan Chin Boon; Koh Hui Peng; Koo Chee, Kiong; Siew Yong Kong; Yeo Swee Hock

    2003-01-01

    As the critical dimension (CD) of integrated circuit (IC) devices shrinks, the total overlay budget must become more stringent. Typically, the allowable overlay error is 1/3 of the CD in the IC device. In this case, the robustness of the alignment mark is critical, as an accurate signal is required by the scanner's alignment system to precisely align a layer of pattern to the previous layer. The alignment issue is more severe in the back-end process, partly due to the influence of Chemical Mechanical Polishing (CMP), which contributes to the asymmetry or total destruction of the alignment marks. Alignment marks on the wafer can be placed along the scribe-line of the IC pattern. The ASML scanner allows this type of wafer alignment using a phase grating mark, known as the Scribe-line Primary Mark (SPM), which can fit into a standard 80um scribe-line. In this paper, we have studied the feasibility of introducing a Narrow SPM (NSPM) to enable a smaller scribe-line. The width of NSPM has been shrunk down to 70% of the SPM and the length remain...

  12. HAMSA: Highly Accelerated Multiple Sequence Aligner

    Directory of Open Access Journals (Sweden)

    Naglaa M. Reda

    2016-06-01

    For biologists, the existence of an efficient tool for multiple sequence alignment is essential. This work presents a new parallel aligner called HAMSA. HAMSA is a bioinformatics application designed for highly accelerated alignment of multiple sequences of proteins and DNA/RNA on a multi-core cluster system. The design of HAMSA is based on a combination of our recently proposed optimized algorithms for vectorization, partitioning, and scheduling. It mainly operates on a distance vector instead of a distance matrix. It accomplishes similarity computations and generates the guide tree in a highly accelerated and accurate manner. HAMSA outperforms MSAProbs with a 21.9-fold speedup, and ClustalW-MPI with an 11-fold speedup. It can be considered an essential tool for structure prediction, protein classification, motif finding and drug design studies.
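
    The abstract says HAMSA works on a distance vector rather than a distance matrix but gives no details. The following is a generic sketch of that storage idea, not HAMSA's algorithm: the upper triangle of a symmetric matrix is flattened into a vector of length n(n-1)/2. The names `pair_index`, `p_distance`, and `distance_vector`, and the use of a simple mismatch fraction on equal-length sequences, are assumptions made for illustration.

```python
from typing import List, Sequence


def pair_index(i: int, j: int, n: int) -> int:
    """Position of the unordered pair (i, j) in a condensed vector for n items."""
    if i == j:
        raise ValueError("the diagonal is not stored")
    if i > j:
        i, j = j, i
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... entries before row i starts.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_vector(seqs: Sequence[str]) -> List[float]:
    """All pairwise distances stored once each, in row-major upper-triangle order."""
    n = len(seqs)
    vec = [0.0] * (n * (n - 1) // 2)
    for i in range(n):
        for j in range(i + 1, n):
            vec[pair_index(i, j, n)] = p_distance(seqs[i], seqs[j])
    return vec
```

    Compared with a full n-by-n matrix, this layout roughly halves memory and stores each distance exactly once, which is one plausible reason to prefer a vector; whether HAMSA uses this exact layout is not stated in the record.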

  13. Using structure to explore the sequence alignment space of remote homologs.

    Directory of Open Access Journals (Sweden)

    Andrew Kuziemko

    2011-10-01

    Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

  14. The UCSC Archaeal Genome Browser: 2012 update

    OpenAIRE

    Chan, Patricia P.; Holmes, Andrew D.; Smith, Andrew M.; Tran, Danny; Lowe, Todd M.

    2011-01-01

    The UCSC Archaeal Genome Browser (http://archaea.ucsc.edu) offers a graphical web-based resource for exploration and discovery within archaeal and other selected microbial genomes. By bringing together existing gene annotations, gene expression data, multiple-genome alignments, pre-computed sequence comparisons and other specialized analysis tracks, the genome browser is a powerful aggregator of varied genomic information. The genome browser environment maintains the current look-and-feel of ...

  15. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is

  16. Curriculum Alignment Research Suggests that Alignment Can Improve Student Achievement

    Science.gov (United States)

    Squires, David

    2012-01-01

    Curriculum alignment research has developed showing the relationship among three alignment categories: the taught curriculum, the tested curriculum and the written curriculum. Each pair (for example, the taught and the written curriculum) shows a positive impact for aligning those results. Following this, alignment results from the Third…

  18. Read clouds uncover variation in complex regions of the human genome.

    Science.gov (United States)

    Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim

    2015-10-01

    Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.

  19. MaxAlign: maximizing usable data in an alignment

    DEFF Research Database (Denmark)

    Oliveira, Rodrigo Gouveia; Sackett, Peter Wad; Pedersen, Anders Gorm

    2007-01-01

    BACKGROUND: The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. RESULTS: MaxAlign is a program that optimizes...... the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns - the alignment area - by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical...... analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign...
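
    The "alignment area" defined in this abstract (symbols that sit in gap-free columns) is easy to compute, and a small sketch makes the trade-off concrete. The greedy loop below is an illustrative heuristic, not the optimization MaxAlign itself performs; the function names and the assumption that "-" is the only gap character are mine.

```python
from typing import Dict, List, Tuple


def alignment_area(rows: List[str]) -> int:
    """Number of symbols in gap-free columns: gap-free columns times sequences."""
    if not rows:
        return 0
    gap_free_columns = sum(all(c != "-" for c in column) for column in zip(*rows))
    return gap_free_columns * len(rows)


def greedy_exclude(alignment: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Repeatedly drop the single sequence whose removal most increases the area.

    Stops as soon as no single removal gives a strict improvement.
    """
    kept = dict(alignment)
    best = alignment_area(list(kept.values()))
    while len(kept) > 1:
        candidates = []
        for name in kept:
            rest = [seq for other, seq in kept.items() if other != name]
            candidates.append((alignment_area(rest), name))
        area, name = max(candidates, key=lambda pair: pair[0])
        if area <= best:
            break
        del kept[name]
        best = area
    return kept, best
```

    For example, with three identical six-column sequences and one gappy sequence, keeping all four gives 3 gap-free columns times 4 sequences (area 12), whereas excluding the gappy one gives 6 columns times 3 sequences (area 18).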

  20. Parameter Identification Method for SINS Initial Alignment under Inertial Frame

    Directory of Open Access Journals (Sweden)

    Haijian Xue

    2016-01-01

    The performance of a strapdown inertial navigation system (SINS) largely depends on the accuracy and rapidness of the initial alignment. The conventional alignment method with parameter identification has already been applied widely, but it needs to calculate the gyroscope drifts through the two-position method, which greatly prolongs the initial alignment. To address this issue, a novel self-alignment algorithm using the parameter identification method under the inertial frame for SINS is proposed in this paper. Firstly, a coarse alignment method using the gravity in the inertial frame as a reference is discussed to overcome the limit of dynamic disturbance on a rocking base and fulfill the requirement for the fine alignment. Secondly, the fine alignment method by parameter identification under the inertial frame is formulated. The theoretical analysis results show that the fine alignment model is fully self-aligned with no external reference information and the gyro drifts can be estimated in real time. The simulation results demonstrate that the proposed method can achieve rapid and highly accurate initial alignment for SINS.

  1. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data

    KAUST Repository

    Allam, Amin

    2015-07-14

    Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low. Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.

  2. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    Gervasio Gomez

    2012-01-01

    The new alignment for the DT chambers has been successfully used in physics analysis starting with the 52X Global Tag. The remaining main areas of development over the next few months will be preparing a new track-based CSC alignment and producing realistic APEs (alignment position errors) and MC misalignment scenarios to match the latest muon alignment constants. Work on these items has been delayed from the intended timeline, mostly due to a large involvement of the muon alignment man-power in physics analyses over the first half of this year. As CMS keeps probing higher and higher energies, special attention must be paid to the reconstruction of very-high-energy muons. Recent muon POG reports from mid-June show a φ-dependence in curvature bias in Monte Carlo samples. This bias is observed already at the tracker level, where it is constant with muon pT, while it grows with pT as muon chamber information is added to the tracks. Similar studies show a much smaller effect in data, at le...

  3. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G. Gomez

    2010-01-01

    For the last three months, the Muon Alignment group has focussed on providing a new, improved set of alignment constants for the end-of-year data reprocessing. These constants were delivered on time and approved by the CMS physics validation team on November 17. The new alignment incorporates several improvements over the previous one from March for nearly all sub-systems. Motivated by the loss of information from a hardware failure in May (an entire MAB was lost), the optical barrel alignment has moved from a modular, super-plane reconstruction, to a full, single loop calculation of the entire geometry for all DTs in stations 1, 2 and 3. This makes better use of the system redundancy, mitigating the effect of the information loss. Station 4 is factorised and added afterwards to make the system smaller (and therefore faster to run), and also because the MAB calibration at the MB4 zone is less precise. This new alignment procedure was tested at 0 T against photogrammetry resulting in precisions of the order...

  4. MUON DETECTORS: ALIGNMENT

    CERN Document Server

    M. Dallavalle

    2013-01-01

    A new Muon misalignment scenario for 2011 (7 TeV) Monte Carlo re-processing was released. The scenario is based on running the standard track-based reference-target algorithm (exactly as in data) using a single-muon simulated sample (with the transverse-momentum spectrum matching data). It used statistics similar to what was used for alignment with 2011 data, starting from an initially misaligned Muon geometry from uncertainties of hardware measurements and using the latest Tracker misalignment geometry. Validation of the scenario (with muons from Z decay and high-pT simulated muons) shows that it describes data well. The study of systematic uncertainties (dominant by now, due to the huge amount of data collected by CMS and used for muon alignment) is finalised. Realistic alignment position errors are being obtained from the estimated uncertainties and are expected to improve the muon reconstruction performance. Concerning the Hardware Alignment System, the upgrade of the Barrel Alignment is in progress. By now, d...

  5. Ergodic Secret Alignment

    CERN Document Server

    Bassily, Raef

    2010-01-01

    In this paper, we introduce two new achievable schemes for the fading multiple access wiretap channel (MAC-WT). In the model that we consider, we assume that perfect knowledge of the state of all channels is available at all the nodes in a causal fashion. Our schemes use this knowledge together with the time varying nature of the channel model to align the interference from different users at the eavesdropper perfectly in a one-dimensional space while creating a higher dimensionality space for the interfering signals at the legitimate receiver hence allowing for better chance of recovery. While we achieve this alignment through signal scaling at the transmitters in our first scheme (scaling based alignment (SBA)), we let nature provide this alignment through the ergodicity of the channel coefficients in the second scheme (ergodic secret alignment (ESA)). For each scheme, we obtain the resulting achievable secrecy rate region. We show that the secrecy rates achieved by both schemes scale with SNR as 1/2log(SNR...

  6. Closed circuit television welding alignment system

    Energy Technology Data Exchange (ETDEWEB)

    Darner, G.S.

    1976-09-01

    Closed circuit television (CCTV) weld targeting systems were developed to provide accurate and repeatable positioning of the electrode of an electronic arc welder with respect to the parts being joined. A sliding mirror electrode holder was developed for use with closed circuit television equipment on existing weld fixturing. A complete motorized CCTV weld alignment system was developed to provide weld targeting for even the most critical positioning requirements.

  7. Secure Fingerprint Alignment and Matching Protocols

    OpenAIRE

    Bayatbabolghani, Fattaneh; Blanton, Marina; Aliasgari, Mehrdad; Goodrich, Michael

    2017-01-01

    We present three secure privacy-preserving protocols for fingerprint alignment and matching, based on what are considered to be the most precise and efficient fingerprint recognition algorithms, namely those based on the geometric matching of "landmarks" known as minutia points. Our protocols allow two or more honest-but-curious parties to compare their respective privately-held fingerprints in a secure way such that they each learn nothing more than a highly-accurate score of how well the fingerprin...

  8. Mango: multiple alignment with N gapped oligos.

    Science.gov (United States)

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2008-06-01

    Multiple sequence alignment is a classical and challenging task. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art works suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method, using both multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds have proved significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks, showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at http://www.bioinfo.org.cn/mango/ and is free for academic usage.
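
    The sensitivity advantage of spaced seeds over consecutive k-mers, which this abstract relies on, can be shown with a toy example. This is not MANGO's implementation: the seed pattern "1101", the helper names, and the two short sequences are assumptions for illustration. In the pattern, "1" marks a position that must match and "0" a position that is ignored.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple


def spaced_keys(seq: str, pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (start, key) for every window, keeping only the '1' positions."""
    care = [i for i, flag in enumerate(pattern) if flag == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in care)


def seed_hits(a: str, b: str, pattern: str) -> List[Tuple[int, int]]:
    """All (position in a, position in b) pairs whose windows agree under the seed."""
    index = defaultdict(list)
    for pos, key in spaced_keys(a, pattern):
        index[key].append(pos)
    return [(pa, pb) for pb, key in spaced_keys(b, pattern) for pa in index.get(key, [])]
```

    With `a = "ACGTAC"` and `b = "ACCTAC"`, which differ at one base, the spaced seed "1101" still produces a hit at the start of both sequences because the mismatch falls on the ignored position, whereas the consecutive seed "1111" of the same span produces none.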

  9. An Introduction to Genome Annotation.

    Science.gov (United States)

    Campbell, Michael S; Yandell, Mark

    2015-12-17

    Genome projects have evolved from large international undertakings to tractable endeavors for a single lab. Accurate genome annotation is critical for successful genomic, genetic, and molecular biology experiments. These annotations can be generated using a number of approaches and available software tools. This unit describes methods for genome annotation and a number of software tools commonly used in gene annotation.

  10. Syntenator: Multiple gene order alignments with a gene-specific scoring function

    Directory of Open Access Journals (Sweden)

    Dieterich Christoph

    2008-11-01

    Background: Identification of homologous regions or conserved syntenies across genomes is one crucial step in comparative genomics. This task is usually performed by genome alignment software such as WABA or blastz. In the case of conserved syntenies, such regions are defined as conserved gene orders. On the gene order level, homologous regions can even be found between distantly related genomes, which do not align on the nucleotide sequence level. Results: We present a novel approach to identify regions of conserved synteny across multiple genomes. Syntenator represents genomes and alignments thereof as partial order graphs (POGs). These POGs are aligned by a dynamic programming approach employing a gene-specific scoring function. The scoring function reflects the level of protein sequence similarity for each possible gene pair. Our method consistently defines larger homologous regions in pairwise gene order alignments than nucleotide-level comparisons. Our method is superior to methods that work on predefined homology gene sets (as implemented in Blockfinder). Syntenator successfully reproduces 80% of the EnsEMBL man-mouse conserved syntenic blocks. The full potential of our method becomes visible by comparing remotely related genomes and multiple genomes. Gene order alignments potentially resolve up to 75% of the EnsEMBL 1:many orthology relations and 27% of the many:many orthology relations. Conclusion: We propose Syntenator as a software solution to reliably infer conserved syntenies among distantly related genomes. The software is available from http://www2.tuebingen.mpg.de/abt4/plone.
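
    To make the idea of dynamic programming over gene orders with a gene-specific score more concrete, here is a minimal sketch for the simplest case of two linear gene lists. It is not Syntenator's method, which aligns partial order graphs; the function name, the `similarity` dictionary standing in for protein-level similarity, and the `unrelated` and `gap` penalties are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def align_gene_orders(
    genes_a: Sequence[str],
    genes_b: Sequence[str],
    similarity: Dict[Tuple[str, str], float],
    unrelated: float = -2.0,
    gap: float = -1.0,
) -> Tuple[float, List[Tuple[str, str]]]:
    """Global alignment of two gene lists; returns (score, aligned gene pairs)."""
    n, m = len(genes_a), len(genes_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = similarity.get((genes_a[i - 1], genes_b[j - 1]), unrelated)
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # pair gene i with gene j
                score[i - 1][j] + gap,    # gene i has no partner
                score[i][j - 1] + gap,    # gene j has no partner
            )
    # Trace back from the corner to recover which genes were paired.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = similarity.get((genes_a[i - 1], genes_b[j - 1]), unrelated)
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((genes_a[i - 1], genes_b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

    Because the score is looked up per gene pair rather than per nucleotide, two genomes can be aligned at the gene order level even when their DNA sequences no longer align, which is the situation the abstract describes.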

  11. FMIT alignment cart

    Energy Technology Data Exchange (ETDEWEB)

    Potter, R.C.; Dauelsberg, L.B.; Clark, D.C.; Grieggs, R.J.

    1981-01-01

    The Fusion Materials Irradiation Test (FMIT) Facility alignment cart must perform several functions. It must serve as a fixture to receive the drift-tube girder assembly when it is removed from the linac tank. It must transport the girder assembly from the linac vault to the area where alignment or disassembly is to take place. It must serve as a disassembly fixture to hold the girder while individual drift tubes are removed for repair. It must align the drift tube bores in a straight line parallel to the girder, using an optical system. These functions must be performed without violating any clearances found within the building. The bore tubes of the drift tubes will be irradiated, and shielding will be included in the system for easier maintenance.

  12. Concurrent and Accurate Short Read Mapping on Multicore Processors.

    Science.gov (United States)

    Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

    2015-01-01

    We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (an open-source application available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity relative to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.
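
    As a rough illustration of how a suffix array supports fast exact lookup of a read or seed, the sketch below builds a suffix array naively and finds all occurrences of a pattern by binary search. It is not HPG Aligner SA's code; the function names are hypothetical, and the quadratic-memory construction via sorting suffix slices is only suitable for toy inputs.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes, in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, suffix_array: List[int], pattern: str) -> List[int]:
    """All start positions of pattern in text, found by two binary searches."""
    m = len(pattern)

    def prefix(rank: int) -> str:
        start = suffix_array[rank]
        return text[start:start + m]

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])
```

    All suffixes that begin with the pattern are adjacent in the sorted order, so the two searches bracket them in logarithmic time; reads that fail this exact step are the kind a slower method such as Smith-Waterman would then handle.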

  13. Strategic Alignment of Business Intelligence

    OpenAIRE

    Cederberg, Niclas

    2010-01-01

    This thesis is about the concept of strategic alignment of business intelligence. It is based on a theoretical foundation that is used to define and explain business intelligence, data warehousing and strategic alignment. By combining a number of different methods for strategic alignment a framework for alignment of business intelligence is suggested. This framework addresses all different aspects of business intelligence identified as relevant for strategic alignment of business intelligence...

  14. PILOT optical alignment

    Science.gov (United States)

    Longval, Y.; Mot, B.; Ade, P.; André, Y.; Aumont, J.; Baustista, L.; Bernard, J.-Ph.; Bray, N.; de Bernardis, P.; Boulade, O.; Bousquet, F.; Bouzit, M.; Buttice, V.; Caillat, A.; Charra, M.; Chaigneau, M.; Crane, B.; Crussaire, J.-P.; Douchin, F.; Doumayrou, E.; Dubois, J.-P.; Engel, C.; Etcheto, P.; Gélot, P.; Griffin, M.; Foenard, G.; Grabarnik, S.; Hargrave, P..; Hughes, A.; Laureijs, R.; Lepennec, Y.; Leriche, B.; Maestre, S.; Maffei, B.; Martignac, J.; Marty, C.; Marty, W.; Masi, S.; Mirc, F.; Misawa, R.; Montel, J.; Montier, L.; Narbonne, J.; Nicot, J.-M.; Pajot, F.; Parot, G.; Pérot, E.; Pimentao, J.; Pisano, G.; Ponthieu, N.; Ristorcelli, I.; Rodriguez, L.; Roudil, G.; Salatino, M.; Savini, G.; Simonella, O.; Saccoccio, M.; Tapie, P.; Tauber, J.; Torre, J.-P.; Tucker, C.

    2016-07-01

    PILOT is a balloon-borne astronomy experiment designed to study the polarization of dust emission in the diffuse interstellar medium in our Galaxy at a wavelength of 240 μm with an angular resolution of about two arcminutes. The PILOT optics are composed of an off-axis Gregorian-type telescope and a refractive re-imager system. All optical elements, except the primary mirror, are in a cryostat cooled to 3K. We combined optical and 3D dimensional measurement methods with thermo-elastic modeling to perform the optical alignment. The talk describes the system analysis, the alignment procedure, and finally the performances obtained during the first flight in September 2015.

  15. Group Based Interference Alignment

    CERN Document Server

    Ma, Yanjun; Chen, Rui; Yao, Junliang

    2010-01-01

    In $K$-user single-input single-output (SISO) frequency selective fading interference channels, it has been shown that the achievable multiplexing gain is almost surely $K/2$ by using interference alignment (IA). However, when the signaling dimensions are limited, allocating all the resources to all the users simultaneously is not optimal. To address this problem, a group based interference alignment (GIA) scheme is proposed and a search algorithm is designed to get the group patterns and the resource allocation among them. Analysis results show that our proposed scheme achieves a higher multiplexing gain when the resource is limited.

  16. Orientation and Alignment Echoes

    CERN Document Server

    Karras, G; Billard, F; Lavorel, B; Hartmann, J -M; Faucher, O; Gershnabel, E; Prior, Y; Averbukh, I Sh

    2015-01-01

    We present what is probably the simplest classical system featuring the echo phenomenon - a collection of randomly oriented free rotors with dispersed rotational velocities. Following excitation by a pair of time-delayed impulsive kicks, the mean orientation/alignment of the ensemble exhibits multiple echoes and fractional echoes. We elucidate the mechanism of the echo formation by kick-induced filamentation of phase space, and provide the first experimental demonstration of classical alignment echoes in a thermal gas of CO_2 molecules excited by a pair of femtosecond laser pulses.

  17. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  18. SinicView: A visualization environment for comparisons of multiple nucleotide sequence alignment tools

    Directory of Open Access Journals (Sweden)

    Wong Chun-Yi

    2006-03-01

    Background: Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not provide a standard-bearer for most biologists because those poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows cross comparison of the alignment results obtained by different tools simultaneously could help a biologist evaluate their correctness and accuracy. Results: In this paper, we present a versatile alignment visualization system, called SinicView (for Sequence-aligning INnovative and Interactive Comparison VIEWer), which allows the user to efficiently compare and evaluate assorted nucleotide alignment results obtained by different tools. SinicView calculates similarity of the alignment outputs under a fixed window using the sum-of-pairs method and provides scoring profiles of each set of aligned sequences. The user can visually compare alignment results either in graphic scoring profiles or in plain text format of the aligned nucleotides along with the annotations information. We illustrate the capabilities of our visualization system by comparing alignment results obtained by MLAGAN, MAVID, and MULTIZ, respectively. Conclusion: With SinicView, users can use their own data sequences to compare various alignment tools or scoring systems and select the most suitable one to perform alignment in the
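
    A sum-of-pairs score computed under a fixed window, as mentioned in this abstract, can be sketched as follows. The scoring values (1 for a matching pair, 0 for a mismatch or any pair involving a gap), the non-overlapping windows, and the function names are assumptions; the record does not state the parameters SinicView actually uses.

```python
from itertools import combinations
from typing import List, Sequence


def column_sum_of_pairs(column: Sequence[str], match: int = 1, mismatch: int = 0, gap: int = 0) -> int:
    """Score one alignment column by summing over every pair of rows."""
    total = 0
    for x, y in combinations(column, 2):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def windowed_sum_of_pairs(rows: Sequence[str], window: int) -> List[int]:
    """Sum-of-pairs score for consecutive, non-overlapping windows of columns."""
    column_scores = [column_sum_of_pairs(col) for col in zip(*rows)]
    return [sum(column_scores[i:i + window]) for i in range(0, len(column_scores), window)]
```

    Plotting the returned list against window position gives a scoring profile; computing one profile per alignment tool on the same sequences is one way to spot regions where the tools disagree.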

  19. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

    Directory of Open Access Journals (Sweden)

    Arthur W Pightling

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: (i) depth of sequencing coverage, (ii) choice of reference-guided short-read sequence assembler, (iii) choice of reference genome, and (iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers

  20. SOAP2: an improved ultrafast tool for short read alignment

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Yu, Chang; Li, Yingrui

    2009-01-01

    SUMMARY: SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed at an unprecedented rate. We used a Burrows Wheeler Transformation (BWT) compression index to substitute the seed strategy...... for indexing the reference sequence in the main memory. We tested it on the whole human genome and found that this new algorithm reduced memory usage from 14.7 to 5.4 GB and improved alignment speed by 20-30 times. SOAP2 is compatible with both single- and paired-end reads. Additionally, this tool now supports...
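The Burrows-Wheeler index mentioned in the SOAP2 summary can be illustrated with a small sketch. This is not SOAP2's implementation: the function names `bwt` and `count_occurrences` are hypothetical, the transform is built naively by sorting all rotations, and the rank lookups rescan the string, whereas a production aligner would use a suffix array and sampled occurrence tables. It only shows how backward search counts exact matches of a read without scanning the reference.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, using '$' as the end sentinel.

    Naive construction (sort all rotations); fine for a demo only.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern via backward search on the BWT."""
    # first[c] = number of characters in the text that sort before c.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_str[:i]; a real index precomputes this.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # current range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTTACG"
index = bwt(reference)
print(count_occurrences(index, "ACG"))
```

Each character of the pattern narrows the range of sorted rotations that begin with the suffix matched so far, so the number of steps depends on the read length rather than the genome length.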

  1. Aligning Responsible Business Practices

    DEFF Research Database (Denmark)

    Weller, Angeli E.

    2017-01-01

    This article offers an in-depth case study of a global high tech manufacturer that aligned its ethics and compliance, corporate social responsibility, and sustainability practices. Few large companies organize their responsible business practices this way, despite conceptual relevance and calls...... and managers interested in understanding how responsible business practices may be collectively organized....

  2. MUON DETECTORS: ALIGNMENT

    CERN Multimedia

    G. Gomez and Y. Pakhotin

    2012-01-01

      A new track-based alignment for the DT chambers is ready for deployment: an offline tag has already been produced which will become part of the 52X Global Tag. This alignment was validated within the muon alignment group both at low and high momentum using a W/Z skim sample. It shows an improved mass resolution for pairs of stand-alone muons, improved curvature resolution at high momentum, and improved DT segment extrapolation residuals. The validation workflow for high-momentum muons used to depend solely on the “split cosmics” method, looking at the curvature difference between muon tracks reconstructed in the upper or lower half of CMS. The validation has now been extended to include energetic muons decaying from heavily boosted Zs: the di-muon invariant mass for global and stand-alone muons is reconstructed, and the invariant mass resolution is compared for different alignments. The main areas of development over the next few months will be preparing a new track-based C...

  3. Aligning Theory with Practice

    Science.gov (United States)

    Kurz, Terri L.; Batarelo, Ivana

    2009-01-01

    This article describes a structure to help preservice teachers get invaluable field experience by aligning theory with practice supported by the integration of elementary school children into their university mathematics methodology course. This course structure allowed preservice teachers to learn about teaching mathematics in a nonthreatening…

  4. Alignment of concerns

    DEFF Research Database (Denmark)

    Andersen, Tariq Osman; Bansler, Jørgen P.; Kensing, Finn

    2014-01-01

    The emergence of patient-centered eHealth systems introduces new challenges, where patients come to play an increasingly important role. Realizing the promises requires an in-depth understanding of not only the technology, but also the needs of both clinicians and patients. However, insights from...... as a design rationale for successful eHealth, termed 'alignment of concerns'....

  5. Aligning Mental Representations

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2013-01-01

    on the application of the BMG to publicly available datasets, the Leuven natural concept database [3] representing semantic structures of domain knowledge possessed by individual subjects [3]. Results indicate that the BMG is potentially a model applicable to simulating the alignment of domain knowledge from...

  6. Speaking Fluently And Accurately

    Institute of Scientific and Technical Information of China (English)

    Joseph DeVeto

    2004-01-01

    Even after many years of study, students make frequent mistakes in English. In addition, many students still need a long time to think of what they want to say. For some reason, in spite of all the studying, students are still not quite fluent. When I teach, I use one technique that helps students speak not only more accurately, but also more fluently. That technique is dictation.

  7. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.

    Science.gov (United States)

    Berlin, Konstantin; Koren, Sergey; Chin, Chen-Shan; Drake, James P; Landolin, Jane M; Phillippy, Adam M

    2015-06-01

    Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.
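The idea behind MinHash-based overlap detection can be sketched in a few lines. This is an illustrative sketch, not the MHAP implementation: the function names, the k-mer size, the number of hash functions, and the use of seeded BLAKE2b digests as the hash family are all assumptions chosen for the demo. Two reads are reduced to short sketches, and the fraction of matching sketch positions estimates the Jaccard similarity of their k-mer sets.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    # Seeded 64-bit hash; one seed per simulated hash function.
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=5, num_hashes=64):
    """Keep the minimum hash value of the k-mer set under each hash function.

    Assumes len(seq) >= k so that the k-mer set is not empty.
    """
    kmer_set = kmers(seq, k)
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of positions where the two sketches agree."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


read_a = "ACGTTGCATGCCGATTACAGGCTTAACG"
read_b = "TGCATGCCGATTACAGGCTTAACGGTCA"  # overlaps most of read_a
similarity = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
print(f"estimated k-mer Jaccard similarity: {similarity:.2f}")
```

Comparing fixed-size sketches instead of full reads is what makes all-against-all overlap detection affordable; candidate pairs with a high estimate would then be passed to a proper alignment step.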

  8. Detection of Off-normal Images for NIF Automatic Alignment

    Energy Technology Data Exchange (ETDEWEB)

    Candy, J V; Awwal, A S; McClay, W A; Ferguson, S W; Burkhart, S C

    2005-07-11

    One of the major purposes of the National Ignition Facility (NIF) at Lawrence Livermore National Laboratory is to accurately focus 192 high energy laser beams on a nanoscale (mm) fusion target at the precise location and time. The automatic alignment system developed for NIF is used to align the beams in order to achieve the required focusing effect. However, if a distorted image is inadvertently created by a faulty camera shutter or some other opto-mechanical malfunction, the resulting image, termed "off-normal", must be detected and rejected before further alignment processing occurs. Thus the off-normal processor acts as a preprocessor to automatic alignment image processing. In this work, we discuss the development of an "off-normal" pre-processor capable of rapidly detecting the off-normal images and performing the rejection. A wide variety of off-normal images for each loop is used to develop accurate rejection criteria.

  9. Robust local intervertebral disc alignment for spinal MRI

    Science.gov (United States)

    Reisman, James; Höppner, Jan; Huang, Szu-Hao; Zhang, Li; Lai, Shang-Hong; Odry, Benjamin; Novak, Carol L.

    2006-03-01

    Magnetic resonance (MR) imaging is frequently used to diagnose abnormalities in the spinal intervertebral discs. Owing to the non-isotropic resolution of typical MR spinal scans, physicians prefer to align the scanner plane with the disc in order to maximize the diagnostic value and to facilitate comparison with prior and follow-up studies. Commonly a planning scan is acquired of the whole spine, followed by a diagnostic scan aligned with selected discs of interest. Manual determination of the optimal disc plane is tedious and prone to operator variation. A fast and accurate method to automatically determine the disc alignment can decrease examination time and increase the reliability of diagnosis. We present a validation study of an automatic spine alignment system for determining the orientation of intervertebral discs in MR studies. In order to measure the effectiveness of the automatic alignment system, we compared its performance with human observers. 12 MR spinal scans of adult spines were tested. Two observers independently indicated the intervertebral plane for each disc, and then repeated the procedure on another day, in order to determine the inter- and intra-observer variability associated with manual alignment. Results were also collected for the observers utilizing the automatic spine alignment system, in order to determine the method's consistency and its accuracy with respect to human observers. We found that the results from the automatic alignment system are comparable with the alignment determined by human observers, with the computer showing greater speed and consistency.

  10. BSMAP: whole genome bisulfite sequence MAPping program

    Directory of Open Access Journals (Sweden)

    Li Wei

    2009-07-01

    BACKGROUND: Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation. RESULTS: We developed an efficient bisulfite reads mapping algorithm BSMAP to address the above issues. BSMAP combines genome hashing and bitwise masking to achieve fast and accurate bisulfite mapping. Compared with existing bisulfite mapping approaches, BSMAP is faster, more sensitive and more flexible. CONCLUSION: BSMAP is the first general-purpose bisulfite mapping software. It is able to map high-throughput bisulfite reads at whole genome level with feasible memory and CPU usage. It is freely available under GPL v3 license at http://code.google.com/p/bsmap/.

  11. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 76% enhancement in alignment score when compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
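For context on the optimal baseline named in this abstract, the following is a minimal sketch of Needleman-Wunsch global alignment. It is not the ABS algorithm, whose details are not given in the record. The scoring values (match +1, mismatch -1, linear gap -1) and the function name `needleman_wunsch` are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("GATTACA", "GCATGCG")
print(best)
print(top)
print(bottom)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the sequence lengths; that cost is the motivation for heuristic methods such as FASTA and the scanning approach described above.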

  12. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing this large amount of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 51% enhancement in alignment score when compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  13. Alignment of the ATLAS Inner Detector Tracking System

    CERN Document Server

    Wang, J; The ATLAS collaboration

    2011-01-01

    ATLAS is a multipurpose experiment that records the LHC collisions. In order to reconstruct the trajectories of charged particles, ATLAS is equipped with a tracking system built using distinct technologies: silicon planar sensors (both pixel and microstrips) and drift-tubes (the Inner Detector). The tracking system is embedded in a 2 T solenoid field. In order to reach the track parameter accuracy required by the physics goals of the experiment, the almost 700,000 degrees of freedom of the ATLAS tracking system must be determined accurately. The demanded precision for the alignment of the silicon sensors is below 10 micrometers. The implementation of the track based alignment within the ATLAS software framework unifies different alignment approaches and allows the alignment of all tracking subsystems together. The alignment software relies, of course, on the tracking information (track-hit residuals) but also includes the capability to set constraints on the beam spot and primary vertex for the global position...

  14. Absorber Alignment Measurement Tool for Solar Parabolic Trough Collectors: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Stynes, J. K.; Ihas, B.

    2012-04-01

    As we pursue efforts to lower the capital and installation costs of parabolic trough solar collectors, it is essential to maintain high optical performance. While there are many optical tools available to measure the reflector slope errors of parabolic trough solar collectors, there are few tools to measure the absorber alignment. A new method is presented here to measure the absorber alignment in two dimensions to within 0.5 cm. The absorber alignment is measured using a digital camera and four photogrammetric targets. Physical contact with the receiver absorber or glass is not necessary. The alignment of the absorber is measured along its full length so that sagging of the absorber can be quantified with this technique. The resulting absorber alignment measurement provides critical information required to accurately determine the intercept factor of a collector.

  15. The Laser Shaft Alignment System with Dual PSDs

    Institute of Scientific and Technical Information of China (English)

    JIAO Guohua; LI Yulin; ZHANG Dongbo; LI Tonghai; HU Baowen

    2006-01-01

    Shaft alignment is an important technique during installation and maintenance of a rotating machine. A high-precision laser alignment system with dual PSDs (Position Sensing Detectors) has been designed to replace the traditional manual method of shaft alignment and to make the measurement easier and more accurate. The system comprises two small measuring units (laser transmitter and detector) and a PDA (Personal Digital Assistant) with the measurement software. The dual-PSD laser alignment system improves on a single-PSD system, achieves higher measurement accuracy than the previous design, and has been successfully designed and implemented for actual shaft alignment. In the system, the range of offset measurement is ±4 mm, the resolution is 1.5 μm, and the accuracy is better than 2 μm.

  16. A laser shaft alignment system with dual PSDs

    Institute of Scientific and Technical Information of China (English)

    JIAO Guo-hua; LI Yu-lin; ZHANG Dong-bo; LI Tong-hai; HU Bao-wen

    2006-01-01

    Shaft alignment is an important technique during installation and maintenance of a rotating machine. A high-precision laser alignment system with dual PSDs (Position Sensing Detectors) has been designed to replace the traditional manual method of shaft alignment and to make the measurement easier and more accurate. The system comprises two small measuring units (laser transmitter and detector) and a PDA (Personal Digital Assistant) with measurement software. The laser alignment system with dual PSDs improves on a single-PSD system, yields higher measurement accuracy than the previous design, and has been successfully designed and applied to actual shaft alignment. In the system, the range of offset measurement is ±4 mm and the resolution is 1.5 μm, with an accuracy better than 2 μm.

  17. A fast cross-validation method for alignment of electron tomography images based on Beer-Lambert law

    Science.gov (United States)

    Yan, Rui; Edwards, Thomas J.; Pankratz, Logan M.; Kuhn, Richard J.; Lanman, Jason K.; Liu, Jun; Jiang, Wen

    2015-01-01

    In electron tomography, accurate alignment of tilt series is an essential step in attaining high-resolution 3D reconstructions. Nevertheless, quantitative assessment of alignment quality has remained a challenging issue, even though many alignment methods have been reported. Here, we report a fast and accurate method, tomoAlignEval, based on the Beer-Lambert law, for the evaluation of alignment quality. Our method is able to globally estimate the alignment accuracy by measuring the goodness of log-linear relationship of the beam intensity attenuations at different tilt angles. Extensive tests with experimental data demonstrated its robust performance with stained and cryo samples. Our method is not only significantly faster but also more sensitive than measurements of tomogram resolution using Fourier shell correlation method (FSCe/o). From these tests, we also conclude that while current alignment methods are sufficiently accurate for stained samples, inaccurate alignments remain a major limitation for high resolution cryo-electron tomography. PMID:26455556

  18. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

    Directory of Open Access Journals (Sweden)

    Daniels Noah M

    2012-10-01

    BACKGROUND: The quality of multiple protein structure alignments is usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not directly utilize protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. RESULTS: We present Formatt, a multiple structure alignment based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. CONCLUSIONS: Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.

  19. Intramedullary versus extramedullary alignment of the tibial component in the Triathlon knee

    Directory of Open Access Journals (Sweden)

    Synnott Keith

    2011-08-01

    BACKGROUND: Long term survivorship in total knee arthroplasty is significantly dependent on prosthesis alignment. Our aim was to determine which alignment guide was more accurate in positioning of the tibial component in total knee arthroplasty. We also aimed to assess whether there was any difference in short term patient outcome. METHOD: A comparison of intramedullary versus extramedullary alignment jigs was performed. Radiological alignment of tibial components and patient outcomes of 103 Triathlon total knee arthroplasties were analysed. RESULTS: Use of the intramedullary jig was found to be significantly more accurate in determining coronal alignment (p = 0.02), while use of the extramedullary jig was found to give more accurate results in sagittal alignment (p = 0.04). There was no significant difference in WOMAC or SF-36 at six months. CONCLUSION: Use of an intramedullary jig is preferable for positioning of the tibial component using this knee system.

  20. Intramedullary versus extramedullary alignment of the tibial component in the Triathlon knee

    LENUS (Irish Health Repository)

    Cashman, James P

    2011-08-20

    BACKGROUND: Long term survivorship in total knee arthroplasty is significantly dependent on prosthesis alignment. Our aim was to determine which alignment guide was more accurate in positioning of the tibial component in total knee arthroplasty. We also aimed to assess whether there was any difference in short term patient outcome. METHOD: A comparison of intramedullary versus extramedullary alignment jigs was performed. Radiological alignment of tibial components and patient outcomes of 103 Triathlon total knee arthroplasties were analysed. RESULTS: Use of the intramedullary jig was found to be significantly more accurate in determining coronal alignment (p = 0.02), while use of the extramedullary jig was found to give more accurate results in sagittal alignment (p = 0.04). There was no significant difference in WOMAC or SF-36 at six months. CONCLUSION: Use of an intramedullary jig is preferable for positioning of the tibial component using this knee system.

  1. Three-time rapid transfer alignment method of SINS/GPS navigation system of high-speed marine missile

    Institute of Scientific and Technical Information of China (English)

    WANG Si; DENG Zheng-long; SU Ling-feng

    2008-01-01

    The transfer alignment of the SINS/GPS navigation system of a high-speed marine missile was investigated. With the help of the large acceleration of a high-speed missile, the transfer alignment was changed into a three-time alignment. The azimuth alignment was coarsely finished within 10 s in the first alignment, the horizontal alignment was accurately and rapidly finished in the second alignment, and the azimuth alignment was accurately finished in the third alignment. Because the second and third alignments were finished by GPS after the missile was launched, the horizontal alignment and the second azimuth alignment were free of the influence of the flexural deformation of the warship body. The precision and rapidity of the horizontal alignment were markedly increased owing to the vertical launch of the marine missile with its large acceleration. Simulation verifies the effectiveness of the proposed alignment method.

  2. Inflation by alignment

    Energy Technology Data Exchange (ETDEWEB)

    Burgess, C.P. [PH -TH Division, CERN,CH-1211, Genève 23 (Switzerland); Department of Physics & Astronomy, McMaster University,1280 Main Street West, Hamilton ON (Canada); Perimeter Institute for Theoretical Physics,31 Caroline Street North, Waterloo ON (Canada); Roest, Diederik [Van Swinderen Institute for Particle Physics and Gravity, University of Groningen,Nijenborgh 4, 9747 AG Groningen (Netherlands)

    2015-06-08

    Pseudo-Goldstone bosons (pGBs) can provide technically natural inflatons, as has been comparatively well-explored in the simplest axion examples. Although inflationary success requires trans-Planckian decay constants, f ≳ M_p, several mechanisms have been proposed to obtain this, relying on (mis-)alignments between potential and kinetic energies in multiple-field models. We extend these mechanisms to a broader class of inflationary models, including in particular the exponential potentials that arise for pGB potentials based on noncompact groups (and so which might apply to moduli in an extra-dimensional setting). The resulting potentials provide natural large-field inflationary models and can predict a larger primordial tensor signal than is true for simpler single-field versions of these models. In so doing we provide a unified treatment of several alignment mechanisms, showing how each emerges as a limit of the more general setup.

  3. Aligning component upgrades

    Directory of Open Access Journals (Sweden)

    Roberto Di Cosmo

    2011-08-01

    Modern software systems, like GNU/Linux distributions or Eclipse-based development environments, are often deployed by selecting components out of large component repositories. Maintaining such software systems by performing component upgrades is a complex task, and the users need to have an expressive preferences language at their disposal to specify the kind of upgrades they are interested in. Recent research has shown that it is possible to develop solvers that handle preferences expressed as a combination of a few basic criteria used in the MISC competition, ranging from the number of new components to the freshness of the final configuration. In this work we introduce a set of new criteria that allow the users to specify their preferences for solutions with components aligned to the same upstream sources, provide an efficient encoding and report on the experimental results that prove that optimising these alignment criteria is a tractable problem in practice.

  4. Inflation by Alignment

    CERN Document Server

    Burgess, Cliff

    2015-01-01

    Pseudo-Goldstone bosons (pGBs) can provide technically natural inflatons, as has been comparatively well-explored in the simplest axion examples. Although inflationary success requires trans-Planckian decay constants, f > Mp, several mechanisms have been proposed to obtain this, relying on (mis-)alignments between potential and kinetic energies in multiple-field models. We extend these mechanisms to a broader class of inflationary models, including in particular the exponential potentials that arise for pGB potentials based on noncompact groups (and so which might apply to moduli in an extra-dimensional setting). The resulting potentials provide natural large-field inflationary models and can predict a larger primordial tensor signal than is true for simpler single-field versions of these models. In so doing we provide a unified treatment of several alignment mechanisms, showing how each emerges as a limit of the more general setup.

  5. Aligning component upgrades

    CERN Document Server

    Di Cosmo, Roberto; Michel, Claude; 10.4204/EPTCS.65.1

    2011-01-01

    Modern software systems, like GNU/Linux distributions or Eclipse-based development environments, are often deployed by selecting components out of large component repositories. Maintaining such software systems by performing component upgrades is a complex task, and the users need to have an expressive preferences language at their disposal to specify the kind of upgrades they are interested in. Recent research has shown that it is possible to develop solvers that handle preferences expressed as a combination of a few basic criteria used in the MISC competition, ranging from the number of new components to the freshness of the final configuration. In this work we introduce a set of new criteria that allow the users to specify their preferences for solutions with components aligned to the same upstream sources, provide an efficient encoding and report on the experimental results that prove that optimising these alignment criteria is a tractable problem in practice.

  6. Efficient oligonucleotide probe selection for pan-genomic tiling arrays

    Directory of Open Access Journals (Sweden)

    Zhang Wei

    2009-09-01

    BACKGROUND: Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome. RESULTS: This paper presents a new probe selection algorithm (PanArray) that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multi-alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. CONCLUSION: PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. It is capable of fully tiling all genomes of a species on...

  7. Alignment of concerns

    DEFF Research Database (Denmark)

    Andersen, Tariq Osman; Bansler, Jørgen P.; Kensing, Finn

    E-health promises to enable and support active patient participation in chronic care. However, these fairly recent innovations are complicated matters and emphasize significant challenges, such as patients’ and clinicians’ different ways of conceptualizing disease and illness. Informed by insight...... from medical phenomenology and our own empirical work in telemonitoring and medical care of heart patients, we propose a design rationale for e-health systems conceptualized as the ‘alignment of concerns’....

  8. Orbit IMU alignment: Error analysis

    Science.gov (United States)

    Corson, R. W.

    1980-01-01

    A comprehensive accuracy analysis of orbit inertial measurement unit (IMU) alignments using the shuttle star trackers was completed and the results are presented. Monte Carlo techniques were used in a computer simulation of the IMU alignment hardware and software systems to: (1) determine the expected Space Transportation System 1 Flight (STS-1) manual mode IMU alignment accuracy; (2) investigate the accuracy of alignments in later shuttle flights when the automatic mode of star acquisition may be used; and (3) verify that an analytical model previously used for estimating the alignment error is a valid model. The analysis results do not differ significantly from expectations. The standard deviation in the IMU alignment error for STS-1 alignments was determined to be 68 arc seconds per axis. This corresponds to a 99.7% probability that the magnitude of the total alignment error is less than 258 arc seconds.
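The relation between the two figures quoted in this abstract (68 arc seconds per axis and a 258 arc second bound on the total error) can be checked with a short Monte Carlo sketch. The record does not state the error distribution, so the sketch assumes independent, zero-mean Gaussian errors on three axes; under that assumption the magnitude follows a Maxwell distribution, whose closed-form CDF is included for comparison. The function names and the trial count are hypothetical.

```python
import math
import random

SIGMA_ARCSEC = 68.0    # per-axis standard deviation quoted in the abstract
BOUND_ARCSEC = 258.0   # total-error bound quoted in the abstract


def simulated_fraction_within(bound, sigma, trials=100_000, seed=0):
    """Monte Carlo estimate of P(|error| < bound) for 3 independent axes.

    Assumes zero-mean Gaussian errors with the same sigma on every axis.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ex, ey, ez = (rng.gauss(0.0, sigma) for _ in range(3))
        if math.sqrt(ex * ex + ey * ey + ez * ez) < bound:
            hits += 1
    return hits / trials


def analytic_fraction_within(bound, sigma):
    """Maxwell-distribution CDF for the magnitude of a 3-axis Gaussian error."""
    k = bound / sigma
    return math.erf(k / math.sqrt(2)) - math.sqrt(2 / math.pi) * k * math.exp(-k * k / 2)


print(simulated_fraction_within(BOUND_ARCSEC, SIGMA_ARCSEC))
print(analytic_fraction_within(BOUND_ARCSEC, SIGMA_ARCSEC))
```

Under these assumptions both values come out close to the quoted 99.7%, which suggests the 258 arc second figure is a three-axis magnitude bound rather than a per-axis one.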

  9. Nuclear reactor alignment plate configuration

    Energy Technology Data Exchange (ETDEWEB)

    Altman, David A; Forsyth, David R; Smith, Richard E; Singleton, Norman R

    2014-01-28

    An alignment plate that is attached to a core barrel of a pressurized water reactor and fits within slots within a top plate of a lower core shroud and upper core plate to maintain lateral alignment of the reactor internals. The alignment plate is connected to the core barrel through two vertically-spaced dowel pins that extend from the outside surface of the core barrel through a reinforcement pad and into corresponding holes in the alignment plate. Additionally, threaded fasteners are inserted around the perimeter of the reinforcement pad and into the alignment plate to further secure the alignment plate to the core barrel. A fillet weld also is deposited around the perimeter of the reinforcement pad. To accommodate thermal growth between the alignment plate and the core barrel, a gap is left above, below and at both sides of one of the dowel pins in the alignment plate holes through which the dowel pins pass.

  10. Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

    KAUST Repository

    Kobayashi, Masaaki

    2017-04-20

    The recent availability of large-scale genomic resources enables us to conduct so-called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends not only on their mathematical models but also on the quality and quantity of the variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate calling of SNPs, particularly with low-coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false positive SNPs, Heap determines genotypes and calls SNPs at each site except for sites at both ends of reads or sites containing a minor allele supported by only one read. Performance comparison with existing tools showed that Heap achieved the highest F-scores with low-coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).
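
    The site-exclusion rule described above can be sketched as follows. This is one reading of the abstract, not Heap's actual code or interface: the input layout, the function name `filter_site`, and the `end_margin` parameter (how many bases count as a read "end") are all assumptions made for illustration.

```python
from collections import Counter

def filter_site(observations, end_margin=1):
    """Return surviving allele counts for one site, or None if no SNP candidate.

    observations: list of (base, offset_in_read, read_length) tuples (hypothetical layout).
    end_margin:   hypothetical number of bases at each read end to ignore.
    """
    # Filter 1: drop bases observed at either end of a read.
    kept = [
        base
        for base, offset, length in observations
        if end_margin <= offset < length - end_margin
    ]
    counts = Counter(kept)
    if len(counts) < 2:
        return None  # a single allele (or nothing) remains: no variant to call
    # Filter 2: reject the site if the least frequent allele has only one supporting read.
    _, minor_count = counts.most_common()[-1]
    if minor_count <= 1:
        return None
    return dict(counts)

site = [("A", 10, 50), ("A", 25, 50), ("G", 0, 50), ("G", 30, 50), ("G", 12, 50)]
print(filter_site(site))  # the G at offset 0 is ignored; two A and two G reads remain
```

    Keeping the two filters separate makes it easy to see which rule rejected a site when tuning for low-coverage data.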

  11. RECAT - Redundant Channel Alignment Technique

    Science.gov (United States)

    2016-06-07

    ...distribution unlimited. Supplementary notes: NUWC2015. Abstract: A problem in the analog-to-digital (A/D) conversion of broadband tape recorded...Alignment Technique, is used to align data taken on one pass with data from any other pass. The accuracy of this alignment is a function of the digital...Redundant Channel Alignment Technique; analog-to-digital; A/D; Broadband Bearing Time Processing...

  12. Microbial genomic taxonomy.

    Science.gov (United States)

    Thompson, Cristiane C; Chimetto, Luciane; Edwards, Robert A; Swings, Jean; Stackebrandt, Erko; Thompson, Fabiano L

    2013-12-23

    A need for a genomic species definition is emerging from several independent studies worldwide. In this commentary paper, we discuss recent studies on the genomic taxonomy of diverse microbial groups and a unified species definition based on genomics. Accordingly, strains from the same microbial species share >95% Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95% identity based on multiple alignment genes, genomic signature, and > 70% in silico Genome-to-Genome Hybridization similarity (GGDH). Species of the same genus will form monophyletic groups on the basis of 16S rRNA gene sequences, Multilocus Sequence Analysis (MLSA) and supertree analysis. In addition to the established requirements for species descriptions, we propose that new taxa descriptions should also include at least a draft genome sequence of the type strain in order to obtain a clear outlook on the genomic landscape of the novel microbe. The application of the new genomic species definition put forward here will allow researchers to use genome sequences to define simultaneously coherent phenotypic and genomic groups.

  13. Method for alignment of microwires

    Energy Technology Data Exchange (ETDEWEB)

    Beardslee, Joseph A.; Lewis, Nathan S.; Sadtler, Bryce

    2017-01-24

    A method of aligning microwires includes modifying the microwires so they are more responsive to a magnetic field. The method also includes using a magnetic field so as to magnetically align the microwires. The method can further include capturing the microwires in a solid support structure that retains the longitudinal alignment of the microwires when the magnetic field is not applied to the microwires.

  14. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-05-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a stand-alone software package, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures through dendrograms, heatmaps, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
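
    To make "measures based solely on the k-mer frequencies" concrete, here is a minimal sketch of one such measure, the Manhattan distance between normalized k-mer frequency vectors. It is not CAFE's code or command-line interface; the function names and the default k are assumptions for illustration.

```python
from collections import Counter

def kmer_freqs(seq, k):
    """Relative frequency of every k-mer that occurs in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def manhattan(seq_a, seq_b, k=2):
    """Sum of absolute frequency differences over all k-mers seen in either sequence."""
    fa, fb = kmer_freqs(seq_a, k), kmer_freqs(seq_b, k)
    return sum(abs(fa.get(m, 0.0) - fb.get(m, 0.0)) for m in set(fa) | set(fb))

print(manhattan("ACGTACGT", "ACGTTTTT"))
```

    The distance ranges from 0 (identical k-mer profiles) to 2 (no k-mer in common), so a matrix of such values can be fed to any standard clustering or tree-building step.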

  15. Alignment Between Genetic and Physical Maps of Gibberella zeae

    Science.gov (United States)

    We previously published a genetic map of Gibberella zeae (Fusarium graminearum) based on a cross between Kansas strain Z-3639 (lineage 7) and Japanese strain R-5470 (lineage 6). In this study, that genetic map was aligned with the third assembly of the genomic sequence of G. zeae strain PH-1 (linea...

  16. VIGOR extended to annotate genomes for additional 12 different viruses.

    Science.gov (United States)

    Wang, Shiliang; Sundaram, Jaideep P; Stockwell, Timothy B

    2012-07-01

    A gene prediction program, VIGOR (Viral Genome ORF Reader), was developed at J. Craig Venter Institute in 2010 and has been successfully performing gene calling in coronavirus, influenza, rhinovirus and rotavirus for projects at the Genome Sequencing Center for Infectious Diseases. VIGOR uses sequence similarity search against custom protein databases to identify protein coding regions, start and stop codons and other gene features. Ribonucleic acid editing and other features are accurately identified based on sequence similarity and signature residues. VIGOR produces four output files: a gene prediction file, a complementary DNA file, an alignment file, and a gene feature table file. The gene feature table can be used to create a GenBank submission. VIGOR takes a single input: viral genomic sequences in FASTA format. VIGOR has been extended to predict genes for 12 viruses: measles virus, mumps virus, rubella virus, respiratory syncytial virus, alphavirus and Venezuelan equine encephalitis virus, norovirus, metapneumovirus, yellow fever virus, Japanese encephalitis virus, parainfluenza virus and Sendai virus. VIGOR accurately detects complex gene features such as ribonucleic acid editing, stop codon leakage and ribosomal shunting. Precisely identifying the mat_peptide cleavage for some viruses is a built-in feature of VIGOR. The gene predictions for these viruses have been evaluated by testing from 27 to 240 genomes from GenBank.

  17. gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances.

    Science.gov (United States)

    Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav

    2016-01-01

    Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos).
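
    The second step described above, choosing the best local alignment for each query region, can be illustrated with a deliberately simplified per-position version. This is not the gmos algorithm or its data model: the hit tuple layout, the half-open coordinates and the function name `mosaic` are assumptions.

```python
def mosaic(query_len, hits):
    """Assign each query position to the subject of its highest-scoring hit.

    hits: list of (q_start, q_end, subject, score) with q_end exclusive (hypothetical layout).
    Returns a list of (start, end, subject) segments; subject is None where nothing aligns.
    """
    best = [None] * query_len  # best (score, subject) seen so far at each position
    for q_start, q_end, subject, score in hits:
        for pos in range(max(q_start, 0), min(q_end, query_len)):
            if best[pos] is None or score > best[pos][0]:
                best[pos] = (score, subject)

    segments = []
    for pos, entry in enumerate(best):
        subject = entry[1] if entry else None
        if segments and segments[-1][2] == subject:
            segments[-1][1] = pos + 1          # extend the current segment
        else:
            segments.append([pos, pos + 1, subject])
    return [tuple(s) for s in segments]

print(mosaic(10, [(0, 6, "S1", 50), (4, 10, "S2", 80)]))
```

    In the overlap (positions 4 and 5) the higher-scoring hit wins, so the breakpoint between the two subjects falls at position 4.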

  18. Genomic libraries: I. Construction and screening of fosmid genomic libraries.

    Science.gov (United States)

    Quail, Mike A; Matthews, Lucy; Sims, Sarah; Lloyd, Christine; Beasley, Helen; Baxter, Simon W

    2011-01-01

    Large insert genome libraries have been a core resource required to sequence genomes, analyze haplotypes, and aid gene discovery. While next generation sequencing technologies are revolutionizing the field of genomics, traditional genome libraries will still be required for accurate genome assembly. Their utility is also being extended to functional studies for understanding DNA regulatory elements. Here, we present a detailed method for constructing genomic fosmid libraries, testing for common contaminants, gridding the library to nylon membranes, then hybridizing the library membranes with a radiolabeled probe to identify corresponding genomic clones. While this chapter focuses on fosmid libraries, many of these steps can also be applied to bacterial artificial chromosome libraries.

  19. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

    Science.gov (United States)

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-10-11

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

  20. Alignment of suprathermally rotating grains

    Science.gov (United States)

    Lazarian, A.

    1995-12-01

    It is shown that mechanical alignment can be efficient for suprathermally rotating grains, provided that they drift with supersonic velocities. Such a drift should be widely spread due to both Alfvenic waves and ambipolar diffusion. Moreover, if suprathermal rotation is caused by grain interaction with a radiative flux, it is shown that mechanical alignment may be present even in the absence of supersonic drift. This means that the range of applicability of mechanical alignment is wider than generally accepted and that it can rival the paramagnetic one. We also study the latter mechanism and re-examine the interplay between poisoning of active sites and desorption of molecules blocking the access to the active sites of H_2 formation, in order to explain the observed poor alignment of small grains and good alignment of large grains. To obtain a more comprehensive picture of alignment, we briefly discuss the alignment by radiation fluxes and by grain magnetic moments.

  1. Semiautomated improvement of RNA alignments

    DEFF Research Database (Denmark)

    Andersen, Ebbe Sloth; Lind-Thomsen, Allan; Knudsen, Bjarne

    2007-01-01

    We have developed a semiautomated RNA sequence editor (SARSE) that integrates tools for analyzing RNA alignments. The editor highlights different properties of the alignment by color, and its integrated analysis tools prevent the introduction of errors when doing alignment editing. SARSE readily...... connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database...... and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster...

  2. BIOACCESSIBILITY TESTS ACCURATELY ESTIMATE ...

    Science.gov (United States)

    Hazards of soil-borne Pb to wild birds may be more accurately quantified if the bioavailability of that Pb is known. To better understand the bioavailability of Pb to birds, we measured blood Pb concentrations in Japanese quail (Coturnix japonica) fed diets containing Pb-contaminated soils. Relative bioavailabilities were expressed by comparison with blood Pb concentrations in quail fed a Pb acetate reference diet. Diets containing soil from five Pb-contaminated Superfund sites had relative bioavailabilities from 33%-63%, with a mean of about 50%. Treatment of two of the soils with P significantly reduced the bioavailability of Pb. The bioaccessibility of the Pb in the test soils was then measured in six in vitro tests and regressed on bioavailability. The tests were: the “Relative Bioavailability Leaching Procedure” (RBALP) at pH 1.5, the same test conducted at pH 2.5, the “Ohio State University In vitro Gastrointestinal” method (OSU IVG), the “Urban Soil Bioaccessible Lead Test”, the modified “Physiologically Based Extraction Test” and the “Waterfowl Physiologically Based Extraction Test.” All regressions had positive slopes. Based on criteria of slope and coefficient of determination, the RBALP pH 2.5 and OSU IVG tests performed very well. Speciation by X-ray absorption spectroscopy demonstrated that, on average, most of the Pb in the sampled soils was sorbed to minerals (30%), bound to organic matter (24%), or present as Pb sulfate (18%). Ad

  3. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
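
    A sorted k-mer list, one of the data structures mentioned above, can be sketched in a few lines. This is a serial, in-memory illustration only, not the BG/P implementation or progressiveMauve's code; the function names are assumptions. Sorting lets two lists be compared with a single linear merge.

```python
def sorted_kmer_list(seq, k):
    """All (k-mer, position) pairs of seq, sorted lexicographically by k-mer."""
    return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))

def shared_kmers(list_a, list_b):
    """Merge two sorted k-mer lists and return (kmer, pos_a, pos_b) for every shared k-mer."""
    i = j = 0
    matches = []
    while i < len(list_a) and j < len(list_b):
        ka, kb = list_a[i][0], list_b[j][0]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of entries carrying this k-mer in each list.
            i_end = i
            while i_end < len(list_a) and list_a[i_end][0] == ka:
                i_end += 1
            j_end = j
            while j_end < len(list_b) and list_b[j_end][0] == ka:
                j_end += 1
            matches.extend(
                (ka, pa, pb) for _, pa in list_a[i:i_end] for _, pb in list_b[j:j_end]
            )
            i, j = i_end, j_end
    return matches

a = sorted_kmer_list("ACGTAC", 3)
b = sorted_kmer_list("TACG", 3)
print(shared_kmers(a, b))
```

    Shared k-mers found this way are the kind of exact matches that seed-based aligners use as candidate anchors.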

  4. CELT optics Alignment Procedure

    Science.gov (United States)

    Mast, Terry S.; Nelson, Jerry E.; Chanan, Gary A.; Noethe, Lothar

    2003-01-01

    The California Extremely Large Telescope (CELT) is a project to build a 30-meter diameter telescope for research in astronomy at visible and infrared wavelengths. The current optical design calls for a primary, secondary, and tertiary mirror with Ritchey-Chrétien foci at two Nasmyth platforms. The primary mirror is a mosaic of 1080 actively-stabilized hexagonal segments. This paper summarizes a CELT report that describes a step-by-step procedure for aligning the many degrees of freedom of the CELT optics.

  5. ATLAS Inner Detector Alignment

    CERN Document Server

    Bocci, A

    2008-01-01

    The ATLAS experiment is a multi-purpose particle detector that will study high-energy particle collisions produced by the Large Hadron Collider at CERN. In order to achieve its physics goals, ATLAS tracking requires the positions of the silicon detector elements to be known to a precision better than 10 μm. Several track-based alignment algorithms have been developed for the Inner Detector. An extensive validation has been performed with simulated events and real data from ATLAS. Results from this validation are reported in this paper.

  6. TSGC and JSC Alignment

    Science.gov (United States)

    Sanchez, Humberto

    2013-01-01

    NASA and the SGCs are, by design, intended to work closely together and have synergistic Vision, Mission, and Goals. The TSGC affiliates and JSC have been working together, but not always in a concise, coordinated, or strategic manner. Today we have a couple of simple ideas to present about how TSGC and JSC have started to work together in a more concise, coordinated, and strategic manner, and how JSC and non-TSG Jurisdiction members have started to collaborate. Idea I: TSGC and JSC Technical Alignment. Idea II: Concept of Clusters.

  7. Comparison of Two Forced Alignment Systems for Aligning Bribri Speech

    Directory of Open Access Journals (Sweden)

    Rolando Coto-Solano

    2017-04-01

    Forced alignment provides drastic savings in time when aligning speech recordings and is particularly useful for the study of Indigenous languages, which are severely under-resourced in corpora and models. Here we compare two forced alignment systems, FAVE-align and EasyAlign, to determine which one provides more precision when processing running speech in the Chibchan language Bribri. We aligned a segment of a story narrated in Bribri and measured the errors in finding the center of the words and the edges of phonemes relative to the manual correction. FAVE-align showed better performance: it has an error of 7% compared to 24% with EasyAlign when finding the center of words, and errors of 22-24 ms when finding the edges of phonemes, compared to errors of 86-130 ms with EasyAlign. In addition, EasyAlign failed to detect 7% of phonemes, while also inserting 58 spurious phones into the transcription. Future research includes verifying these results for other genres and other Chibchan languages. Finally, these results provide additional evidence for the applicability of natural language processing methods to Chibchan languages and point to future work such as the construction of corpora and the training of automated speech recognition systems.

  8. An Exact Mathematical Programming Approach to Multiple RNA Sequence-Structure Alignment

    NARCIS (Netherlands)

    Bauer, M.; Klau, G.W.; Reinert, K.

    2008-01-01

    One of the main tasks in computational biology is the computation of alignments of genomic sequences to reveal their commonalities. In case of DNA or protein sequences, sequence information alone is usually sufficient to compute reliable alignments. RNA molecules, however, build spatial confor

  9. All about alignment

    CERN Multimedia

    2006-01-01

    The ALICE absorbers, iron wall and superstructure have been installed with great precision. The ALICE front absorber, positioned in the centre of the detector, has been installed and aligned. Weighing more than 400 tonnes, the ALICE absorbers and the surrounding support structures have been installed and aligned with a precision of 1-2 mm, hardly an easy task but a very important one. The ALICE absorbers are made of three parts: the front absorber, a 35-tonne cone-shaped structure, and two small-angle absorbers, long straight cylinder sections weighing 18 and 40 tonnes. The three pieces lined up have a total length of about 17 m. In addition to these, ALICE technicians have installed a 300-tonne iron filter wall made of blocks that fit together like large Lego pieces and a surrounding metal support structure to hold the tracking and trigger chambers. The absorbers house the vacuum chamber and are also the reference surface for the positioning of the tracking and trigger chambers. For this reason, the ab...

  10. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  11. Testing the tidal alignment model of galaxy intrinsic alignment

    CERN Document Server

    Blazek, Jonathan; Seljak, Uros

    2011-01-01

    Weak gravitational lensing has become a powerful probe of large-scale structure and cosmological parameters. Precision weak lensing measurements require an understanding of the intrinsic alignment of galaxy ellipticities, which can in turn inform models of galaxy formation. It is hypothesized that elliptical galaxies align with the background tidal field and that this alignment mechanism dominates the correlation between ellipticities on cosmological scales (in the absence of lensing). We use recent large-scale structure measurements from the Sloan Digital Sky Survey to test this picture with several statistics: (1) the correlation between ellipticity and galaxy overdensity, w_{g+}; (2) the intrinsic alignment auto-correlation functions; (3) the correlation functions of curl-free, E, and divergence-free, B, modes (the latter of which is zero in the linear tidal alignment theory); (4) the alignment correlation function, w_g(r_p,theta), a recently developed statistic that generalizes the galaxy correlation func...

  12. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
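
    The notion of a Pareto frontier used above can be shown with a brute-force filter over an already enumerated set of candidates, each scored by several objectives where higher is better. This is not the dynamic-programming algorithm of the paper; the function names and the score tuples are illustrative assumptions.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b in every objective
    and strictly better in at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (quadratic-time filter)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical (objective 1, objective 2) scores for five candidate alignments.
scores = [(3, 1), (2, 2), (1, 3), (1, 1), (2, 1)]
print(pareto_front(scores))
```

    Every alignment left on the frontier is a defensible trade-off between the objectives, which is why choosing a single one among them remains a separate problem.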

  13. Research on localization and alignment technology for transfer cask

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Jingchuan; Yang, Ming; Chen, Weidong [Department of Automation, Shanghai Jiao Tong University, Shanghai (China); Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai (China)]

    2015-10-15

    Highlights: • A method for the alignment between TB and HCB based on localizability is proposed. • A localization method based on the localizability estimation is proposed to localize the cask accurately and ensure the transfer cask's accurate docking in front of the window of the Tokamak Building. • The experimental results show that the proposed algorithm works well in the indoor simulation environment. This system will be tested in EAST in China. Abstract: Because the transfer cask is long relative to the space available between the Tokamak Building (TB) and the Hot Cell Building (HCB), this paper proposes an autonomous localization and alignment method for the transportation and replacement of internal components. A localization method based on the localizability estimation is used to localize and navigate the cask accurately. Once the cask arrives at the front of the TB window, the position and attitude measurement system is used to detect, in real time, the relative alignment error between the seal door of the pallet and the window of the TB. The alignment between the seal door and the TB window can then be achieved based on this offset. The simulation experiment, based on a real model, is designed according to the real TB situation. The experimental results show that the proposed localization and alignment method can be used for the transfer cask.

  14. Accurate structural correlations from maximum likelihood superpositions.

    Directory of Open Access Journals (Sweden)

    Douglas L Theobald

    2008-02-01

    The cores of globular proteins are densely packed, resulting in complicated networks of structural interactions. These interactions in turn give rise to dynamic structural correlations over a wide range of time scales. Accurate analysis of these complex correlations is crucial for understanding biomolecular mechanisms and for relating structure to function. Here we report a highly accurate technique for inferring the major modes of structural correlation in macromolecules using likelihood-based statistical analysis of sets of structures. This method is generally applicable to any ensemble of related molecules, including families of nuclear magnetic resonance (NMR) models, different crystal forms of a protein, and structural alignments of homologous proteins, as well as molecular dynamics trajectories. Dominant modes of structural correlation are determined using principal components analysis (PCA) of the maximum likelihood estimate of the correlation matrix. The correlations we identify are inherently independent of the statistical uncertainty and dynamic heterogeneity associated with the structural coordinates. We additionally present an easily interpretable method ("PCA plots") for displaying these positional correlations by color-coding them onto a macromolecular structure. Maximum likelihood PCA of structural superpositions, and the structural PCA plots that illustrate the results, will facilitate the accurate determination of dynamic structural correlations analyzed in diverse fields of structural biology.

  15. Computational design and engineering of polymeric orthodontic aligners.

    Science.gov (United States)

    Barone, S; Paoli, A; Razionale, A V; Savignano, R

    2016-10-05

    Transparent and removable aligners represent an effective solution to correct various orthodontic malocclusions through minimally invasive procedures. An aligner-based treatment requires patients to sequentially wear dentition-mating shells obtained by thermoforming polymeric disks on reference dental models. An aligner is shaped by introducing a geometrical mismatch with respect to the actual tooth positions to induce a loading system, which moves the target teeth toward the correct positions. The common practice is based on selecting the aligner features (material, thickness, and auxiliary elements) by only considering the clinician's subjective assessments. In this article, a computational design and engineering methodology has been developed to reconstruct anatomical tissues, to model parametric aligner shapes, to simulate orthodontic movements, and to enhance the aligner design. The proposed approach integrates computer-aided technologies, from tomographic imaging to optical scanning, from parametric modeling to finite element analyses, within a 3-dimensional digital framework. The anatomical modeling provides anatomies, including teeth (roots and crowns), jaw bones, and periodontal ligaments, which are the references for the downstream parametric aligner shaping. The biomechanical interactions between anatomical models and aligner geometries are virtually reproduced using finite element analysis software. The methodology allows numerical simulations of patient-specific conditions and comparative analyses of different aligner configurations. In this article, the digital framework has been used to study the influence of various auxiliary elements on the loading system delivered to a maxillary and a mandibular central incisor during an orthodontic tipping movement. Numerical simulations have shown a high dependency of the orthodontic tooth movement on the auxiliary element configuration, which should then be accurately selected to maximize the aligner

  16. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.

  17. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Science.gov (United States)

    Ely, Bert; Scott, LaTia Etheredge

    2014-01-01

    Bacterial genome annotations are accumulating rapidly in the GenBank database and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small, but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome utilizing computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein coding regions could be used to verify the annotation of any genome that has a GC content that is greater than 60%.
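
    The third codon position GC content mentioned in the abstract can be illustrated with a minimal sketch. It is not the Artemis or MICheck implementation; the function name `gc3_by_frame` and the example sequence are hypothetical, and only the three forward frames of a single window are considered. In a GC-rich genome, the frame whose third positions are most GC-rich is a hint for the coding frame.

```python
def gc3_by_frame(seq: str) -> list[float]:
    """Return the GC fraction at third codon positions for forward frames 0, 1, 2."""
    seq = seq.upper()
    fractions = []
    for frame in range(3):
        # Third codon positions in this frame are frame+2, frame+5, frame+8, ...
        third_bases = seq[frame + 2::3]
        if not third_bases:
            fractions.append(0.0)
            continue
        gc_count = sum(base in "GC" for base in third_bases)
        fractions.append(gc_count / len(third_bases))
    return fractions


# Hypothetical 18-base window; frame 0 has G or C at every third position.
example = "ATGGCCGACGGCCTGACC"
fractions = gc3_by_frame(example)
best_frame = max(range(3), key=lambda f: fractions[f])
print(fractions, best_frame)
```

    A frame plot as described in the abstract would apply the same calculation in a sliding window along the genome and compare the resulting peaks with the annotated gene positions.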

  18. The diploid genome sequence of an Asian individual

    DEFF Research Database (Denmark)

    Wang, Jun; Wang, Wei; Li, Ruiqiang

    2008-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...

  19. Onorbit IMU alignment error budget

    Science.gov (United States)

    Corson, R. W.

    1980-01-01

    The Star Tracker, Crew Optical Alignment Sight (COAS), and Inertial Measurement Unit (IMU) form a complex navigation system with a multitude of error sources. A complete list of the system errors is presented. The errors were combined in a rational way to yield an estimate of the IMU alignment accuracy for STS-1. The expected standard deviation in the IMU alignment error for STS-1 type alignments was determined to be 72 arc seconds per axis for star tracker alignments and 188 arc seconds per axis for COAS alignments. These estimates are based on current knowledge of the star tracker, COAS, IMU, and navigation base error specifications, and were partially verified by preliminary Monte Carlo analysis.

  20. Groundwater recharge: Accurately representing evapotranspiration

    CSIR Research Space (South Africa)

    Bugan, Richard DH

    2011-09-01

    Groundwater recharge is the basis for accurate estimation of groundwater resources, for determining the modes of water allocation and groundwater resource susceptibility to climate change. Accurate estimations of groundwater recharge with models...

  1. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Catalyzing alignment processes

    DEFF Research Database (Denmark)

    Lauridsen, Erik Hagelskjær; Jørgensen, Ulrik

    2004-01-01

    in societal and industrial environmental awareness and improvements. The coordination of these elements – covered by the notion of coherence – is seen as the most important mechanism for bringing about a change in environmental impact. The elements comprise regulatory regimes and available technology......, the networks of environmental professionals that work in the environmental organisation, in consulting and regulatory enforcement, and dominating business cultures. These have previously been identified in the literature as individually significant in relation to the evolving environmental agendas...... time and in combination with other social processes establish more aligned and standardized environmental performance between countries. However, examples of the introduction of environmental management suggest that EMS plays only a minor role in developing the actual environmental objectives...

  3. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    Directory of Open Access Journals (Sweden)

    James J Davis

    2016-02-01

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  4. Lunar Alignments - Identification and Analysis

    Science.gov (United States)

    González-García, A. César

    Lunar alignments are difficult to establish given the apparent lack of written accounts clearly pointing toward lunar alignments for individual temples. While some individual cases are reviewed and highlighted, the weight of the proof must fall on statistical sampling. Some definitions for the lunar alignments are provided in order to clarify the targets, and thus, some new tools are provided to try to test the lunar hypothesis in several cases, especially in megalithic astronomy.

  5. GraphAlignment: Bayesian pairwise alignment of biological networks

    Directory of Open Access Journals (Sweden)

    Kolář Michal

    2012-11-01

    Background: With increased experimental availability and accuracy of bio-molecular networks, tools for their comparative and evolutionary analysis are needed. A key component for such studies is the alignment of networks. Results: We introduce the Bioconductor package GraphAlignment for pairwise alignment of bio-molecular networks. The alignment incorporates information both from network vertices and network edges and is based on an explicit evolutionary model, allowing inference of all scoring parameters directly from empirical data. We compare the performance of our algorithm to an alternative algorithm, Græmlin 2.0. On simulated data, GraphAlignment outperforms Græmlin 2.0 in several benchmarks except for computational complexity. When there is little or no noise in the data, GraphAlignment is slower than Græmlin 2.0. It is faster than Græmlin 2.0 when processing noisy data containing spurious vertex associations. Its typical case complexity grows approximately as O(N^2.6). On empirical bacterial protein-protein interaction networks (PIN) and gene co-expression networks, GraphAlignment outperforms Græmlin 2.0 with respect to coverage and specificity, albeit by a small margin. On large eukaryotic PIN, Græmlin 2.0 outperforms GraphAlignment. Conclusions: The GraphAlignment algorithm is robust to spurious vertex associations, correctly resolves paralogs, and shows very good performance in identification of homologous vertices defined by high vertex and/or interaction similarity. The simplicity and generality of GraphAlignment edge scoring makes the algorithm an appropriate choice for global alignment of networks.

  6. MANGO: a new approach to multiple sequence alignment.

    Science.gov (United States)

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2007-01-01

    Multiple sequence alignment is a classical and challenging task for biological sequence analysis. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method, using multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds are provably significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.
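
    The difference between consecutive k-mers and spaced seeds can be shown with a small sketch. This is not MANGO's algorithm; the helper names `seed_keys` and `seed_hits`, the pattern "11011" and the two toy sequences are assumptions chosen for illustration, and the pattern is not an optimized seed. A '1' in the pattern marks a position that must match, a '0' a position that is ignored.

```python
from collections import defaultdict


def seed_keys(seq, pattern):
    """Yield (position, key) per window; the key keeps only bases under a '1'."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    for pos in range(len(seq) - len(pattern) + 1):
        yield pos, "".join(seq[pos + i] for i in care)


def seed_hits(a, b, pattern):
    """Return sorted (pos_in_a, pos_in_b) pairs whose seed keys are identical."""
    index = defaultdict(list)
    for pos, key in seed_keys(a, pattern):
        index[key].append(pos)
    return sorted(
        (pa, pb) for pb, key in seed_keys(b, pattern) for pa in index.get(key, [])
    )


# Two toy sequences that differ by a single substitution at index 3.
a = "ACGTACG"
b = "ACGAACG"
print(seed_hits(a, b, "1111"))   # consecutive 4-mer seed
print(seed_hits(a, b, "11011"))  # spaced seed of the same weight (four '1's)
```

    In this toy case the consecutive 4-mer finds no shared seed because every window overlaps the substitution, while the spaced seed of equal weight still yields a hit by skipping over it.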

  7. Genome-Wide Association Mapping and Genomic Selection for Alfalfa (Medicago sativa) Forage Quality Traits.

    Science.gov (United States)

    Biazzi, Elisa; Nazzicari, Nelson; Pecetti, Luciano; Brummer, E Charles; Palmonari, Alberto; Tava, Aldo; Annicchiarico, Paolo

    2017-01-01

    Genetic progress for forage quality has been poor in alfalfa (Medicago sativa L.), the most-grown forage legume worldwide. This study aimed at exploring opportunities for marker-assisted selection (MAS) and genomic selection of forage quality traits based on breeding values of parent plants. Some 154 genotypes from a broadly-based reference population were genotyped by genotyping-by-sequencing (GBS), and phenotyped for leaf-to-stem ratio, leaf and stem contents of protein, neutral detergent fiber (NDF) and acid detergent lignin (ADL), and leaf and stem NDF digestibility after 24 hours (NDFD), of their dense-planted half-sib progenies in three growing conditions (summer harvest, full irrigation; summer harvest, suspended irrigation; autumn harvest). Trait-marker analyses were performed on progeny values averaged over conditions, owing to modest germplasm × condition interaction. Genomic selection exploited 11,450 polymorphic SNP markers, whereas a subset of 8,494 M. truncatula-aligned markers were used for a genome-wide association study (GWAS). GWAS confirmed the polygenic control of quality traits and, in agreement with phenotypic correlations, indicated substantially different genetic control of a given trait in stems and leaves. It detected several SNPs in different annotated genes that were highly linked to stem protein content. Also, it identified a small genomic region on chromosome 8 with high concentration of annotated genes associated with leaf ADL, including one gene probably involved in the lignin pathway. Three genomic selection models, i.e., Ridge-regression BLUP, Bayes B and Bayesian Lasso, displayed similar prediction accuracy, whereas SVR-lin was less accurate. Accuracy values were moderate (0.3-0.4) for stem NDFD and leaf protein content, modest for leaf ADL and NDFD, and low to very low for the other traits. Along with previous results for the same germplasm set, this study indicates that GBS data can be exploited to improve both quality traits

  8. Apparatus for accurately measuring high temperatures

    Science.gov (United States)

    Smith, D.D.

    The present invention is a thermometer used for measuring furnace temperatures in the range of about 1800° to 2700°C. The thermometer comprises a broadband multicolor thermal radiation sensor positioned to be in optical alignment with the end of a blackbody sight tube extending into the furnace. A valve-shutter arrangement is positioned between the radiation sensor and the sight tube and a chamber for containing a charge of high pressure gas is positioned between the valve-shutter arrangement and the radiation sensor. A momentary opening of the valve-shutter arrangement allows a pulse of the high pressure gas to purge the sight tube of air-borne thermal radiation contaminants, which permits the radiation sensor to accurately measure the thermal radiation emanating from the end of the sight tube.

  9. Mask alignment system for semiconductor processing

    Energy Technology Data Exchange (ETDEWEB)

    Webb, Aaron P.; Carlson, Charles T.; Weaver, William T.; Grant, Christopher N.

    2017-02-14

    A mask alignment system for providing precise and repeatable alignment between ion implantation masks and workpieces. The system includes a mask frame having a plurality of ion implantation masks loosely connected thereto. The mask frame is provided with a plurality of frame alignment cavities, and each mask is provided with a plurality of mask alignment cavities. The system further includes a platen for holding workpieces. The platen may be provided with a plurality of mask alignment pins and frame alignment pins configured to engage the mask alignment cavities and frame alignment cavities, respectively. The mask frame can be lowered onto the platen, with the frame alignment cavities moving into registration with the frame alignment pins to provide rough alignment between the masks and workpieces. The mask alignment cavities are then moved into registration with the mask alignment pins, thereby shifting each individual mask into precise alignment with a respective workpiece.

  10. Slider--maximum use of probability information for alignment of short sequence reads and SNP detection.

    Science.gov (United States)

    Malhis, Nawar; Butterfield, Yaron S N; Ester, Martin; Jones, Steven J M

    2009-01-01

    A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files. Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.

  11. Systematic evaluation of spliced alignment programs for RNA-seq data.

    Science.gov (United States)

    Engström, Pär G; Steijger, Tamara; Sipos, Botond; Grant, Gregory R; Kahles, André; Rätsch, Gunnar; Goldman, Nick; Hubbard, Tim J; Harrow, Jennifer; Guigó, Roderic; Bertone, Paul

    2013-12-01

    High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.

  12. RNA Structural Alignments, Part I

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Gorodkin, Jan

    2014-01-01

    Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as "RNA structural alignment." A class of the methods for structural alignment is based on the principles proposed by Sankoff more than 25 years ago. The Sankoff algorithm simultaneously folds and aligns two or more sequences. The advantage of this algorithm over those that separate the folding and alignment steps is that it makes better predictions. The disadvantage is that it is slower and requires more computer memory to run. The amount of computational resources needed to run the Sankoff algorithm... the methods based on the Sankoff algorithm. All the practical implementations of the algorithm use heuristics to make them run in reasonable time and memory. These heuristics are also described in this chapter.

  13. Lexical alignment in triadic communication.

    Science.gov (United States)

    Foltz, Anouschka; Gaspers, Judith; Thiele, Kristina; Stenneken, Prisca; Cimiano, Philipp

    2015-01-01

    Lexical alignment refers to the adoption of one's interlocutor's lexical items. Accounts of the mechanisms underlying such lexical alignment differ (among other aspects) in the role assigned to addressee-centered behavior. In this study, we used a triadic communicative situation to test which factors may modulate the extent to which participants' lexical alignment reflects addressee-centered behavior. Pairs of naïve participants played a picture matching game and received information about the order in which pictures were to be matched from a voice over headphones. On critical trials, participants did or did not hear a name for the picture to be matched next over headphones. Importantly, when the voice over headphones provided a name, it did not match the name that the interlocutor had previously used to describe the object. Participants overwhelmingly used the word that the voice over headphones provided. This result points to non-addressee-centered behavior and is discussed in terms of disrupting alignment with the interlocutor as well as in terms of establishing alignment with the voice over headphones. In addition, the type of picture (line drawing vs. tangram shape) independently modulated lexical alignment, such that participants showed more lexical alignment to their interlocutor for (more ambiguous) tangram shapes compared to line drawings. Overall, the results point to a rather large role for non-addressee-centered behavior during lexical alignment.

  14. CATO: The Clone Alignment Tool.

    Directory of Open Access Journals (Sweden)

    Peter V Henstock

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow.

  15. CATO: The Clone Alignment Tool.

    Science.gov (United States)

    Henstock, Peter V; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow.

  16. Development of a new laser alignment device with Winston-Lutz phantom in radiotherapy

    Energy Technology Data Exchange (ETDEWEB)

    Lim, Young Kyung; Min, Soonk; Jeong, Eun Hee; Jeong, Jong Hwi; Kim, Haksoo; Park, Jeong-Hoon; Shin, DongHo; Lee, Se Byeong [National Cancer Center, Goyang (Korea, Republic of); Choi, Sang Hyoun [Korea Cancer Center Hospital, Seoul (Korea, Republic of); Hwang, Ui-Jung [National Medical Center, Seoul (Korea, Republic of); Kwak, Jung Won [Asan Medical Center, Seoul (Korea, Republic of); Kim, Siyong [Virginia Commonwealth University, Richmond (United States)

    2015-10-15

    The lasers must be aligned precisely to the radiation isocenter. According to the report provided by the American Association of Physicists in Medicine (AAPM) Task Group 142, the localizing lasers should be aligned to within ±2 mm of radiation isocenter for non-intensity-modulated radiation therapy (IMRT), ±1 mm for IMRT, and less than ±1 mm for stereotactic radiosurgery (SRS) on a monthly basis. In this study, we developed and tested a new laser alignment device adopting an accurate, reproducible and straightforward alignment method in radiotherapy. The device consists of two laser alignment parts: the first is an optical alignment part, and the second is a radiation alignment part. In the radiation alignment, a Winston-Lutz (W-L) phantom installed in the device was used. The performance of the device was tested in a conventional medical linac and a simulator. It was revealed that the device could align the patient-setup lasers in the treatment room accurately, precisely, and fast. We expect the device can be used as a quality assurance tool daily and monthly.

  17. Alignments in the nobelium isotopes

    Institute of Scientific and Technical Information of China (English)

    ZHENG Shi-Zie; XU Fu-Rong; YUAN Cen-Xi; QI Chong

    2009-01-01

    Total-Routhian-Surface calculations have been performed to investigate the deformation and alignment properties of the No isotopes. It is found that normal deformed and superdeformed states in these nuclei can coexist at low excitation energies. In neutron-deficient No isotopes, the superdeformed shapes can even become the ground states. Moreover, we plotted the kinematic moments of inertia of the No isotopes, which follow very nicely available experimental data. It is noted that, as the rotational frequency increases, alignments develop at ħω=0.2-0.3 MeV. Our calculations show that the occupation of the vj orbital plays an important role in the alignments of the No isotopes.

  18. Alignment of flexible protein structures.

    Science.gov (United States)

    Shatsky, M; Fligelman, Z Y; Nussinov, R; Wolfson, H J

    2000-01-01

    We present two algorithms which align flexible protein structures. Both apply efficient structural pattern detection and graph theoretic techniques. The FlexProt algorithm simultaneously detects the hinge regions and aligns the rigid subparts of the molecules. It does so by efficiently detecting maximal congruent rigid fragments in both molecules and calculating their optimal arrangement which does not violate the protein sequence order. The FlexMol algorithm is sequence order independent, yet requires as input the hypothesized hinge positions. Due to its sequence order independence it can also be applied to protein-protein interface matching and drug molecule alignment. It aligns the rigid parts of the molecule using the Geometric Hashing method and calculates optimal connectivity among these parts by graph-theoretic techniques. Both algorithms are highly efficient even compared with rigid structure alignment algorithms. Typical running times on a standard desktop PC (400 MHz) are about 7 seconds for FlexProt and about 1 minute for FlexMol.

  19. The CMS Silicon Tracker Alignment

    CERN Document Server

    Castello, R

    2008-01-01

    The alignment of the Strip and Pixel Tracker of the Compact Muon Solenoid experiment, with its large number of independent silicon sensors and its excellent spatial resolution, is a complex and challenging task. Besides high precision mounting, survey measurements and the Laser Alignment System, track-based alignment is needed to reach the envisaged precision. Three different algorithms for track-based alignment were successfully tested on a sample of cosmic-ray data collected at the Tracker Integration Facility, where 15% of the Tracker was tested. These results, together with those coming from the CMS global run, will provide the basis for the full-scale alignment of the Tracker, which will be carried out with the first p-p collisions.

  20. Interference Alignment for Secrecy

    CERN Document Server

    Koyluoglu, Onur Ozan; Lai, Lifeng; Poor, H Vincent

    2008-01-01

    This paper studies the frequency/time selective $K$-user Gaussian interference channel with secrecy constraints. Two distinct models, namely the interference channel with confidential messages and the one with an external eavesdropper, are analyzed. The key difference between the two models is the lack of channel state information (CSI) about the external eavesdropper. Using interference alignment along with secrecy pre-coding, it is shown that each user can achieve non-zero secure Degrees of Freedom (DoF) for both cases. More precisely, the proposed coding scheme achieves $\frac{K-2}{2K-2}$ secure DoF with probability one per user in the confidential messages model. For the external eavesdropper scenario, on the other hand, it is shown that each user can achieve $\frac{K-2}{2K}$ secure DoF in the ergodic setting. Remarkably, these results establish the positive impact of interference on the secrecy capacity region of wireless networks.
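
    As a worked illustration of the two per-user secure DoF expressions quoted in the abstract, the sketch below evaluates them for a few values of K. It only restates the arithmetic of the formulas; the function names are hypothetical, and restricting K to at least 2 is an assumption made here so that the results are non-negative.

```python
from fractions import Fraction


def secure_dof_confidential(k: int) -> Fraction:
    """Per-user secure DoF (K-2)/(2K-2), confidential messages model."""
    if k < 2:
        raise ValueError("this sketch assumes K >= 2")
    return Fraction(k - 2, 2 * k - 2)


def secure_dof_eavesdropper(k: int) -> Fraction:
    """Per-user secure DoF (K-2)/(2K), external eavesdropper model."""
    if k < 2:
        raise ValueError("this sketch assumes K >= 2")
    return Fraction(k - 2, 2 * k)


for k in (2, 3, 10):
    print(k, secure_dof_confidential(k), secure_dof_eavesdropper(k))
```

    Both expressions are zero at K = 2, positive for K of 3 or more, and approach 1/2 as K grows, with the confidential messages value always at least as large as the eavesdropper value.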

  1. Space Mirror Alignment System

    Science.gov (United States)

    Jau, Bruno M.; McKinney, Colin; Smythe, Robert F.; Palmer, Dean L.

    2011-01-01

    An optical alignment mirror mechanism (AMM) has been developed with angular positioning accuracy of +/-0.2 arcsec. This requires the mirror's linear positioning actuators to have positioning resolutions of +/-112 nm to enable the mirror to meet the angular tip/tilt accuracy requirement. Demonstrated capabilities are 0.1 arc-sec angular mirror positioning accuracy, which translates into linear positioning resolutions at the actuator of 50 nm. The mechanism consists of a structure with sets of cross-directional flexures that enable the mirror's tip and tilt motion, a mirror with its kinematic mount, and two linear actuators. An actuator comprises a brushless DC motor, a linear ball screw, and a piezoelectric brake that holds the mirror's position while the unit is unpowered. An interferometric linear position sensor senses the actuator's position. The AMMs were developed for an Astrometric Beam Combiner (ABC) optical bench, which is part of an interferometer development. Custom electronics were also developed to accommodate the presence of multiple AMMs within the ABC and provide a compact, all-in-one solution to power and control the AMMs.

  2. Downlink Interference Alignment

    CERN Document Server

    Suh, Changho; Tse, David

    2010-01-01

    We develop an interference alignment (IA) technique for a downlink cellular system. In the uplink, IA schemes need channel-state-information exchange across base-stations of different cells, but our downlink IA technique requires feedback only within a cell. As a result, the proposed scheme can be implemented with a few changes to an existing cellular system where the feedback mechanism (within a cell) is already being considered for supporting multi-user MIMO. Not only is our proposed scheme implementable with little effort, it can in fact provide substantial gain especially when interference from a dominant interferer (base-station) is significantly stronger than the remaining interference: it is shown that in the two-isolated cell layout, our scheme provides four-fold gain in throughput performance over a standard multi-user MIMO technique. We show through simulations that our technique provides respectable gain under more realistic scenarios: it gives approximately 55% and 20% gain for a linear cell layou...

  3. Alignment-Annotator web server: rendering and annotating sequence alignments.

    Science.gov (United States)

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-07-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed on the server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, neither plugins nor Java are required and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. The measurement of upper body alignment during the golf drive.

    Science.gov (United States)

    Wheat, J S; Vernon, T; Milner, C E

    2007-05-01

    Transverse plane rotations of the upper body are often estimated during the golf swing. The aim of this study was to determine the agreement between upper body alignments measured using markers attached to the thorax and markers on the acromion process during the golf drive. Three-dimensional coordinate data from nine markers were collected (300 Hz) during eight golf drives for 10 participants. The transverse plane alignment of the upper body was calculated using three techniques: inter-acromion vector, thorax vector, and Cardan angles. Agreement between the methods was then assessed using intra-class correlation and 95% limits of agreement. Our results suggested that the thorax vector can be used to provide an accurate estimation of thorax alignment at all stages of the golf swing (R ≥ 0.97, systematic difference < 1.0 degrees, random difference < 3.8 degrees). The inter-acromion vector gave an accurate estimation of thorax alignment at address (R = 0.90, systematic difference = 0.0 degrees, random difference = 4.3 degrees) but it should not be used to estimate thorax alignment at the top of the backswing (R = 0.32, systematic difference = -16.0 degrees, random difference = 8.7 degrees) or impact (R = 0.90, systematic difference = -5.1 degrees, random difference = 8.3 degrees) during the golf drive.
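
    For readers unfamiliar with 95% limits of agreement, the sketch below shows the conventional Bland-Altman form: the mean of the paired differences (the systematic difference) plus or minus 1.96 standard deviations. The abstract does not state the exact formula the authors used, so this convention, the function name, and the sample angles are assumptions made for illustration only.

```python
import statistics


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower, upper) for paired measurements from two methods.

    bias  : mean of the paired differences (systematic difference)
    lower : bias - 1.96 * SD of the differences
    upper : bias + 1.96 * SD of the differences
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical alignment angles in degrees, not data from the study.
    thorax_vector = [10.0, 12.0, 14.0, 16.0]
    cardan_angle = [9.0, 12.0, 15.0, 16.0]
    print(limits_of_agreement(thorax_vector, cardan_angle))
```

    A bias near zero with narrow limits indicates that the two methods can be used interchangeably; a large bias, as reported for the inter-acromion vector at the top of the backswing, indicates that they cannot.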

  5. Sigma: multiple alignment of weakly-conserved non-coding DNA sequence

    Directory of Open Access Journals (Sweden)

    Siddharthan Rahul

    2006-03-01

    Background: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. Results: Comparative tests of Sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. Conclusion: By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.
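
    The idea of a "best gapless local alignment" can be illustrated with a short sketch. This is not Sigma's algorithm: Sigma scores fragments by statistical significance against a background model, whereas the code below swaps in a plain match/mismatch score (an assumption made here) and scans every diagonal for the highest-scoring ungapped segment. The function name and parameters are hypothetical.

```python
def best_gapless_local(a, b, match=1, mismatch=-1):
    """Find the best ungapped local alignment of strings a and b.

    Returns (score, start_in_a, start_in_b, length). Each diagonal of the
    comparison matrix is scanned with a running score that resets to zero
    whenever it drops to zero or below (a maximum-subarray scan).
    """
    best = (0, 0, 0, 0)
    for offset in range(-len(b) + 1, len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        score, length = 0, 0
        while i < len(a) and j < len(b):
            score += match if a[i] == b[j] else mismatch
            length += 1
            if score <= 0:
                score, length = 0, 0  # restart the segment after this cell
            elif score > best[0]:
                best = (score, i - length + 1, j - length + 1, length)
            i += 1
            j += 1
    return best


if __name__ == "__main__":
    print(best_gapless_local("TTACGTACGG", "CCACGTACAA"))
```

    In the example, the shared fragment ACGTAC starts at index 2 in both sequences, so the call returns a score of 6, both start positions equal to 2, and a length of 6.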

  6. Accurate pose estimation for forensic identification

    Science.gov (United States)

    Merckx, Gert; Hermans, Jeroen; Vandermeulen, Dirk

    2010-04-01

    In forensic authentication, one aims to identify the perpetrator among a series of suspects or distractors. A fundamental problem in any recognition system that aims for identification of subjects in a natural scene is the lack of constraints on viewing and imaging conditions. In forensic applications, identification proves even more challenging, since most surveillance footage is of abysmal quality. In this context, robust methods for pose estimation are paramount. In this paper we will therefore present a new pose estimation strategy for very low quality footage. Our approach uses 3D-2D registration of a textured 3D face model with the surveillance image to obtain accurate far field pose alignment. Starting from an inaccurate initial estimate, the technique uses novel similarity measures based on the monogenic signal to guide a pose optimization process. We will illustrate the descriptive strength of the introduced similarity measures by using them directly as a recognition metric. Through validation, using both real and synthetic surveillance footage, our pose estimation method is shown to be accurate, and robust to lighting changes and image degradation.

  7. Refining borders of genome-rearrangements including repetitions

    Directory of Open Access Journals (Sweden)

    JA Arjona-Medina

    2016-10-01

    Background: DNA rearrangement events have been widely studied in comparative genomics for many years. The importance of these events lies not only in the study of relatedness among different species, but also in determining the mechanisms behind evolution. Although there are many methods to identify genome-rearrangements (GR), the refinement of their borders has become a huge challenge. Until now no accepted method exists to achieve accurate fine-tuning: the notion of breakpoint (BP) is still an open issue, and although repeated regions are vital to understand evolution, they are not taken into account in most of the GR detection and refinement methods. Methods and results: We propose a method to refine the borders of GR including repeated regions. Instead of removing these repetitions to facilitate computation, we take advantage of them using a consensus alignment sequence of the repeated region in between two blocks. Using the concept of identity vectors for Synteny Blocks (SB) and repetitions, a Finite State Machine is designed to detect transition points in the difference between such vectors. The method does not force the BP to be a region or a point but depends on the alignment transitions within the SBs and repetitions. Conclusion: The accurate definition of the borders of SB and repeated genomic regions, and consequently the detection of BP, might help to understand the evolutionary model of species. In this manuscript we present a new proposal for such a refinement. Features of the SB borders and BPs are different and fit with what is expected: SBs show more diversity in annotations, while BPs are short and richer in DNA replication and stress response, which are strongly linked with rearrangements.

  8. Orbit Alignment in Triple Stars

    Science.gov (United States)

    Tokovinin, Andrei

    2017-08-01

    The statistics of the angle Φ between orbital angular momenta in hierarchical triple systems with known inner visual or astrometric orbits are studied. A correlation between apparent revolution directions proves the partial orbit alignment known from earlier works. The alignment is strong in triples with outer projected separation less than ∼50 au, where the average Φ is about 20°. In contrast, outer orbits wider than 1000 au are not aligned with the inner orbits. It is established that the orbit alignment decreases with the increasing mass of the primary component. The average eccentricity of inner orbits in well-aligned triples is smaller than in randomly aligned ones. These findings highlight the role of dissipative interactions with gas in defining the orbital architecture of low-mass triple systems. On the other hand, chaotic dynamics apparently played a role in shaping more massive hierarchies. The analysis of projected configurations and triples with known inner and outer orbits indicates that the distribution of Φ is likely bimodal, where 80% of triples have Φ < 70° and the remaining ones are randomly aligned.

  9. Aligning for Innovation - Alignment Strategy to Drive Innovation

    Science.gov (United States)

    Johnson, Hurel; Teltschik, David; Bussey, Horace, Jr.; Moy, James

    2010-01-01

    With the sudden need for innovation that will help the country achieve its long-term space exploration objectives, the question of whether NASA is aligned effectively to drive the innovation that it so desperately needs to take space exploration to the next level should be entertained. Authors such as Robert Kaplan and David North have noted that companies that use a formal system for implementing strategy consistently outperform their peers. They have outlined a six-stage management systems model for implementing strategy, which includes the aligning of the organization towards its objectives. This involves the alignment of the organization from the top down. This presentation will explore the impacts of existing U.S. industrial policy on technological innovation; assess the current NASA organizational alignment and its impacts on driving technological innovation; and finally suggest an alternative approach that may drive the innovation needed to take the world to the next level of space exploration, with NASA truly leading the way.

  11. Validation of the CLIC alignment strategy on short range

    CERN Document Server

    Mainaud Durand, H; Griffet, S; Kemppinen, J; Rude, V; Sosin, M

    2012-01-01

    The pre-alignment of CLIC consists of aligning the components of linacs and beam delivery systems (BDS) as accurately as possible, so that a first pilot beam can circulate and allow the implementation of the beam based alignment. Given the precision and accuracy needed (10 µm rms over sliding windows of 200 m), this pre-alignment must be active. It can be divided into two parts: the determination of a straight reference over 20 km by means of a metrological network, and the determination of the component positions with respect to this reference, together with their adjustment. The second part is the object of the paper, which describes the steps of the proposed strategy: firstly, the fiducialisation of the different components of CLIC; secondly, the alignment of these components on common supports; and thirdly, the active alignment of these supports using sensors and actuators. These steps have been validated on a test setup over a length of 4 m, and the obtained results are analysed.

  12. Static rearfoot alignment: a comparison of clinical and radiographic measures.

    Science.gov (United States)

    Lamm, Bradley M; Mendicino, Robert W; Catanzariti, Alan R; Hillstrom, Howard J

    2005-01-01

    Foot structure is typically evaluated using static clinical and radiographic measures. To date, the literature is devoid of a correlation between rearfoot frontal plane radiographic parameters and clinical measures of alignment. In a repeated-measures study comparing radiographic and clinical rearfoot alignment in 24 healthy subjects, radiographic angular measurements were made from standard weightbearing anteroposterior, lateral, long leg calcaneal axial, and rearfoot alignment views. Clinical measurements were made using a jig and scanner to assess the malleolar valgus index and a goniometer to evaluate the resting and neutral calcaneal stance positions. There was a significant correlation between frontal plane radiographic angles (long leg calcaneal axial and rearfoot alignment views) (r = 0.814). Similarly, there was a significant correlation between clinical measures (resting calcaneal stance position and malleolar valgus index) (r = 0.714). A multivariate stepwise regression showed that resting calcaneal stance position can be accurately predicted from 3 of the 15 clinical and radiographic measurements collected: malleolar valgus index, rearfoot alignment view, and long leg calcaneal axial view (r = 0.829). In summary, a commonly used clinical measure of static rearfoot alignment, resting calcaneal stance position, was correlated closely with the malleolar valgus index and both frontal plane radiographic parameters.

  13. CMS Muon Alignment: System Description and first results

    CERN Document Server

    Sobron, M

    2008-01-01

    The CMS detector has been instrumented with a precise and complex opto-mechanical alignment subsystem that provides a common reference frame between Tracker and Muon detection systems by means of a net of laser beams. The system allows continuous and accurate monitoring of the muon chamber positions with respect to the Tracker body. Preliminary results of operation during the test of the CMS 4T solenoid magnet, performed in 2006, are presented. These measurements complement the information provided by the use of survey techniques and the results of alignment algorithms based on muon tracks crossing the detector.

  14. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation

    Directory of Open Access Journals (Sweden)

    Li Gong-Hua

    2010-08-01

    Background: The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in the Protein Data Bank (PDB); thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need further improvement in accuracy, sensitivity, and computational speed. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown functions of proteins based on local protein structural similarity. This algorithm has been evaluated by building a test set including 164 enzyme families, and has also been compared to other methods. Results: The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared to both sequence-based and global structure-based methods, CMASA can find not only remote homologous proteins but also active site convergence. Compared to other local structure comparison-based methods, CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is also more sensitive than PINTS and more accurate than JESS (both are local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites have been suggested; these sites cannot be observed by the Catalytic Site Atlas (CSA). Conclusions: CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. CMASA can be used in large-scale enzyme active site annotation. The CMASA can be available by the...

  15. Structural alignment of RNA with triple helix structure.

    Science.gov (United States)

    Wong, Thomas K F; Yiu, S M

    2012-04-01

    Structural alignment is useful in identifying members of ncRNAs. Existing tools are all based on the secondary structures of the molecules. There is evidence showing that tertiary interactions (the interaction between a single-stranded nucleotide and a base-pair) in triple helix structures are critical in some functions of ncRNAs. In this article, we address the problem of structural alignment of RNAs with the triple helix. We provide a formal definition to capture a simplified model of a triple helix structure, then develop an algorithm of O(mn^3) time to align a query sequence (of length m) with known triple helix structure with a target sequence (of length n) with an unknown structure. The resulting algorithm is shown to be useful in identifying ncRNA members in a simulated genome.

  16. Optimization of Substitution Matrix for Sequence Alignment of Major Capsid Proteins of Human Herpes Simplex Virus

    Directory of Open Access Journals (Sweden)

    Vipan Kumar Sohpal

    2011-12-01

    Protein sequence alignment has become an informative tool in modern molecular biology research. A number of substitution matrices are readily available for sequence alignments, but it is a challenging task to compute optimal matrices for alignment accuracy. Here, we used the parameter optimization procedure to select the optimal Q of substitution matrices for the major viral capsid protein of human herpes simplex virus. Results predict that the Blosum matrix is most accurate on alignment benchmarks, and Blosum 60 provides the optimal Q among all substitution matrices. PAM 200 matrices score slightly below Blosum 60, while VTML matrices are intermediate between PAM and VT matrices under dynamic programming.
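
    To show where a substitution matrix enters alignment by dynamic programming, here is a minimal Needleman-Wunsch scoring sketch. It is not the optimization procedure from the paper: the scoring function is a toy stand-in (an assumption, not BLOSUM, PAM, or VTML values), the linear gap penalty of -4 is arbitrary, and all names are hypothetical. A real study would pass a lookup into an actual matrix as `score`.

```python
def global_alignment_score(a, b, score, gap=-4):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    `score(x, y)` plays the role of the substitution matrix: it returns the
    score for aligning residue x with residue y. Only two rows of the
    dynamic-programming table are kept in memory at a time.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            ))
        prev = cur
    return prev[-1]


def toy_score(x, y):
    """Toy stand-in for a substitution matrix (illustrative values only)."""
    return 2 if x == y else -1


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA", toy_score))
```

    Swapping `toy_score` for lookups into different matrices and comparing the resulting alignments against a benchmark is the kind of comparison the abstract describes.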

  17. Magnetic axis alignment and the Poisson alignment reference system

    Science.gov (United States)

    Griffith, Lee V.; Schenz, Richard F.; Sommargren, Gary E.

    1989-01-01

    Three distinct metrological operations are necessary to align a free-electron laser (FEL): the magnetic axis must be located, a straight line reference (SLR) must be generated, and the magnetic axis must be related to the SLR. This paper begins with a review of the motivation for developing an alignment system that will assure better than 100 micrometer accuracy in the alignment of the magnetic axis throughout an FEL. The paper describes techniques for identifying the magnetic axis of solenoids, quadrupoles, and wiggler poles. Propagation of a laser beam is described to the extent of revealing sources of nonlinearity in the beam. Development and use of the Poisson line, a diffraction effect, is described in detail. Spheres in a large-diameter laser beam create Poisson lines and thus provide a necessary mechanism for gauging between the magnetic axis and the SLR. Procedures for installing FEL components and calibrating alignment fiducials to the magnetic axes of the components are also described. An error budget shows that the Poisson alignment reference system will make it possible to meet the alignment tolerances for an FEL.

  18. DNA Sequence Alignment during Homologous Recombination.

    Science.gov (United States)

    Greene, Eric C

    2016-05-27

    Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination.

  19. DNA Sequence Alignment during Homologous Recombination

    Science.gov (United States)

    Greene, Eric C.

    2016-01-01

    Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination. PMID:27129270

  20. RF Jitter Modulation Alignment Sensing

    Science.gov (United States)

    Ortega, L. F.; Fulda, P.; Diaz-Ortiz, M.; Perez Sanchez, G.; Ciani, G.; Voss, D.; Mueller, G.; Tanner, D. B.

    2017-01-01

    We will present the numerical and experimental results of a new alignment sensing scheme which can reduce the complexity of alignment sensing systems currently used, while maintaining the same shot noise limited sensitivity. This scheme relies on the ability of electro-optic beam deflectors to create angular modulation sidebands in radio frequency, and needs only a single-element photodiode and IQ demodulation to generate error signals for tilt and translation degrees of freedom in one dimension. It distances itself from current techniques by eliminating the need for beam centering servo systems, quadrant photodetectors and Gouy phase telescopes. RF Jitter alignment sensing can be used to reduce the complexity in the alignment systems of many laser optical experiments, including LIGO and the ALPS experiment.

  1. Calibration of shaft alignment instruments

    Science.gov (United States)

    Hemming, Bjorn

    1998-09-01

    Correct shaft alignment is vital for most rotating machines. Several shaft alignment instruments, ranging from dial indicator based to laser based, are commercially available. At VTT Manufacturing Technology a device for calibration of shaft alignment instruments was developed during 1997. A feature of the developed device is its similarity to the typical use of shaft alignment instruments, i.e. the rotation of two shafts during the calibration. The benefit of the rotation is that all errors of the shaft alignment instrument, for example the deformations of the suspension bars, are included. However, the rotation significantly increases the uncertainty of calibration because of errors in the suspension of the shafts in the developed calibration device. Without rotation the uncertainty of calibration is 0.001 mm for the parallel offset scale and 0.003 mm/m for the angular scale. With rotation the uncertainty of calibration is 0.002 mm for the scale and 0.004 mm/m for the angular scale.

  2. NNLOPS accurate associated HW production

    CERN Document Server

    Astill, William; Re, Emanuele; Zanderighi, Giulia

    2016-01-01

    We present a next-to-next-to-leading order accurate description of associated HW production consistently matched to a parton shower. The method is based on reweighting events obtained with the HW plus one jet NLO accurate calculation implemented in POWHEG, extended with the MiNLO procedure, to reproduce NNLO accurate Born distributions. Since the Born kinematics is more complex than the cases treated before, we use a parametrization of the Collins-Soper angles to reduce the number of variables required for the reweighting. We present phenomenological results at 13 TeV, with cuts suggested by the Higgs Cross Section Working Group.

  3. deepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns

    Science.gov (United States)

    Ekstrøm, Claus T.; Stadler, Peter F.; Hoffmann, Steve; Gorodkin, Jan

    2012-01-01

    Motivation: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example. Results: deepBlockAlign introduces a two-step approach to align RNA-seq read patterns with the aim of quickly identifying RNAs that share similar processing footprints. Overlapping mapped reads are first merged to blocks and then closely spaced blocks are combined to block groups, each representing a locus of expression. In order to compare block groups, the constituent blocks are first compared using a modified sequence alignment algorithm to determine similarity scores for pairs of blocks. In the second stage, block patterns are compared by means of a modified Sankoff algorithm that takes both block similarities and similarities of pattern of distances within the block groups into account. Hierarchical clustering of block groups clearly separates most miRNA and tRNA, and also identifies about a dozen tRNAs clustering together with miRNA. Most of these putative Dicer-processed tRNAs, including eight cases reported to generate products with miRNA-like features in literature, exhibit read blocks distinguished by precise start position of reads. Availability: The program deepBlockAlign is available as source code from http://rth.dk/resources/dba/. Contact: gorodkin@rth.dk; studla@bioinf.uni-leipzig.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22053076
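
    The first step described in this abstract, merging overlapping mapped reads into blocks and then combining closely spaced blocks into block groups, can be sketched as plain interval merging. This is an illustration under stated assumptions rather than the tool's implementation: reads are reduced to inclusive (start, end) coordinates on one strand, `max_gap` is a hypothetical distance threshold, and the function names are invented for this example.

```python
def merge_reads_into_blocks(reads):
    """Merge overlapping (start, end) read intervals into blocks."""
    blocks = []
    for start, end in sorted(reads):
        if blocks and start <= blocks[-1][1]:
            # Read overlaps the current block: extend the block if needed.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return [tuple(block) for block in blocks]


def group_blocks(blocks, max_gap):
    """Combine sorted blocks separated by at most max_gap into block groups."""
    groups = []
    for block in blocks:
        if groups and block[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(block)
        else:
            groups.append([block])
    return groups


if __name__ == "__main__":
    reads = [(15, 30), (1, 20), (200, 220), (40, 60)]  # hypothetical mappings
    blocks = merge_reads_into_blocks(reads)
    print(blocks)
    print(group_blocks(blocks, max_gap=50))
```

    Each resulting group stands for one locus of expression; the comparison of block patterns between groups, which the abstract attributes to modified alignment and Sankoff algorithms, is not attempted here.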

  4. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information.

    Science.gov (United States)

    Pei, Jimin; Grishin, Nick V

    2014-01-01

    Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D.

  5. Separating weak lensing and intrinsic alignments using radio observations

    CERN Document Server

    Whittaker, Lee; Battye, Richard A

    2015-01-01

    We discuss methods for performing weak lensing using radio observations to recover information about the intrinsic structural properties of the source galaxies. Radio surveys provide unique information that can benefit weak lensing studies, such as HI emission, which may be used to construct galaxy velocity maps, and polarized synchrotron radiation; both of which provide information about the unlensed galaxy and can be used to reduce galaxy shape noise and the contribution of intrinsic alignments. Using a proxy for the intrinsic position angle of an observed galaxy, we develop techniques for cleanly separating weak gravitational lensing signals from intrinsic alignment contamination in forthcoming radio surveys. Random errors on the intrinsic orientation estimates introduce biases into the shear and intrinsic alignment estimates. However, we show that these biases can be corrected for if the error distribution is accurately known. We demonstrate our methods using simulations, where we reconstruct the shear an...

  6. Alignment of the ATLAS Inner Detector

    CERN Document Server

    Marti-Garcia, Salvador; The ATLAS collaboration

    2016-01-01

    Run-2 of the LHC has presented new challenges to track and vertex reconstruction with higher energies, denser jets and higher rates. In addition, the Insertable B-layer (IBL), a fourth pixel layer, was installed at the centre of ATLAS during Long Shutdown 1 of the LHC. The physics performance of the experiment requires a high resolution and unbiased measurement of all charged particle kinematic parameters. In turn, the performance of the tracking depends, among many other issues, on the accurate determination of the alignment parameters of the tracking sensors. The offline track-based alignment of the ATLAS tracking system has to deal with more than 700,000 degrees of freedom (DoF). This represents a considerable numerical challenge in terms of both CPU time and precision. During Run-2, a mechanical distortion of the IBL staves of up to 20 μm has been observed during data-taking, plus other short time scale movements. The talk will also describe the procedures implemented to detect and remo...

  7. Optical alignment of Centaur's inertial guidance system

    Science.gov (United States)

    Gordan, Andrew L.

    1987-01-01

    During Centaur launch operations the launch azimuth of the inertial platform's U-accelerometer input axis must be accurately established and maintained. This is accomplished by using an optically closed loop system with a long-range autotheodolite whose line of sight was established by a first-order survey. A collimated light beam from the autotheodolite intercepts a reflecting Porro prism mounted on the platform azimuth gimbal. Thus, any deviation of the Porro prism from its predetermined heading is optically detected by the autotheodolite. The error signal produced is used to torque the azimuth gimbal back to its required launch azimuth. The heading of the U-accelerometer input axis is therefore maintained automatically. Previously, the autotheodolite system could not distinguish between vehicle sway and rotational motion of the inertial platform unless at least three prisms were used. One prism was mounted on the inertial platform to maintain azimuth alignment, and two prisms were mounted externally on the vehicle to track sway. For example, the automatic azimuth-laying theodolite (AALT-SV-M2) on the Saturn vehicle used three prisms. The results of testing and modifying the AALT-SV-M2 autotheodolite to simultaneously monitor and maintain alignment of the inertial platform and track the sway of the vehicle from a single Porro prism are presented.

  8. Anatomically Plausible Surface Alignment and Reconstruction

    DEFF Research Database (Denmark)

    Paulsen, Rasmus R.; Larsen, Rasmus

    2010-01-01

    With the increasing clinical use of 3D surface scanners, there is a need for accurate and reliable algorithms that can produce anatomically plausible surfaces. In this paper, a combined method for surface alignment and reconstruction is proposed. It is based on an implicit surface representation...... combined with a Markov Random Field regularisation method. Conceptually, the method maintains an implicit ideal description of the sought surface. This implicit surface is iteratively updated by realigning the input point sets and Markov Random Field regularisation. The regularisation is based on a prior...... energy that has earlier proved to be particularly well suited for human surface scans. The method has been tested on full cranial scans of ten test subjects and on several scans of the outer human ear....

  9. Image denoising using local tangent space alignment

    Science.gov (United States)

    Feng, JianZhou; Song, Li; Huo, Xiaoming; Yang, XiaoKang; Zhang, Wenjun

    2010-07-01

    We propose a novel image denoising approach, which is based on exploring an underlying (nonlinear) low-dimensional manifold. Using local tangent space alignment (LTSA), we 'learn' such a manifold, which approximates the image content effectively. The denoising is performed by minimizing a newly defined objective function, which is a sum of two terms: (a) the difference between the noisy image and the denoised image, (b) the distance from the image patch to the manifold. We extend the LTSA method from manifold learning to denoising. We introduce the local dimension concept, which leads to adaptivity to different kinds of image patches, e.g. flat patches having lower dimension. We also plug in a basic denoising stage to estimate the local coordinates more accurately. It is found that the proposed method is competitive: its performance surpasses the K-SVD denoising method.

  10. Sensing Characteristics of A Precision Aligner Using Moire Gratings for Precision Alignment System

    Institute of Scientific and Technical Information of China (English)

    ZHOU Lizhong; Hideo Furuhashi; Yoshiyuki Uchida

    2001-01-01

    Sensing characteristics of a precision aligner using moire gratings for a precision alignment system have been investigated. A differential moire alignment system and a modified alignment system were used. The influence of the setting accuracy of the gap length and inclination of gratings on the alignment accuracy has been studied experimentally and theoretically. A setting accuracy of the gap length of less than 2.5 μm is required in modified moire alignment. There is no influence of the gap length on the alignment accuracy in the differential alignment system. The inclination affects alignment accuracies in both differential and modified moire alignment systems.

  11. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    Science.gov (United States)

    Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

    2015-04-01

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.

  12. Software for computing and annotating genomic ranges.

    Directory of Open Access Journals (Sweden)

    Michael Lawrence

    We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
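
    As an illustration of the coverage calculation mentioned above, the following sketch computes a per-base coverage vector from a set of ranges using a difference array. It is written in Python for illustration only and does not use or mirror the IRanges/GenomicRanges API; the 0-based half-open coordinates and the function name `coverage` are assumptions of this example.

```python
from typing import List, Tuple


def coverage(ranges: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base coverage over [0, length) for 0-based, half-open (start, end) ranges.

    Uses a difference array: +1 where a range starts, -1 where it ends,
    followed by a running sum. Ranges are clipped to [0, length].
    """
    diff = [0] * (length + 1)
    for start, end in ranges:
        diff[min(max(start, 0), length)] += 1
        diff[min(max(end, 0), length)] -= 1
    depth = 0
    cov = []
    for delta in diff[:length]:
        depth += delta
        cov.append(depth)
    return cov


cov = coverage([(0, 3), (2, 5)], length=6)
```

    The difference-array approach runs in time linear in the number of ranges plus the sequence length, which is one common way to keep coverage computation scalable.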

  13. A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

    Science.gov (United States)

    Lindner, Robert; Friedel, Caroline C

    2012-01-01

    Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed also including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete.
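
    Precision and recall for a read aligner can be illustrated with a small sketch. The definitions below are one common choice and an assumption of this example (a read counts as correct only if its reported position equals the true position exactly); the evaluation in the paper may use different criteria, such as a position tolerance.

```python
from typing import Dict, Tuple


def precision_recall(reported: Dict[str, int], truth: Dict[str, int]) -> Tuple[float, float]:
    """Return (precision, recall) for reported read positions.

    reported: read id -> position reported by the aligner (unaligned reads absent)
    truth:    read id -> true position of every simulated read
    """
    correct = sum(1 for read, pos in reported.items() if truth.get(read) == pos)
    precision = correct / len(reported) if reported else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


truth = {"r1": 100, "r2": 200, "r3": 300, "r4": 400}
reported = {"r1": 100, "r2": 250, "r3": 300}  # r2 misplaced, r4 not aligned
precision, recall = precision_recall(reported, truth)
```

    Here two of the three reported alignments are correct (precision 2/3) and two of the four reads are recovered (recall 1/2), which shows why the two measures need to be reported together.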

  14. A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

    Directory of Open Access Journals (Sweden)

    Robert Lindner

    Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed also including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete.

  15. ECR Browser: A Tool For Visualizing And Accessing Data From Comparisons Of Multiple Vertebrate Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Loots, G G; Ovcharenko, I; Stubbs, L; Nobrega, M A

    2004-01-06

    The increasing number of vertebrate genomes being sequenced in draft or finished form provide a unique opportunity to study and decode the language of DNA sequence through comparative genome alignments. However, novel tools and strategies are required to accommodate this increasing volume of genomic information and to facilitate experimental annotation of genome function. Here we present the ECR Browser, a tool that provides an easy and dynamic access to whole genome alignments of human, mouse, rat and fish sequences. This web-based tool (http://ecrbrowser.dcode.org) provides the starting point for discovery of novel genes, identification of distant gene regulatory elements and prediction of transcription factor binding sites. The genome alignment portal of the ECR Browser also permits fast and automated alignment of any user-submitted sequence to the genome of choice. The interconnection of the ECR browser with other DNA sequence analysis tools creates a unique portal for studying and exploring vertebrate genomes.

  16. FadE: whole genome methylation analysis for multiple sequencing platforms.

    Science.gov (United States)

    Souaiaia, Tade; Zhang, Zheng; Chen, Ting

    2013-01-01

    DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, which allows methylation levels to be reflected in the number of 'C'-'C' alignments covering reference cytosines. Di-base color reads produced by Life Technologies' SOLiD sequencer provide unreliable results when translated to bases because single sequencing errors affect the downstream sequence. We describe FadE, an algorithm to accurately determine genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT unmethylated and untreated data to determine background error rates and incorporates them into a model which uses Newton-Raphson optimization to estimate the methylation rate and provide a credible interval describing its distribution at every reference cytosine. We sequenced two slides of a human fibroblast cell-line bisulfite-converted fragment library with the SOLiD sequencer to investigate genome-wide methylation levels. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes, which compares favorably to the results of an investigation on the same cell-line using nucleotide-space reads at higher coverage levels, suggesting that FadE is an accurate method to estimate genome-wide methylation with color or nucleotide reads. http://code.google.com/p/fade/.

  17. Accurate frequency alignment in fabrication of high-order microring-resonator filters.

    Science.gov (United States)

    Sun, Jie; Holzwarth, Charles W; Dahlem, Marcus; Hastings, Jeffrey T; Smith, Henry I

    2008-09-29

    Frequency mismatch in high-order microring-resonator filters is investigated. We demonstrate that this frequency mismatch is caused mainly by the intrafield distortion of scanning-electron-beam-lithography (SEBL) used in fabrication. The intrafield distortion of an SEBL system is measured, and a simple method is also proposed to correct this distortion. By applying this correction method, the average frequency mismatch in second-order microring-resonator filters was reduced from -8.6 GHz to 0.28 GHz.

  18. Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling

    Directory of Open Access Journals (Sweden)

    Rahmann Sven

    2011-02-01

    Background: Molecular database search tools need statistical models to assess the significance of the resulting hits. In the classical approach one asks how probable it is that a certain score is observed by pure chance. Asymptotic theories for such questions are available for two random i.i.d. sequences. Some effort has been made to include effects of finite sequence lengths and to account for specific compositions of the sequences. In many applications, such as a large-scale database homology search for transmembrane proteins, these models are not the most appropriate ones. Search sensitivity and specificity benefit from position-dependent scoring schemes or use of Hidden Markov Models. Additionally, one may wish to go beyond the assumption that the sequences are i.i.d. Despite their practical importance, the statistical properties of these settings have not been well investigated yet. Results: In this paper, we discuss an efficient and general method to compute the score distribution to any desired accuracy. The general approach may be applied to different sequence models and various similarity measures that satisfy a few weak assumptions. We have access to the low-probability region ("tail") of the distribution, where scores are larger than expected by pure chance and therefore relevant for practical applications. Our method uses recent ideas from rare-event simulations, combining Markov chain Monte Carlo simulations with importance sampling and generalized ensembles. We present results for the score statistics of fixed and random queries against random sequences. In a second step, we extend the approach to a model of transmembrane proteins, which can hardly be described as i.i.d. sequences. For this case, we compare the statistical properties of a fixed query model as well as a hidden Markov sequence model in connection with a position based scoring scheme against the classical approach. Conclusions: The results illustrate that the sensitivity and specificity strongly depend on the underlying scoring and sequence model. A specific ROC analysis for the case of transmembrane proteins supports our observation.

  19. Experiments in rapid development of accurate phonetic alignments for TTS in Afrikaans

    CSIR Research Space (South Africa)

    Van Niekerk, DR

    2011-11-01

    ... are not familiar with TTS technology and its idiosyncrasies [4]. The rapid development of systems based on small speech corpora (comparable in size to [5]) of more naturally read speech raises additional considerations not only related to increased prosodic... different domains and/or exhibit quality problems such as code switching and large amounts of unpronounceable (by standard spelling rules) tokens. This compromises phonetic coverage and potentially also fluency during recording. Language...

  20. Photosensitive Polymers for Liquid Crystal Alignment

    Science.gov (United States)

    Mahilny, U. V.; Stankevich, A. I.; Trofimova, A. V.; Muravsky, A. A.; Murauski, A. A.

    The peculiarities of alignment of liquid crystal (LC) materials by the layers of photocrosslinkable polymers with side benzaldehyde groups are considered. The investigation of mechanism of photostimulated alignment by rubbed benzaldehyde layer is performed. The methods of creation of multidomain aligning layers on the basis of photostimulated rubbing alignment are described.

  1. Alignment of the Muon System at the CMS Experiment

    Science.gov (United States)

    Mueller, Ryan; Perniè, Luca; Pakhotin, Yuriy; Kamon, Teruki; Safonov, Alexei; Brown, Malachi

    2017-01-01

    The muon detectors of the CMS experiment provide fast trigger decisions, muon identification and muon track measurements. Alignment of the muon detectors is crucial for accurate reconstruction of events with high pT muons that are present in signatures for many new physics scenarios. The relative positions and orientations of the muon detectors with respect to the inner silicon tracker may be precisely measured using reconstructed tracks propagating from the interaction point. This track-based alignment procedure is capable of aligning individual muon detectors to within 100 microns along sensitive modes. However, weak (insensitive) modes may not be well measured due to the system's design, which causes systematic mismeasurements. In this report, we present a new track-based procedure that enables measurement of all 6 alignment parameters (3 positions and 3 rotations) for each individual muon detector. The improved algorithm allows for measurement of weak modes and considerably reduces the related systematic uncertainties. We describe results of the alignment procedure obtained with 2016 data.

  2. Peak alignment using wavelet pattern matching and differential evolution.

    Science.gov (United States)

    Zhang, Zhi-Min; Chen, Shan; Liang, Yi-Zeng

    2011-01-30

    Retention time shifts badly impair qualitative or quantitative results of chemometric analyses when entire chromatographic data are used. Hence, chromatograms should be aligned before further analysis. For this purpose, a practical peak alignment method (alignDE) for one-way chromatograms is proposed and implemented. It consists of five steps: (1) chromatogram length equalization using linear interpolation; (2) accurate peak pattern matching by continuous wavelet transform (CWT) with the Mexican Hat and Haar wavelets as its mother wavelets; (3) flexible baseline fitting utilizing penalized least squares; (4) peak clustering when the gap between two peaks is smaller than a certain threshold; (5) peak alignment using differential evolution (DE) to maximize the linear correlation coefficient between the reference signal and the signal to be aligned. The method is demonstrated with both simulated and real chromatograms, for example chromatograms of fungal extracts and Red Peony Root obtained by HPLC-DAD. It is implemented in the R language and available as open source software to a broad range of chromatograph users (http://code.google.com/p/alignde).
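
    Steps (1) and (5) above can be illustrated with a simplified sketch. It is not the alignDE code (which is written in R): the function names are hypothetical, and an exhaustive search over integer shifts is substituted for differential evolution purely to keep the example short. The objective being maximized, the linear (Pearson) correlation with the reference signal, is the one named in the abstract.

```python
import math
from typing import List


def resample(signal: List[float], n: int) -> List[float]:
    """Step (1): linearly interpolate a signal onto n equally spaced points."""
    if n == 1 or len(signal) == 1:
        return [signal[0]] * n
    scale = (len(signal) - 1) / (n - 1)
    out = []
    for i in range(n):
        x = i * scale
        lo = min(int(x), len(signal) - 2)
        frac = x - lo
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


def pearson(a: List[float], b: List[float]) -> float:
    """Linear correlation coefficient of two equal-length, non-constant signals."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def shift(signal: List[float], s: int) -> List[float]:
    """Shift right by s samples (left if s < 0), padding with the edge values."""
    n = len(signal)
    return [signal[min(max(i - s, 0), n - 1)] for i in range(n)]


def best_shift(reference: List[float], signal: List[float], max_shift: int) -> int:
    """Step (5), simplified: brute-force search replaces differential evolution."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: pearson(reference, shift(signal, s)))


reference = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]  # same peak, two samples later
s = best_shift(reference, delayed, max_shift=3)
```

    A global optimizer such as DE becomes worthwhile when each peak cluster gets its own shift, because the search space then grows with the number of clusters and exhaustive search is no longer practical.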

  3. Multispectral optical telescope alignment testing for a cryogenic space environment

    Science.gov (United States)

    Newswander, Trent; Hooser, Preston; Champagne, James

    2016-09-01

    Multispectral space telescopes with visible to long wave infrared spectral bands provide difficult alignment challenges. The visible channels require precision in alignment and stability to provide good image quality in short wavelengths. This is most often accomplished by choosing near-zero thermal expansion glass or ceramic mirrors metered with carbon fiber reinforced polymer (CFRP) that is designed to have a matching thermal expansion. The IR channels are less sensitive to alignment but they often require cryogenic cooling for improved sensitivity with the reduced radiometric background. Finding efficient solutions to this difficult problem of maintaining good visible image quality at cryogenic temperatures has been explored with the building and testing of a telescope simulator. The telescope simulator is an on-axis ZERODUR® mirror, CFRP metered set of optics. Testing has been completed to accurately measure telescope optical element alignment and mirror figure changes in a cryogenic space simulated environment. Measured alignment error and mirror figure error test results are reported with a discussion of their impact on system optical performance.

  4. Active network alignment: a matching-based approach

    CERN Document Server

    Malmi, Eric; Gionis, Aristides

    2016-01-01

    Network alignment is the problem of matching the nodes of two graphs, maximizing the similarity of the matched nodes and the edges between them. This problem is encountered in a wide array of applications - from biological networks to social networks to ontologies - where multiple networked data sources need to be integrated. Due to the difficulty of the task, an accurate alignment can rarely be found without human assistance. Thus, it is of great practical importance to develop network alignment algorithms that can optimally leverage experts who are able to provide the correct alignment for a small number of nodes. Yet, only a handful of existing works address this active network alignment setting. The majority of the existing active methods focus on absolute queries ("are nodes $a$ and $b$ the same or not?"), whereas we argue that it is generally easier for a human expert to answer relative queries ("which node in the set $\{b_1, \ldots, b_n\}$ is the most similar to node $a$?"). This paper introduces a nov...

  5. Alignment method for parabolic trough solar concentrators

    Science.gov (United States)

    Diver, Richard B [Albuquerque, NM

    2010-02-23

    A Theoretical Overlay Photographic (TOP) alignment method uses the overlay of a theoretical projected image of a perfectly aligned concentrator on a photographic image of the concentrator to align the mirror facets of a parabolic trough solar concentrator. The alignment method is practical and straightforward, and inherently aligns the mirror facets to the receiver. When integrated with clinometer measurements for which gravity and mechanical drag effects have been accounted for and which are made in a manner and location consistent with the alignment method, all of the mirrors on a common drive can be aligned and optimized for any concentrator orientation.

  6. Phylo: a citizen science approach for improving multiple sequence alignment.

    Science.gov (United States)

    Kawrykow, Alexander; Roumanis, Gary; Kam, Alfred; Kwak, Daniel; Leung, Clarence; Wu, Chu; Zarour, Eleyine; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2012-01-01

    Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. We introduce Phylo, a human-based computing framework applying "crowd sourcing" techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered. We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of "human-brain peta-flops" of computation that are spent every day playing games. Phylo is available at: http://phylo.cs.mcgill.ca.

  7. Phylo: a citizen science approach for improving multiple sequence alignment.

    Directory of Open Access Journals (Sweden)

    Alexander Kawrykow

    BACKGROUND: Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. METHODOLOGY/PRINCIPAL FINDINGS: We introduce Phylo, a human-based computing framework applying "crowd sourcing" techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered. CONCLUSIONS/SIGNIFICANCE: We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of "human-brain peta-flops" of computation that are spent every day playing games.

  8. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed Affan

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
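
    The length-based distribution described above can be sketched as follows. This is a hedged illustration only: the abstract does not say whether the splitting ratio is measured in sequences or in residues, so the example assumes a fraction of the sequence count, and `split_by_length` and `cpu_fraction` are hypothetical names. The CPU and GPU scoring kernels themselves are not shown.

```python
from typing import List, Tuple


def split_by_length(database: List[str], cpu_fraction: float) -> Tuple[List[str], List[str]]:
    """Split database sequences into a CPU portion and a GPU portion.

    The shortest sequences go to the CPU and the longest to the GPU, so every
    sequence in the first portion is no longer than any in the second.
    cpu_fraction is the assumed splitting ratio, as a fraction of sequence count.
    """
    ordered = sorted(database, key=len)
    cut = round(len(ordered) * cpu_fraction)
    return ordered[:cut], ordered[cut:]


database = ["ACGT", "AC", "ACGTACGT", "A"]
cpu_part, gpu_part = split_by_length(database, cpu_fraction=0.5)
```

    Each portion would then be scored against the query on its own device, giving the first and second alignment scores mentioned in the abstract.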

  9. Laser shaft alignment measurement model

    Science.gov (United States)

    Mo, Chang-tao; Chen, Changzheng; Hou, Xiang-lin; Zhang, Guoyu

    2007-12-01

    When the driving shaft and the driven shaft rotate with the same angular velocity and direction, the track of the laser beam on the photosensitive surface of the receiver is a closed curve. The coordinates of an arbitrary point on the curve are determined by the relative position of the two shafts. Based on this observation, a mathematical model of laser alignment is set up. Using a data acquisition system and a data processing model for a laser alignment meter with a single laser beam and a detector, and based on the installation parameters of the computer, the state parameters between the two shafts can be obtained by further calculation and correction. The correction values for the four under-chassis supports of the adjusted apparatus, moving in the horizontal and vertical planes, can then be calculated. These values indicate how to move the apparatus to align the shafts.

  10. ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes

    OpenAIRE

    Ovcharenko, Ivan; Nobrega, Marcelo A.; Loots, Gabriela G.; Stubbs, Lisa

    2004-01-01

    With an increasing number of vertebrate genomes being sequenced in draft or finished form, unique opportunities for decoding the language of DNA sequence through comparative genome alignments have arisen. However, novel tools and strategies are required to accommodate this large volume of genomic information and to facilitate the transfer of predictions generated by comparative sequence alignment to researchers focused on experimental annotation of genome function. Here, we present the ECR Br...

  11. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    Science.gov (United States)

    Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

    2012-01-01

    Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  12. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    Directory of Open Access Journals (Sweden)

    Colin F Davenport

    Full Text Available Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  13. Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score

    Directory of Open Access Journals (Sweden)

    Skolnick Jeffrey

    2008-12-01

    Full Text Available Abstract Background Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed that was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true. Results To further improve structure alignment quality, the Fr-TM-align algorithm has been developed where aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. For the assessment of the structural alignment quality from Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better in comparison to both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) in comparison to those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align. Conclusion Fr-TM-align, a new algorithm that employs fragment alignment and assembly, provides better structural alignments in comparison to TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/.
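
    For readers unfamiliar with the TM-score, the sketch below evaluates the standard formula for one fixed superposition: the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. It is not code from Fr-TM-align. The function names are hypothetical, the lower bound of 0.5 on d0 for short chains is a guard assumed here, and the real score additionally maximizes over superpositions, which this sketch does not attempt.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score.
    The 0.5 floor for short chains is an assumption of this sketch."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: distances (in Angstrom) between aligned residue pairs.
    target_length: number of residues in the target structure.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


perfect = tm_score([0.0] * 100, 100)        # every residue aligned exactly
half_cover = tm_score([tm_d0(100)] * 50, 100)  # 50 pairs at distance d0
```

    The second call shows how coverage and deviation combine: half the residues aligned, each at distance d0, contribute 0.5 apiece, giving 0.25.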

  14. XUV ionization of aligned molecules

    Energy Technology Data Exchange (ETDEWEB)

    Kelkensberg, F.; Siu, W.; Gademann, G. [FOM Institute AMOLF, Science Park 104, NL-1098 XG Amsterdam (Netherlands); Rouzee, A.; Vrakking, M. J. J. [FOM Institute AMOLF, Science Park 104, NL-1098 XG Amsterdam (Netherlands); Max-Born-Institut, Max-Born Strasse 2A, D-12489 Berlin (Germany); Johnsson, P. [FOM Institute AMOLF, Science Park 104, NL-1098 XG Amsterdam (Netherlands); Department of Physics, Lund University, Post Office Box 118, SE-221 00 Lund (Sweden); Lucchini, M. [Department of Physics, Politecnico di Milano, Istituto di Fotonica e Nanotecnologie CNR-IFN, Piazza Leonardo da Vinci 32, 20133 Milano (Italy); Lucchese, R. R. [Department of Chemistry, Texas A and M University, College Station, Texas 77843-3255 (United States)

    2011-11-15

    New extreme-ultraviolet (XUV) light sources such as high-order-harmonic generation (HHG) and free-electron lasers (FELs), combined with laser-induced alignment techniques, enable novel methods for making molecular movies based on measuring molecular frame photoelectron angular distributions. Experiments are presented where CO₂ molecules were impulsively aligned using a near-infrared laser and ionized using femtosecond XUV pulses obtained by HHG. Measured electron angular distributions reveal contributions from four orbitals and the onset of the influence of the molecular structure.

  15. The alignment-distribution graph

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

    1993-01-01

    Implementing a data-parallel language such as Fortran 90 on a distributed-memory parallel computer requires distributing aggregate data objects (such as arrays) among the memory modules attached to the processors. The mapping of objects to the machine determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. We present a program representation called the alignment distribution graph that makes these communication requirements explicit. We describe the details of the representation, show how to model communication cost in this framework, and outline several algorithms for determining object mappings that approximately minimize residual communication.

  16. Position list word aligned hybrid

    DEFF Research Database (Denmark)

    Deliege, Francois; Pedersen, Torben Bach

    2010-01-01

    Compressed bitmap indexes are increasingly used for efficiently querying very large and complex databases. The Word Aligned Hybrid (WAH) bitmap compression scheme is commonly recognized as the most efficient compression scheme in terms of CPU efficiency. However, WAH compressed bitmaps use a lot...... of storage space. This paper presents the Position List Word Aligned Hybrid (PLWAH) compression scheme that improves significantly over WAH compression by better utilizing the available bits and new CPU instructions. For typical bit distributions, PLWAH compressed bitmaps are often half the size of WAH...

  17. B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.

    Science.gov (United States)

    Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang

    2016-03-01

    Sequence alignment, the mapping of raw sequencing data to a reference genome, is the central process of sequence analysis. The large amount of data generated by NGS is far beyond the processing capabilities of existing alignment tools. Consequently, sequence alignment becomes the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. The Tianhe-2, the world's fastest supercomputer at the time of writing, is equipped with three MIC coprocessors per compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner: B-MIC. B-MIC contains three levels of parallelization: firstly, parallelization of data IO and read alignment by a three-stage parallel pipeline; secondly, parallelization enabled by MIC coprocessor technology; thirdly, inter-node parallelization implemented by MPI. In this paper, we demonstrate that B-MIC outperforms BWA through a combination of those techniques, using an Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on Intel MIC, and it can achieve more than fivefold speedup over the original BWA while maintaining the alignment precision.

  18. Comparative genomics in cyprinids: Common carp EST's help the annotation of the zebrafish genome

    NARCIS (Netherlands)

    Christoffels, A.; Bartfai, R.; Srinivasan, H.; Komen, J.

    2006-01-01

    Background - Automatic annotation of sequenced eukaryotic genomes integrates a combination of methodologies such as ab-initio methods and alignment of homologous genes and/or proteins. For example, annotation of the zebrafish genome within Ensembl relies heavily on available cDNA and protein sequenc

  19. Genome size variation in the genus Avena.

    Science.gov (United States)

    Yan, Honghai; Martin, Sara L; Bekele, Wubishet A; Latta, Robert G; Diederichsen, Axel; Peng, Yuanying; Tinker, Nicholas A

    2016-03-01

    Genome size is an indicator of evolutionary distance and a metric for genome characterization. Here, we report accurate estimates of genome size in 99 accessions from 26 species of Avena. We demonstrate that the average genome size of C genome diploid species (2C = 10.26 pg) is 15% larger than that of A genome species (2C = 8.95 pg), and that this difference likely accounts for a progression of size among tetraploid species, where AB genome configuration had similar genome sizes (average 2C = 25.74 pg). Genome size was mostly consistent within species and in general agreement with current information about evolutionary distance among species. Results also suggest that most of the polyploid species in Avena have experienced genome downsizing in relation to their diploid progenitors. Genome size measurements could provide additional quality control for species identification in germplasm collections, especially in cases where diploid and polyploid species have similar morphology.

  20. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions.

    Science.gov (United States)

    Lin, Michael F; Jungreis, Irwin; Kellis, Manolis

    2011-07-01

    As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures. The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF CONTACT: mlin@mit.edu; manoli@mit.edu.

  1. Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Torarinsson, Elfar; Gorodkin, Jan

    2007-01-01

    genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may...... the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained....... Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned...

  2. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

    Energy Technology Data Exchange (ETDEWEB)

    Pollard, Daniel A.; Moses, Alan M.; Iyer, Venky N.; Eisen, Michael B.

    2006-08-14

    Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and

  3. Genomic instability and cancer: an introduction

    Institute of Scientific and Technical Information of China (English)

    Zhiyuan Shen

    2011-01-01

    Genomic instability as a major driving force of tumorigenesis. The ultimate goal of cell division for most non-cancerous somatic cells is to accurately duplicate the genome and then evenly divide the duplicated genome into the two daughter cells. This ensures that the daughter cells will have exactly the same genetic material as their parent cell.

  4. Systematic evaluation of spliced alignment programs for RNA-seq data

    OpenAIRE

    Engström, Pär G; Steijger, Tamara; Sipos, Botond; Grant, Gregory R; Kahles, André; RGASP Consortium; Rätsch, Gunnar; Goldman, Nick; Hubbard, Tim J.; Harrow, Jennifer; Guigó Serra, Roderic; Bertone, Paul

    2013-01-01

    High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found majo...

  5. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Full Text Available Abstract Background Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) Pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
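
    The independence of pair-wise alignments can be illustrated with a short sketch. It is not DIALIGN code: a plain Needleman-Wunsch score stands in for DIALIGN's segment-based scoring, the function names are hypothetical, and a thread pool is used only to keep the example self-contained (a process pool or MPI would be the natural choice for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch), keeping one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def all_pairs_scores(seqs, workers=4):
    """Score every sequence pair; each pair is an independent task."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: nw_score(seqs[p[0]], seqs[p[1]]), pairs)
        return dict(zip(pairs, scores))


seqs = ["ACGT", "AGT", "ACGT"]
parallel = all_pairs_scores(seqs)
serial = {(i, j): nw_score(seqs[i], seqs[j])
          for i, j in combinations(range(len(seqs)), 2)}
```

    Because no task reads another task's result, the parallel and serial dictionaries are identical regardless of how the pairs are scheduled.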

  6. ChromAlign: A two-step algorithmic procedure for time alignment of three-dimensional LC-MS chromatographic surfaces.

    Science.gov (United States)

    Sadygov, Rovshan G; Maroto, Fernando Martin; Hühmer, Andreas F R

    2006-12-15

    We present an algorithmic approach to align three-dimensional chromatographic surfaces of LC-MS data of complex mixture samples. The approach consists of two steps. In the first step, we prealign chromatographic profiles: two-dimensional projections of chromatographic surfaces. This is accomplished by correlation analysis using fast Fourier transforms. In this step, a temporal offset that maximizes the overlap and dot product between two chromatographic profiles is determined. In the second step, the algorithm generates correlation matrix elements between full mass scans of the reference and sample chromatographic surfaces. The temporal offset from the first step indicates a range of the mass scans that are possibly correlated, then the correlation matrix is calculated only for these mass scans. The correlation matrix carries information on highly correlated scans, but it does not itself determine the scan or time alignment. Alignment is determined as a path in the correlation matrix that maximizes the sum of the correlation matrix elements. The computational complexity of the optimal path generation problem is reduced by the use of dynamic programming. The program produces time-aligned surfaces. The use of the temporal offset from the first step in the second step reduces the computation time for generating the correlation matrix and speeds up the process. The algorithm has been implemented in a program, ChromAlign, developed in C++ language for the .NET2 environment in WINDOWS XP. In this work, we demonstrate the applications of ChromAlign to alignment of LC-MS surfaces of several datasets: a mixture of known proteins, samples from digests of surface proteins of T-cells, and samples prepared from digests of cerebrospinal fluid. ChromAlign accurately aligns the LC-MS surfaces we studied. In these examples, we discuss various aspects of the alignment by ChromAlign, such as constant time axis shifts and warping of chromatographic surfaces.
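
    The second step, finding a path through the correlation matrix that maximizes the summed correlation, can be sketched with dynamic programming. This is an illustration under stated assumptions rather than the ChromAlign implementation: each reference scan is matched to exactly one sample scan, and the matched sample index may advance by 0, 1 or 2 between consecutive reference scans. That step set and the function name are choices made here.

```python
def best_alignment_path(corr):
    """Return (path, total): path[i] is the sample scan matched to
    reference scan i, chosen to maximize the summed correlation."""
    n, m = len(corr), len(corr[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    score[0] = list(corr[0])  # the path may start at any sample scan
    for i in range(1, n):
        for j in range(m):
            for step in (0, 1, 2):  # assumed allowed advances
                k = j - step
                if k >= 0 and score[i - 1][k] > neg_inf:
                    cand = score[i - 1][k] + corr[i][j]
                    if cand > score[i][j]:
                        score[i][j] = cand
                        back[i][j] = k
    # Trace back from the best cell in the last row.
    j = max(range(m), key=lambda c: score[n - 1][c])
    total = score[n - 1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


# Toy matrix: the sample run lags the reference by one scan.
corr = [
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.9, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.7],
]
path, total = best_alignment_path(corr)
```

    Restricting the columns considered in each row to a band around the offset from the first step would reduce the work further, as the abstract describes.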

  7. LASAGNA: A novel algorithm for transcription factor binding site alignment

    Science.gov (United States)

    2013-01-01

    Background Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs. Results We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (Chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences. Conclusions We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at http://biogrid.engr.uconn.edu/lasagna_search/. PMID:23522376
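
    As background for the PSSM step mentioned above, here is a minimal sketch of building a log-odds matrix from already aligned, equal-length sites and scanning a sequence with it. It does not implement LASAGNA itself. The pseudocount, the uniform background of 0.25 and all function names are assumptions, and the example sites are made up for illustration.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PSSM from aligned, equal-length DNA sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned to the same length")
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background)
            for base in "ACGT"
        })
    return pssm


def score_site(pssm, site):
    """Sum of per-position log-odds scores for one candidate site."""
    return sum(col[base] for col, base in zip(pssm, site))


def best_hit(pssm, sequence):
    """Start index of the highest-scoring window in the sequence."""
    width = len(pssm)
    return max(range(len(sequence) - width + 1),
               key=lambda i: score_site(pssm, sequence[i:i + width]))


sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # illustrative only
pssm = build_pssm(sites)
hit = best_hit(pssm, "GGCGTATAATGCC")
```

    The sketch presupposes the alignment that LASAGNA is designed to produce: variable-length database segments would first have to be aligned before `build_pssm` could be applied.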

  8. Aligned natural inflation with modulations

    Energy Technology Data Exchange (ETDEWEB)

    Choi, Kiwoon, E-mail: kchoi@ibs.re.kr [Center for Theoretical Physics of the Universe, Institute for Basic Science (IBS), Daejeon, 34051 (Korea, Republic of); Kim, Hyungjin, E-mail: hjkim06@kaist.ac.kr [Center for Theoretical Physics of the Universe, Institute for Basic Science (IBS), Daejeon, 34051 (Korea, Republic of); Department of Physics, KAIST, Daejeon, 305-701 (Korea, Republic of)

    2016-08-10

    The weak gravity conjecture applied for the aligned natural inflation indicates that generically there can be a modulation of the inflaton potential, with a period determined by sub-Planckian axion scale. We study the oscillations in the primordial power spectrum induced by such modulation, and discuss the resulting observational constraints on the model.

  9. Aligned natural inflation with modulations

    Directory of Open Access Journals (Sweden)

    Kiwoon Choi

    2016-08-01

    Full Text Available The weak gravity conjecture applied for the aligned natural inflation indicates that generically there can be a modulation of the inflaton potential, with a period determined by sub-Planckian axion scale. We study the oscillations in the primordial power spectrum induced by such modulation, and discuss the resulting observational constraints on the model.

  10. Aligning Assessments for COSMA Accreditation

    Science.gov (United States)

    Laird, Curt; Johnson, Dennis A.; Alderman, Heather

    2015-01-01

    Many higher education sport management programs are currently in the process of seeking accreditation from the Commission on Sport Management Accreditation (COSMA). This article provides a best-practice method for aligning student learning outcomes with a sport management program's mission and goals. Formative and summative assessment procedures…

  11. The Rigors of Aligning Performance

    Science.gov (United States)

    2015-06-01

    organization must consider and work closely with its many stakeholders so as to guarantee satisfaction; this idea is especially important as there is no...define success. Methodology includes a literature review, employee and customer surveys and a Strength, Weaknesses, Opportunities, Threats...bearing in mind customer perceptions. Recommendations include employee training centered on goal alignment, which is vital to highlight the

  12. Theoretical and practical feasibility demonstration of a micrometric remotely controlled pre-alignment system for the CLIC linear collider

    CERN Document Server

    Mainaud Durand, H; Chritin, N; Griffet, S; Kemppinen, J; Sosin, M; Touze, T

    2011-01-01

    The active pre-alignment of the Compact Linear Collider (CLIC) is one of the key points of the project: the components must be pre-aligned w.r.t. a straight line within a few microns over a sliding window of 200 m, along the two linacs of 20 km each. The proposed solution consists of stretched wires of more than 200 m, overlapping over half of their length, which will be the reference of alignment. Wire Positioning Sensors (WPS), coupled to the supports to be pre-aligned, will perform precise and accurate measurements within a few microns w.r.t. these wires. A micrometric fiducialisation of the components and a micrometric alignment of the components on common supports will make the strategy of pre-alignment complete. In this paper, the global strategy of active pre-alignment is detailed and illustrated by the latest results demonstrating the feasibility of the proposed solution.

  13. Progressive multiple sequence alignments from triplets

    Directory of Open Access Journals (Sweden)

    Stadler Peter F

    2007-07-01

    Full Text Available Abstract Background The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wise replacement of triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context dependent (mis)match scores.

  14. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    Directory of Open Access Journals (Sweden)

    Claros M Gonzalo

    2010-06-01

    Full Text Available Abstract Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, the choice of which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". 
Conclusions AlignMiner can be used

  15. In-Flight Self-Alignment Method Aided by Geomagnetism for Moving Basement of Guided Munitions

    Directory of Open Access Journals (Sweden)

    Shuang-biao Zhang

    2015-01-01

    Full Text Available Due to the power-after-launch mode of guided munitions with high rolling speed, the initial attitude of the munitions cannot be determined accurately, which makes it difficult for the navigation and control system to work effectively. An in-flight self-alignment method aided by geomagnetism, which includes a fast in-flight coarse alignment method and an in-flight alignment model based on Kalman theory, is proposed in this paper. First, a fast in-flight coarse alignment method is developed using gyros, magnetic sensors, and trajectory angles. Then, an in-flight alignment model is derived by investigating the measurement errors and attitude errors; it regards attitude errors as state variables and geomagnetic components in the navigation frame as observed variables. Finally, flight data of a spinning projectile are used to verify the performance of the in-flight self-alignment method. The results show that (1) the precision of coarse alignment can reach below 5°; (2) the attitude errors of the in-flight alignment model converge to 24′ early in the latter half of the flight; (3) the in-flight alignment model based on Kalman theory has better adaptability and shows satisfactory performance.

  16. Accessing the SEED genome databases via Web services API: tools for programmers

    National Research Council Canada - National Science Library

    Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

    2010-01-01

    .... The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes...

  17. Aligned Layers of Silver Nano-Fibers

    Directory of Open Access Journals (Sweden)

    Andrii B. Golovin

    2012-02-01

    Full Text Available We describe new dichroic polarizers made by ordering silver nano-fibers into aligned layers. The aligned layers consist of nano-fibers and self-assembled molecular aggregates of lyotropic liquid crystals. Unidirectional alignment of the layers is achieved by means of mechanical shearing. Aligned layers of silver nano-fibers are partially transparent to linearly polarized electromagnetic radiation. The unidirectional alignment and density of the silver nano-fibers determine the degree of polarization of transmitted light. The aligned layers of silver nano-fibers might be used in optics, microwave applications, and organic electronics.

  18. Image-based quantification of fiber alignment within electrospun tissue engineering scaffolds is related to mechanical anisotropy.

    Science.gov (United States)

    Fee, Timothy; Downs, Crawford; Eberhardt, Alan; Zhou, Yong; Berry, Joel

    2016-07-01

    It is well documented that electrospun tissue engineering scaffolds can be fabricated with variable degrees of fiber alignment to produce scaffolds with anisotropic mechanical properties. Several attempts have been made to quantify the degree of fiber alignment within an electrospun scaffold using image-based methods. However, these methods are limited by the inability to produce a quantitative measure of alignment that can be used to make comparisons across publications. Therefore, we have developed a new approach to quantifying the alignment present within a scaffold from scanning electron microscopic (SEM) images. The alignment is determined by using the Sobel approximation of the image gradient to determine the distribution of gradient angles within an image. These data were fit to a Von Mises distribution to find the dispersion parameter κ, which was used as a quantitative measure of fiber alignment. We fabricated four groups of electrospun polycaprolactone (PCL) + Gelatin scaffolds with alignments ranging from κ = 1.9 (aligned) to κ = 0.25 (random) and tested our alignment quantification method on these scaffolds. It was found that our alignment quantification method could distinguish between scaffolds of different alignments more accurately than two other published methods. Additionally, the alignment parameter κ was found to be a good predictor of the mechanical anisotropy of our electrospun scaffolds. The ability to quantify fiber alignment within and make direct comparisons of scaffold fiber alignment across publications can reduce ambiguity between published results where cells are cultured on "highly aligned" fibrous scaffolds. This could have important implications for characterizing mechanics and cellular behavior on aligned tissue engineering scaffolds. © 2016 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 104A: 1680-1686, 2016.
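
    The gradient-angle approach described in this record can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the image is a plain list of rows of grey levels, and two choices are assumptions made here: angles are doubled before averaging so that opposite gradient directions count as the same orientation, and κ is obtained from a standard piecewise approximation to the maximum-likelihood estimate rather than from a full distribution fit.

```python
import math


def sobel_angles(img, min_mag=1e-9):
    """Gradient angles (radians) from 3x3 Sobel kernels at interior pixels.

    img is a list of equal-length rows of numbers (hypothetical input format).
    Pixels with a negligible gradient magnitude are skipped.
    """
    angles = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            if math.hypot(gx, gy) > min_mag:
                angles.append(math.atan2(gy, gx))
    return angles


def estimate_kappa(angles):
    """Approximate von Mises concentration of the doubled (axial) angles."""
    if not angles:
        raise ValueError("no gradient angles to fit")
    n = len(angles)
    # Assumption: double the angles so that theta and theta + pi coincide.
    c = sum(math.cos(2 * a) for a in angles) / n
    s = sum(math.sin(2 * a) for a in angles) / n
    rbar = math.hypot(c, s)  # mean resultant length, between 0 and 1
    if rbar > 1 - 1e-9:
        return math.inf  # perfectly aligned: concentration is unbounded
    # Piecewise approximation to the maximum-likelihood estimate of kappa.
    if rbar < 0.53:
        return 2 * rbar + rbar ** 3 + 5 * rbar ** 5 / 6
    if rbar < 0.85:
        return -0.4 + 1.39 * rbar + 0.43 / (1 - rbar)
    return 1 / (rbar ** 3 - 4 * rbar ** 2 + 3 * rbar)
```

    Under these assumptions a striped image yields a very large κ and a noise image a κ near zero, which matches the qualitative ordering (aligned high, random low) reported in the abstract; the absolute values are not expected to reproduce the published ones.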

  19. LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs

    DEFF Research Database (Denmark)

    Will, Sebastian; Joshi, Tejal; Hofacker, Ivo L.

    2012-01-01

    Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing...

  20. MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence.

    Science.gov (United States)

    Turki, Turki; Roshan, Usman

    2014-11-15

    Programs based on hash tables and Burrows-Wheeler are very fast for mapping short reads to genomes but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm, but it can take hours or days to map millions of reads even for bacterial genomes. We introduce a GPU program called MaxSSmap with the aim of achieving comparable accuracy to Smith-Waterman but with faster runtimes. Similar to most programs, MaxSSmap identifies a local region of the genome followed by exact alignment. Instead of using hash tables or Burrows-Wheeler in the first part, MaxSSmap calculates the maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest scoring fragment for exact alignment. We evaluate MaxSSmap's accuracy and runtime when mapping simulated Illumina E.coli and human chromosome one reads of different lengths and 10% to 30% mismatches with gaps to the E.coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and unmapped paired reads from NA12878 in 1000 genomes. We show that MaxSSmap attains comparable high accuracy and low error to fast Smith-Waterman programs yet has much lower runtimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error much faster than if Smith-Waterman were used. At short read lengths of 36 and 51, both MaxSSmap and Smith-Waterman have lower accuracy than at longer read lengths. On real data MaxSSmap produces many alignments with high score and mapping quality that are not given by NextGenMap and BWA. The MaxSSmap source code in CUDA and OpenCL is freely available from http://www.cs.njit.edu/usman/MaxSSmap.
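
    The fragment-selection step described in this record can be sketched serially in Python. This is a simplified illustration, not the MaxSSmap code: the function names, the +1/-1 scoring, the ungapped comparison at every offset and the non-overlapping fragment layout are all assumptions made here, and the GPU parallelism is not reproduced.

```python
def max_scoring_subsequence(scores):
    """Largest sum over any contiguous run of scores (0 if every run is negative)."""
    best = running = 0
    for s in scores:
        running = max(0, running + s)  # drop the prefix once it goes negative
        best = max(best, running)
    return best


def score_read_vs_fragment(read, fragment, match=1, mismatch=-1):
    """Best ungapped maximum-scoring-subsequence score over all read offsets."""
    best = 0
    for offset in range(len(fragment) - len(read) + 1):
        window = fragment[offset:offset + len(read)]
        scores = [match if a == b else mismatch for a, b in zip(read, window)]
        best = max(best, max_scoring_subsequence(scores))
    return best


def best_fragment(read, genome, frag_len):
    """Return (start, score) of the highest scoring disjoint genome fragment."""
    best_start, best_score = 0, -1
    for start in range(0, len(genome), frag_len):
        score = score_read_vs_fragment(read, genome[start:start + frag_len])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

    In this sketch a read that straddles two fragments is scored only partially; a practical mapper would need some way to handle fragment boundaries before passing the selected region to an exact aligner.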

  1. RNASequel: accurate and repeat tolerant realignment of RNA-seq reads.

    Science.gov (United States)

    Wilson, Gavin W; Stein, Lincoln D

    2015-10-15

    RNA-seq is a key technology for understanding the biology of the cell because of its ability to profile transcriptional and post-transcriptional regulation at single nucleotide resolutions. Compared to DNA sequencing alignment algorithms, RNA-seq alignment algorithms have a diminished ability to accurately detect and map base pair substitutions, gaps, discordant pairs and repetitive regions. These shortcomings adversely affect experiments that require a high degree of accuracy, notably the ability to detect RNA editing. We have developed RNASequel, a software package that runs as a post-processing step in conjunction with an RNA-seq aligner and systematically corrects common alignment artifacts. Its key innovations are a two-pass splice junction alignment system that includes de novo splice junctions and the use of an empirically determined estimate of the fragment size distribution when resolving read pairs. We demonstrate that RNASequel produces improved alignments when used in conjunction with STAR or Tophat2 using two simulated datasets. We then show that RNASequel improves the identification of adenosine to inosine RNA editing sites on biological datasets. This software will be useful in applications requiring the accurate identification of variants in RNA sequencing data, the discovery of RNA editing sites and the analysis of alternative splicing.

  2. Alignment of multiple-off-axis-beam imaging/interference systems.

    Science.gov (United States)

    Vadivel, Shruthi K; Leibovici, Matthieu C R; Gaylord, Thomas K

    2016-04-20

    The alignment of components in complex multibeam arrangements is typically prone to errors that limit the performance of the system. A systematic procedure for aligning such systems is presented here. The method facilitates the precision alignment of the optical elements to achieve the accurate projection of multiple on- and off-axis images and the simultaneous interference of the multiple beams. In addition to the multibeam imaging/interference system presented, the procedure can be employed in other multibeam imaging and/or interfering configurations.

  3. Aligning seminars with Bologna requirements

    DEFF Research Database (Denmark)

    Lueg, Klarissa; Lueg, Rainer; Lauridsen, Ole

    2016-01-01

    Changes in public policy, such as the Bologna Process, require students to be equipped with multifunctional competencies to master relevant tasks in unfamiliar situations. Achieving this goal might imply a change in many curricula toward deeper learning. As a didactical means to achieve deep...... learning results, the authors suggest reciprocal peer tutoring (RPT); as a conceptual framework the authors suggest the SOLO (Structure of Observed Learning Outcomes) taxonomy and constructive alignment as suggested by Biggs and Tang. Our study presents results from the introduction of RPT in a large...... course. The authors find that RPT produces satisfying learning outcomes, active students, and ideal constructive alignments of the seminar content with the exam, the intended learning outcomes, and the requirements of the Bologna Process. Our data, which comprise surveys and evaluations from both faculty...

  4. Prism Window for Optical Alignment

    Science.gov (United States)

    Tang, Hong

    2008-01-01

    A prism window has been devised for use, with an autocollimator, in aligning optical components that are (1) required to be oriented parallel to each other and/or at a specified angle of incidence with respect to a common optical path and (2) mounted at different positions along the common optical path. The prism window can also be used to align a single optical component at a specified angle of incidence. Prism windows could be generally useful for orienting optical components in manufacture of optical instruments. "Prism window" denotes an application-specific unit comprising two beam-splitter windows that are bonded together at an angle chosen to obtain the specified angle of incidence.

  5. Aligned mesoporous architectures and devices.

    Energy Technology Data Exchange (ETDEWEB)

    Brinker, C. Jeffrey; Lu, Yunfeng (University of California Los Angeles, Los Angeles, CA)

    2011-03-01

    This is the final report for the Presidential Early Career Award for Science and Engineering - PECASE (LDRD projects 93369 and 118841) awarded to Professor Yunfeng Lu (Tulane University and University of California-Los Angeles). During the last decade, mesoporous materials with tunable periodic pores have been synthesized using surfactant liquid crystals as templates, opening a new avenue for a wide spectrum of applications. However, the applications are somewhat limited by the unfavorable pore orientation of these materials. Although substantial effort has been devoted to aligning the pore channels, fabrication of mesoporous materials with perpendicular pore channels remains challenging. This project focused on fabrication of mesoporous materials with perpendicularly aligned pore channels. We demonstrated structures for use in water purification, separation, sensors, templated synthesis, microelectronics, optics, controlled release, and highly selective catalysts.

  6. Hohlraum Target Alignment from X-ray Detector Images using Starburst Design Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Leach, R R; Conder, A; Edwards, O; Kroll, J; Kozioziemski, B; Mapoles, E; McGuigan, D; Wilhelmsen, K

    2010-12-14

    National Ignition Facility (NIF) is a high-energy laser facility comprised of 192 laser beams focused with enough power and precision on a hydrogen-filled spherical, cryogenic target to initiate a fusion reaction. The target container, or hohlraum, must be accurately aligned to an x-ray imaging system to allow careful monitoring of the frozen fuel layer in the target. To achieve alignment, x-ray images are acquired through starburst-shaped windows cut into opposite sides of the hohlraum. When the hohlraum is in alignment, the starburst pattern pairs match nearly exactly and allow a clear view of the ice layer formation on the edge of the target capsule. During the alignment process, x-ray image analysis is applied to determine the direction and magnitude of adjustment required. X-ray detector and source are moved in concert during the alignment process. The automated pointing alignment system described here is both accurate and efficient. In this paper, we describe the control and associated image processing that enables automation of the starburst pointing alignment.

  7. A DNA sequence alignment algorithm using quality information and a fuzzy inference method

    Institute of Scientific and Technical Information of China (English)

    Kwangbaek Kim; Minhwan Kim; Youngwoon Woo

    2008-01-01

    DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed based on the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman and Wunsch, which is established by using quality information of each DNA fragment. However, there may be errors in the process of calculating DNA sequence alignment scores when the quality of DNA fragment tips is low, because only the overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved in spite of the low quality of DNA fragment tips by improving conventional algorithms that use quality information. Mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system utilizing lengths of DNA fragments and frequencies of low quality DNA bases in the fragments. From experiments applying real genome data from the National Center for Biotechnology Information, we could see that the proposed method is more efficient than conventional algorithms.
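
    For orientation, the Needleman-Wunsch global alignment score that this record builds on can be sketched in Python as follows. The per-base quality weighting shown here is a hypothetical illustration only: it scales each substitution score by a weight between 0 and 1, and it does not reproduce the fuzzy inference system or the scoring parameters of the paper.

```python
def needleman_wunsch(a, b, qual_a=None, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment score of sequences a and b.

    qual_a[i] in [0, 1] is an assumed per-base confidence for a[i]; it scales
    the match or mismatch score so low-quality bases contribute less.
    """
    if qual_a is None:
        qual_a = [1.0] * len(a)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            sub = (match if a[i - 1] == b[j - 1] else mismatch) * qual_a[i - 1]
            cur[j] = max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                         prev[j] + gap,      # gap in b
                         cur[j - 1] + gap)   # gap in a
        prev = cur
    return prev[-1]
```

    With unit weights this reduces to the ordinary linear-gap recurrence; setting the weight of a doubtful base to 0 removes its mismatch penalty, which is one simple way of seeing why quality information can change the resulting score.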

  8. Library preparation for highly accurate population sequencing of RNA viruses

    Science.gov (United States)

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
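
    The consensus step described in this record can be illustrated with a short Python sketch. It assumes, for simplicity, that the repeat length is already known and that the copies are in phase within the read; the function name, the minimum of three copies and the use of "N" for positions without a strict majority are choices made for this illustration, not details taken from the CirSeq pipeline.

```python
from collections import Counter


def consensus_from_repeats(read, repeat_len, min_copies=3):
    """Majority-vote consensus over tandem copies contained in one read.

    Returns None when fewer than min_copies complete copies are present.
    """
    copies = [read[i:i + repeat_len]
              for i in range(0, len(read) - repeat_len + 1, repeat_len)]
    if len(copies) < min_copies:
        return None
    consensus = []
    for column in zip(*copies):  # one column per position in the repeat
        base, count = Counter(column).most_common(1)[0]
        # Keep a base only if a strict majority of copies agree on it.
        consensus.append(base if count > len(copies) / 2 else "N")
    return "".join(consensus)
```

    Because an error must recur at the same position in most copies to survive the vote, independent per-copy errors are largely filtered out, which is the intuition behind the low error rates the abstract reports.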

  9. CLAST: CUDA implemented large-scale alignment search tool.

    Science.gov (United States)

    Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken

    2014-12-11

    Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.

  10. The Cluster Substructure - Alignment Connection

    OpenAIRE

    Plionis, Manolis

    2001-01-01

    Using the APM cluster data we investigate whether the dynamical status of clusters is related to the large-scale structure of the Universe. We find that cluster substructure is strongly correlated with the tendency of clusters to be aligned with their nearest neighbour and in general with the nearby clusters that belong to the same supercluster. Furthermore, dynamically young clusters are more clustered than the overall cluster population. These are strong indications that clusters develop in ...

  11. Alignment in double capture processes

    Energy Technology Data Exchange (ETDEWEB)

    Moretto-Capelle, P.; Benhenni, M.; Bordenave-Montesquieu, A.; Benoit-Cattin, P.; Gleizes, A. (IRSAMC, URA CNRS 770, Univ. Paul Sabatier, 118 rte de Narbonne, 31062 Toulouse Cedex (France))

    1993-06-05

    The electron spectra emitted when a double capture occurs in N⁷⁺+He and Ne⁸⁺+He systems at 10 qkeV collisional energy allow us to determine the angular distributions of the 3ℓ3ℓ′ lines through a special spectra fitting procedure which includes interferences between neighbouring states. It is found that the doubly excited states populated in double capture processes are generally aligned.

  12. GASSST: global alignment short sequence search tool

    National Research Council Canada - National Science Library

    Rizk, Guillaume; Lavenier, Dominique

    2010-01-01

    .... Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus 2-fold-achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads...

  13. Grain alignment in starless cores

    Energy Technology Data Exchange (ETDEWEB)

    Jones, T. J.; Bagley, M. [Minnesota Institute for Astrophysics, University of Minnesota, Minneapolis, MN 55455 (United States); Krejny, M. [Cree Inc., 4600 Silicon Dr., Durham, NC (United States); Andersson, B.-G. [SOFIA Science Center, USRA, Moffett Field, CA (United States); Bastien, P., E-mail: tjj@astro.umn.edu [Centre de recherche en astrophysique du Québec and Départment de Physique, Université de Montréal, Montréal (Canada)

    2015-01-01

    We present near-IR polarimetry data of background stars shining through a selection of starless cores taken in the K band, probing visual extinctions up to A_V∼48. We find that P_K/τ_K continues to decline with increasing A_V with a power law slope of roughly −0.5. Examination of published submillimeter (submm) polarimetry of starless cores suggests that by A_V≳20 the slope for P versus τ becomes ∼−1, indicating no grain alignment at greater optical depths. Combining these two data sets, we find good evidence that, in the absence of a central illuminating source, the dust grains in dense molecular cloud cores with no internal radiation source cease to become aligned with the local magnetic field at optical depths greater than A_V∼20. A simple model relating the alignment efficiency to the optical depth into the cloud reproduces the observations well.

  14. From Word Alignment to Word Senses, via Multilingual Wordnets

    Directory of Open Access Journals (Sweden)

    Dan Tufis

    2006-05-01

    Full Text Available Most of the successful commercial applications in language processing (text and/or speech) dispense with any explicit concern for semantics, the usual motivation being the high computational costs required for dealing with semantics in the case of large volumes of data. With recent advances in corpus linguistics and statistical-based methods in NLP, revealing useful semantic features of linguistic data is becoming cheaper and cheaper and the accuracy of this process is steadily improving. Lately, there seems to be a growing acceptance of the idea that multilingual lexical ontologies might be the key towards aligning different views on the semantic atomic units to be used in characterizing the general meaning of various and multilingual documents. Depending on the granularity at which semantic distinctions are necessary, the accuracy of the basic semantic processing (such as word sense disambiguation) can be very high with relatively low complexity computing. The paper substantiates this statement by presenting a statistical-based system for word alignment and word sense disambiguation in parallel corpora. We describe a word alignment platform which ensures text pre-processing (tokenization, POS-tagging, lemmatization, chunking, sentence and word alignment) as required by an accurate word sense disambiguation.

  15. Efficient and accurate fragmentation methods.

    Science.gov (United States)

    Pruitt, Spencer R; Bertoni, Colleen; Brorsen, Kurt R; Gordon, Mark S

    2014-09-16

    Conspectus Three novel fragmentation methods that are available in the electronic structure program GAMESS (general atomic and molecular electronic structure system) are discussed in this Account. The fragment molecular orbital (FMO) method can be combined with any electronic structure method to perform accurate calculations on large molecular species with no reliance on capping atoms or empirical parameters. The FMO method is highly scalable and can take advantage of massively parallel computer systems. For example, the method has been shown to scale nearly linearly on up to 131 000 processor cores for calculations on large water clusters. There have been many applications of the FMO method to large molecular clusters, to biomolecules (e.g., proteins), and to materials that are used as heterogeneous catalysts. The effective fragment potential (EFP) method is a model potential approach that is fully derived from first principles and has no empirically fitted parameters. Consequently, an EFP can be generated for any molecule by a simple preparatory GAMESS calculation. The EFP method provides accurate descriptions of all types of intermolecular interactions, including Coulombic interactions, polarization/induction, exchange repulsion, dispersion, and charge transfer. The EFP method has been applied successfully to the study of liquid water, π-stacking in substituted benzenes and in DNA base pairs, solvent effects on positive and negative ions, electronic spectra and dynamics, non-adiabatic phenomena in electronic excited states, and nonlinear excited state properties. The effective fragment molecular orbital (EFMO) method is a merger of the FMO and EFP methods, in which interfragment interactions are described by the EFP potential, rather than the less accurate electrostatic potential. The use of EFP in this manner facilitates the use of a smaller value for the distance cut-off (Rcut). Rcut determines the distance at which EFP interactions replace fully quantum

  16. Accurate determination of antenna directivity

    DEFF Research Database (Denmark)

    Dich, Mikael

    1997-01-01

    The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna for which the radiated power density is known in a finite number of points on the far-field sphere is presented. The main application of the formula is determination of directivity from power......-pattern measurements. The derivation is based on the theory of spherical wave expansion of electromagnetic fields, which also establishes a simple criterion for the required number of samples of the power density. An array antenna consisting of Hertzian dipoles is used to test the accuracy and rate of convergence...

  17. Genome-Wide Association Mapping and Genomic Selection for Alfalfa (Medicago sativa) Forage Quality Traits

    Science.gov (United States)

    Pecetti, Luciano; Brummer, E. Charles; Palmonari, Alberto; Tava, Aldo

    2017-01-01

    Genetic progress for forage quality has been poor in alfalfa (Medicago sativa L.), the most-grown forage legume worldwide. This study aimed at exploring opportunities for marker-assisted selection (MAS) and genomic selection of forage quality traits based on breeding values of parent plants. Some 154 genotypes from a broadly-based reference population were genotyped by genotyping-by-sequencing (GBS), and phenotyped for leaf-to-stem ratio, leaf and stem contents of protein, neutral detergent fiber (NDF) and acid detergent lignin (ADL), and leaf and stem NDF digestibility after 24 hours (NDFD), of their dense-planted half-sib progenies in three growing conditions (summer harvest, full irrigation; summer harvest, suspended irrigation; autumn harvest). Trait-marker analyses were performed on progeny values averaged over conditions, owing to modest germplasm × condition interaction. Genomic selection exploited 11,450 polymorphic SNP markers, whereas a subset of 8,494 M. truncatula-aligned markers were used for a genome-wide association study (GWAS). GWAS confirmed the polygenic control of quality traits and, in agreement with phenotypic correlations, indicated substantially different genetic control of a given trait in stems and leaves. It detected several SNPs in different annotated genes that were highly linked to stem protein content. Also, it identified a small genomic region on chromosome 8 with high concentration of annotated genes associated with leaf ADL, including one gene probably involved in the lignin pathway. Three genomic selection models, i.e., Ridge-regression BLUP, Bayes B and Bayesian Lasso, displayed similar prediction accuracy, whereas SVR-lin was less accurate. Accuracy values were moderate (0.3–0.4) for stem NDFD and leaf protein content, modest for leaf ADL and NDFD, and low to very low for the other traits. Along with previous results for the same germplasm set, this study indicates that GBS data can be exploited to improve both quality traits

  18. The UCSC Genome Browser database: 2016 update.

    Science.gov (United States)

    Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James

    2016-01-01

    For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.

  19. An Overview of Multiple Sequence Alignment Systems

    CERN Document Server

    Saeed, Fahad

    2009-01-01

    An overview of current multiple alignment systems is presented. The useful algorithms, the procedures adopted and their limitations are described. We also present the quality of the alignments obtained and the cases (kind of alignments, kind of sequences, etc.) in which the particular systems are useful.

  20. Physician-Hospital Alignment in Orthopedic Surgery.

    Science.gov (United States)

    Bushnell, Brandon D

    2015-09-01

    The concept of "alignment" between physicians and hospitals is a popular buzzword in the age of health care reform. Despite their often tumultuous histories, physicians and hospitals find themselves under increasing pressures to work together toward common goals. However, effective alignment is more than just simple cooperation between parties. The process of achieving alignment does not have simple, universal steps. Alignment will differ based on individual situational factors and the type of specialty involved. Ultimately, however, there are principles that underlie the concept of alignment and should be a part of any physician-hospital alignment efforts. In orthopedic surgery, alignment involves the clinical, administrative, financial, and even personal aspects of a surgeon's practice. It must be based on the principles of financial interest, clinical authority, administrative participation, transparency, focus on the patient, and mutual necessity. Alignment can take on various forms as well, with popular models consisting of shared governance and comanagement, gainsharing, bundled payments, accountable care organizations, and other methods. As regulatory and financial pressures continue to motivate physicians and hospitals to develop alignment relationships, new and innovative methods of alignment will also appear. Existing models will mature and evolve, with individual variability based on local factors. However, certain trends seem to be appearing as time progresses and alignment relationships deepen, including regional and national collaboration, population management, and changes in the legal system. This article explores the history, principles, and specific methods of physician-hospital alignment and its critical importance for the future of health care delivery.

  1. Vertically aligned nanostructure scanning probe microscope tips

    Science.gov (United States)

    Guillorn, Michael A.; Ilic, Bojan; Melechko, Anatoli V.; Merkulov, Vladimir I.; Lowndes, Douglas H.; Simpson, Michael L.

    2006-12-19

    Methods and apparatus are described for cantilever structures that include a vertically aligned nanostructure, especially vertically aligned carbon nanofiber scanning probe microscope tips. An apparatus includes a cantilever structure including a substrate including a cantilever body, that optionally includes a doped layer, and a vertically aligned nanostructure coupled to the cantilever body.

  2. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    NARCIS (Netherlands)

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of pr

  3. Alignment of lower-limb prostheses.

    Science.gov (United States)

    Zahedi, M S; Spence, W D; Solomonidis, S E; Paul, J P

    1986-04-01

    Alignment of a prosthesis is defined as the position of the socket relative to the other prosthetic components of the limb. During dynamic alignment the prosthetist, using subjective judgment and feedback from the patient, aims to achieve the most suitable limb geometry for best function and comfort. Until recently it was generally believed that a patient could only be satisfied with a unique "optimum alignment." The purpose of this systematic study of lower-limb alignment parameters was to gain an understanding of the factors that make a limb configuration or optimum alignment, acceptable to the patient, and to obtain a measure of the variation of this alignment that would be acceptable to the amputee. In this paper, the acceptable range of alignments for 10 below- and 10 above-knee amputees are established. Three prosthetists were involved in the majority of the 183 below-knee and 100 above-knee fittings, although several other prosthetists were also involved. The effects of each different prosthetist on the established range of alignment for each patient are reported to be significant. It is now established that an amputee can tolerate several alignments ranging in some parameters by as much as 148 mm in shifts and 17 degrees in tilts. This paper describes the method of defining and measuring the alignment of lower-limb prostheses. It presents quantitatively established values for bench alignment position and the range of adjustment required for incorporation into the design of new alignment units.

  4. Aligning Projection Images from Binary Volumes

    NARCIS (Netherlands)

    Bleichrodt, F.; Beenhouwer, J. de; Sijbers, J.; Batenburg, K.J.

    2014-01-01

    In tomography, slight differences between the geometry of the scanner hardware and the geometric model used in the reconstruction lead to alignment artifacts. To exploit high-resolution detectors used in many applications of tomography, alignment of the projection data is essential. Markerless align

  5. Inferring comprehensible business/ICT alignment rules

    NARCIS (Netherlands)

    Cumps, B.; Martens, D.; De Backer, M.; Haesen, R.; Viaene, S.; Dedene, G.; Baesens, B.; Snoeck, M.

    2009-01-01

    We inferred business rules for business/ICT alignment by applying a novel rule induction algorithm on a data set containing rich alignment information polled from 641 organisations in 7 European countries. The alignment rule set was created using AntMiner+, a rule induction technique with a reputati

  6. Shift dynamics of capillary self-alignment

    NARCIS (Netherlands)

    Arutinov, G.; Mastrangeli, M.; Smits, E.C.P.; Heck, G.V.; Schoo, H.F.M.; Toonder, J.J.M. den; Dietzel, A.H.

    2014-01-01

    This paper describes the dynamics of capillary self-alignment of components with initial shift offsets from matching receptor sites. The analysis of the full uniaxial self-alignment dynamics of foil-based mesoscopic dies from pre-alignment to final settling evidenced three distinct, sequential regim

  7. Strategic Alignment and New Product Development

    DEFF Research Database (Denmark)

    Acur, Nuran; Kandemir, Destan; Boer, Harry

    2012-01-01

    Strategic alignment is widely accepted as a prerequisite for a firm’s success, but insight into the role of alignment in, and its impact on, the new product development (NPD) process and its performance is less well developed. Most publications on this topic either focus on one form of alignment o...

  8. MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer.

    Science.gov (United States)

    Gauthier, Nicholas Paul; Reznik, Ed; Gao, Jianjiong; Sumer, Selcuk Onur; Schultz, Nikolaus; Sander, Chris; Miller, Martin L

    2016-01-04

    The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in currently (mid-2015) more than 5000 cancer patient samples across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues for the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation-related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  10. The Oryza Map Alignment Project (OMAP) introgression lines for allelic diversity and new germplasm development

    Science.gov (United States)

    The Oryza Map Alignment Project (OMAP) has developed a genus wide model system for the study of rice that will ultimately provide a complete understanding of the genus. The purpose of this project is to capitalize on the strengths of the Arizona Genomics Institute (AGI), OMAP participants and the r...

  11. Hydropathy profile alignment : a tool to search for structural homologues of membrane proteins

    NARCIS (Netherlands)

    Lolkema, JS; Slotboom, DJ

    1998-01-01

    Hydropathy profile alignment is introduced as a tool in functional genomics. The architecture of membrane proteins is reflected in the hydropathy profile of the amino acid sequence. Both secondary and tertiary structural elements determine the profile which provides enough sensitivity to detect evol

  12. A Vondrak low pass filter for IMU sensor initial alignment on a disturbed base.

    Science.gov (United States)

    Li, Zengke; Wang, Jian; Gao, Jingxiang; Li, Binghao; Zhou, Feng

    2014-12-10

    The initial alignment of the Inertial Measurement Unit (IMU) is an important process of INS to determine the coordinate transformation matrix which is used in the integration of Global Positioning Systems (GPS) with Inertial Navigation Systems (INS). In this paper a novel alignment method for a disturbed base, such as a vehicle disturbed by wind outdoors, implemented with the aid of a Vondrak low pass filter, is proposed. The basic principle of initial alignment including coarse alignment and fine alignment is introduced first. The spectral analysis is processed to compare the differences between the characteristic error of INS force observation on a stationary base and on disturbed bases. In order to reduce the high frequency noise in the force observation more accurately and more easily, a Vondrak low pass filter is constructed based on the spectral analysis result. The genetic algorithms method is introduced to choose the smoothing factor in the Vondrak filter and the corresponding objective condition is built. The architecture of the proposed alignment method with the Vondrak low pass filter is shown. Furthermore, simulated experiments and actual experiments were performed to validate the new algorithm. The results indicate that, compared with the conventional alignment method, the Vondrak filter could eliminate the high frequency noise in the force observation and the proposed alignment method could improve the attitude accuracy. At the same time, only one parameter needs to be set, which makes the proposed method easier to implement than other low-pass filter methods.

  13. A Vondrak Low Pass Filter for IMU Sensor Initial Alignment on a Disturbed Base

    Directory of Open Access Journals (Sweden)

    Zengke Li

    2014-12-01

    The initial alignment of the Inertial Measurement Unit (IMU) is an important process of INS to determine the coordinate transformation matrix which is used in the integration of Global Positioning Systems (GPS) with Inertial Navigation Systems (INS). In this paper a novel alignment method for a disturbed base, such as a vehicle disturbed by wind outdoors, implemented with the aid of a Vondrak low pass filter, is proposed. The basic principle of initial alignment including coarse alignment and fine alignment is introduced first. The spectral analysis is processed to compare the differences between the characteristic error of INS force observation on a stationary base and on disturbed bases. In order to reduce the high frequency noise in the force observation more accurately and more easily, a Vondrak low pass filter is constructed based on the spectral analysis result. The genetic algorithms method is introduced to choose the smoothing factor in the Vondrak filter and the corresponding objective condition is built. The architecture of the proposed alignment method with the Vondrak low pass filter is shown. Furthermore, simulated experiments and actual experiments were performed to validate the new algorithm. The results indicate that, compared with the conventional alignment method, the Vondrak filter could eliminate the high frequency noise in the force observation and the proposed alignment method could improve the attitude accuracy. At the same time, only one parameter needs to be set, which makes the proposed method easier to implement than other low-pass filter methods.

  14. Measures of frontal plane lower limb alignment obtained from static radiographs and dynamic gait analysis.

    Science.gov (United States)

    Hunt, Michael A; Birmingham, Trevor B; Jenkyn, Thomas R; Giffin, J Robert; Jones, Ian C

    2008-05-01

    Currently, lower limb alignment is measured statically from radiographs that may not accurately represent the condition of the limb when moving and weight-bearing. Thus, the purpose of the present study was to introduce and examine a novel measure of dynamic lower limb alignment obtained during walking in patients with knee OA. In this cross-sectional study, standing, full-length lower limb radiographs were acquired from 80 individuals with confirmed knee OA, who also underwent three-dimensional gait analyses with reflective markers placed on the segments of the lower limb. Frontal plane lower limb alignment was measured using the static radiographs (mechanical axis) and gait analyses (marker-based alignment) by identifying the centres of the hip, knee, and ankle from both methods. Simple linear regression indicated these measures were highly correlated (r=0.84), however, 30% of the variance in the marker-based measure of lower limb alignment was not explained by the mechanical axis despite using the same anatomical landmarks. Results from this study suggest that a valid measure of dynamic lower limb alignment can be obtained from a standard quantitative gait analysis and highlight the differences in measures of lower limb alignment obtained in static and dynamic situations. Future research into the clinical utility of measures of dynamic alignment in the treatment of OA may aid in the development of interventions specifically tailored to one's dynamic lower limb biomechanics during gait.
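    The reported correlation and the share of unexplained variance are linked through the coefficient of determination. As a quick check, assuming a simple linear regression with a single predictor (so that the explained variance equals $r^2$):

```latex
r = 0.84 \;\Rightarrow\; r^{2} = 0.84^{2} \approx 0.706,
\qquad 1 - r^{2} \approx 0.294 \approx 30\%
```

    This agrees with the roughly 30% of variance in the marker-based measure that the abstract says the mechanical axis does not explain.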

  15. Automatic spreader-container alignment system using infrared structured lights.

    Science.gov (United States)

    Liu, Yu; Wang, Yibo; Lv, Jimin; Zhang, Maojun

    2012-06-01

    This paper presents a computer-vision system to assist reach stackers to automatically align the spreader with the target container. By analyzing infrared lines on the top of the container, the proposed system is able to calculate the relative position between the spreader and the container. The invisible structured lights are equipped in this system to enable all-weather operation, which can avoid environmental factors such as shadows and differences in climate. Additionally, the lateral inclination of the spreader is taken into consideration to offer a more accurate alignment than other competing systems. Estimation errors are reduced through approaches including power series and linear regression. The accuracy can be controlled within 2 cm or 2 deg, which meets the requirements of reach stackers' operation.

  16. Alignment of wave functions for angular momentum projection

    CERN Document Server

    Taniguchi, Yasutaka

    2016-01-01

    Angular momentum projection is used to obtain eigenstates of angular momentum from general wave functions. Multi-configuration mixing calculation with angular momentum projection is an important microscopic method in nuclear physics. For accurate multi-configuration mixing calculation with angular momentum projection, concentrated distribution of $z$ components $K$ of angular momentum in the body-fixed frame ($K$-distribution) is favored. Orientation of wave functions strongly affects $K$-distribution. Minimization of variance of $\hat{J}_z$ is proposed as an alignment method to obtain wave functions that have concentrated $K$-distribution. Benchmark calculations are performed for $\alpha$-$^{24}$Mg cluster structure, triaxially superdeformed states in $^{40}$Ar, and Hartree-Fock states of some nuclei. The proposed alignment method is useful and works well for various wave functions to obtain concentrated $K$-distribution.

  17. GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping.

    Science.gov (United States)

    Alser, Mohammed; Hassan, Hasan; Xin, Hongyi; Ergin, Oguz; Mutlu, Onur; Alkan, Can

    2017-05-31

    High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments, called short reads, that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and "candidate" locations in that reference genome. The similarity measurement, called alignment, formulated as an approximate string matching problem, is the computational bottleneck because: (1) it is implemented using quadratic-time dynamic programming algorithms, and (2) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before using computationally costly alignment operations. We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10. https://github.com/BilkentCompGen/GateKeeper. mohammedalser@bilkent.edu.tr, onur.mutlu@inf.ethz.ch, calkan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online.
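    The filter-then-verify idea in this abstract can be illustrated in software. The sketch below is not GateKeeper's FPGA design, nor the Adjacency Filter or SHD; it substitutes a simple pigeonhole check (if a read is within `max_edits` edits of a candidate, at least one of `max_edits + 1` non-overlapping read segments must occur exactly in the candidate) ahead of a quadratic-time edit-distance computation. All function names and the example sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Quadratic-time dynamic programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def passes_pigeonhole_filter(read: str, candidate: str, max_edits: int) -> bool:
    """Cheap necessary condition: some read segment must match exactly."""
    n_parts = max_edits + 1
    size = len(read) // n_parts
    if size == 0:
        return True  # read too short to split; defer to full alignment
    for k in range(n_parts):
        start = k * size
        end = len(read) if k == n_parts - 1 else start + size
        if read[start:end] in candidate:
            return True
    return False


def verify_candidates(read: str, candidates: list[str], max_edits: int) -> list[str]:
    """Run the costly alignment only on candidates that survive the filter."""
    accepted = []
    for cand in candidates:
        if not passes_pigeonhole_filter(read, cand, max_edits):
            continue  # rejected without computing the edit distance
        if edit_distance(read, cand) <= max_edits:
            accepted.append(cand)
    return accepted
```

    The filter never rejects a true match, so it only saves work; how much it saves depends on how dissimilar the incorrect candidates are.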

  18. Aligning molecules with intense nonresonant laser fields

    DEFF Research Database (Denmark)

    Larsen, J.J.; Safvan, C.P.; Sakai, H.

    1999-01-01

    Molecules in a seeded supersonic beam are aligned by the interaction between an intense nonresonant linearly polarized laser field and the molecular polarizability. We demonstrate the general applicability of the scheme by aligning I2, ICl, CS2, CH3I, and C6H5I molecules. The alignment is probed...... by mass selective two dimensional imaging of the photofragment ions produced by femtosecond laser pulses. Calculations on the degree of alignment of I2 are in good agreement with the experiments. We discuss some future applications of laser aligned molecules....

  19. Subsonic Mechanical Alignment of Irregular Grains

    CERN Document Server

    Lazarian, Alex

    2007-01-01

    We show that grains can be efficiently aligned by interacting with a subsonic gaseous flow. The alignment arises from grains having irregularities that scatter atoms with different efficiency in the right and left directions. The grains tend to align with long axes perpendicular to magnetic field, which corresponds to Davis-Greenstein predictions, but does not involve magnetic field. For rather conservative factors characterizing the grain helicity and scattering efficiency of impinging atoms, the alignment of helical grains is much more efficient than the Gold-type alignment processes.

  20. Accurate ab initio spin densities

    CERN Document Server

    Boguslawski, Katharina; Legeza, Örs; Reiher, Markus

    2012-01-01

    We present an approach for the calculation of spin density distributions for molecules that require very large active spaces for a qualitatively correct description of their electronic structure. Our approach is based on the density-matrix renormalization group (DMRG) algorithm to calculate the spin density matrix elements as basic quantity for the spatially resolved spin density distribution. The spin density matrix elements are directly determined from the second-quantized elementary operators optimized by the DMRG algorithm. As an analytic convergence criterion for the spin density distribution, we employ our recently developed sampling-reconstruction scheme [J. Chem. Phys. 2011, 134, 224101] to build an accurate complete-active-space configuration-interaction (CASCI) wave function from the optimized matrix product states. The spin density matrix elements can then also be determined as an expectation value employing the reconstructed wave function expansion. Furthermore, the explicit reconstruction of a CA...

  1. The Accurate Particle Tracer Code

    CERN Document Server

    Wang, Yulei; Qin, Hong; Yu, Zhi

    2016-01-01

    The Accurate Particle Tracer (APT) code is designed for large-scale particle simulations on dynamical systems. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and non-linear problems. Under the well-designed integrated and modularized framework, APT serves as a universal platform for researchers from different fields, such as plasma physics, accelerator physics, space science, fusion energy research, computational mathematics, software engineering, and high-performance computation. The APT code consists of seven main modules, including the I/O module, the initialization module, the particle pusher module, the parallelization module, the field configuration module, the external force-field module, and the extendible module. The I/O module, supported by Lua and Hdf5 projects, provides a user-friendly interface for both numerical simulation and data analysis. A series of new geometric numerical methods...

  2. Accurate Modeling of Advanced Reflectarrays

    DEFF Research Database (Denmark)

    Zhou, Min

    Analysis and optimization methods for the design of advanced printed reflectarrays have been investigated, and the study is focused on developing an accurate and efficient simulation tool. For the analysis, a good compromise between accuracy and efficiency can be obtained using the spectral domain...... to the POT. The GDOT can optimize for the size as well as the orientation and position of arbitrarily shaped array elements. Both co- and cross-polar radiation can be optimized for multiple frequencies, dual polarization, and several feed illuminations. Several contoured beam reflectarrays have been designed...... using the GDOT to demonstrate its capabilities. To verify the accuracy of the GDOT, two offset contoured beam reflectarrays that radiate a high-gain beam on a European coverage have been designed and manufactured, and subsequently measured at the DTU-ESA Spherical Near-Field Antenna Test Facility...

  3. Accurate thickness measurement of graphene

    Science.gov (United States)

    Shearer, Cameron J.; Slattery, Ashley D.; Stapleton, Andrew J.; Shapter, Joseph G.; Gibson, Christopher T.

    2016-03-01

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  4. Accurate thickness measurement of graphene.

    Science.gov (United States)

    Shearer, Cameron J; Slattery, Ashley D; Stapleton, Andrew J; Shapter, Joseph G; Gibson, Christopher T

    2016-03-29

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  5. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method

    Directory of Open Access Journals (Sweden)

    Lund Ole

    2007-07-01

    Background: Antigen presenting cells (APCs) sample the extracellular space and present peptides from here to T helper cells, which can be activated if the peptides are of foreign origin. The peptides are presented on the surface of the cells in complex with major histocompatibility class II (MHC II) molecules. Identification of peptides that bind MHC II molecules is thus a key step in rational vaccine design, and developing methods for accurate prediction of the peptide:MHC interactions plays a central role in epitope discovery. The MHC class II binding groove is open at both ends, making the correct alignment of a peptide in the binding groove a crucial part of identifying the core of an MHC class II binding motif. Here, we present a novel stabilization matrix alignment method, SMM-align, that allows for direct prediction of peptide:MHC binding affinities. The predictive performance of the method is validated on a large MHC class II benchmark data set covering 14 HLA-DR (human MHC) and three mouse H2-IA alleles. Results: The predictive performance of the SMM-align method was demonstrated to be superior to that of the Gibbs sampler, TEPITOPE, SVRMHC, and MHCpred methods. Cross validation between peptide data sets obtained from different sources demonstrated that direct incorporation of peptide length potentially results in over-fitting of the binding prediction method. Focusing on amino terminal peptide flanking residues (PFR), we demonstrate a consistent gain in predictive performance by favoring binding registers with a minimum PFR length of two amino acids. Visualizing the binding motif as obtained by the SMM-align and TEPITOPE methods highlights a series of fundamental discrepancies between the two predicted motifs. For the DRB1*1302 allele, for instance, the TEPITOPE method favors basic amino acids at most anchor positions, whereas the SMM-align method identifies a preference for hydrophobic or neutral amino acids at the anchors.
Conclusion

  6. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA...... alignment by 'reverse translation' of the aligned protein sequences. In the resulting DNA alignment, gaps occur in groups of three corresponding to entire codons, and analogous codon positions are therefore always lined up. These features are useful when constructing multiple DNA alignments for phylogenetic...
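    Step (iii) of the procedure described above, mapping an aligned protein sequence back onto its coding DNA, can be sketched as follows. This is a minimal illustration and not the RevTrans implementation: it assumes the peptides have already been aligned by some external tool, that each DNA sequence is in frame with no stop codon, and that `-` marks a gap. The function name and example sequences are hypothetical.

```python
def back_translate(aligned_protein: str, dna: str, gap: str = "-") -> str:
    """Thread the codons of `dna` through the gap pattern of `aligned_protein`.

    Each residue consumes one codon; each protein gap becomes a three-base gap,
    so gaps in the DNA alignment always span whole codons.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    n_residues = sum(1 for c in aligned_protein if c != gap)
    if len(dna) % 3 != 0 or n_residues != len(codons):
        raise ValueError("DNA length does not match the ungapped protein length")
    codon_iter = iter(codons)
    return "".join(gap * 3 if c == gap else next(codon_iter)
                   for c in aligned_protein)


# Hypothetical aligned peptides and their (unaligned) coding sequences.
aligned = {"seq1": "MK-L", "seq2": "M-RL"}
coding = {"seq1": "ATGAAACTG", "seq2": "ATGCGTTTA"}
dna_alignment = {name: back_translate(aligned[name], coding[name])
                 for name in aligned}
```

    Because every gap is inserted as a full codon, analogous codon positions stay lined up across sequences, which is the property the abstract highlights.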

  7. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    Directory of Open Access Journals (Sweden)

    Hao Ye

    2015-11-01

    Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long reads alignment and alignment facilitated with known genetic variants.
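    The seed-and-extend strategy mentioned in this abstract can be illustrated with a toy example. The sketch below is not taken from any of the reviewed aligners: it assumes a plain hash index of exact k-mers, non-overlapping seeds, and an ungapped extension scored by mismatch count, whereas production tools use compressed indexes and gapped extension. The names and sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read: str, reference: str, index: dict[str, list[int]],
                    k: int, max_mismatches: int) -> list[tuple[int, int]]:
    """Return sorted (reference_start, mismatches) pairs for accepted placements."""
    hits = set()
    for offset in range(0, len(read) - k + 1, k):       # non-overlapping seeds
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset                        # implied read start
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))  # extension
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)


reference = "TTGACCGATAGGCTAACGT"
index = build_kmer_index(reference, k=4)
placements = seed_and_extend("CGATAGTC", reference, index, k=4, max_mismatches=1)
```

    Here the first seed `CGAT` anchors the read at position 5 and the extension tolerates the single mismatch; with `max_mismatches=0` the same read would be left unplaced.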

  8. De novo assembly of a haplotype-resolved human genome.

    Science.gov (United States)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
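    The N50 statistic quoted above can be computed as in the sketch below. It uses the generic definition (the largest length L such that pieces of length at least L cover half of the total) applied to any list of lengths; for a haplotype N50 the lengths would be those of phased blocks. The function name and the example lengths are made up for illustration and are not data from the study.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that pieces of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    raise AssertionError("unreachable for non-empty input")


# Illustrative block lengths (arbitrary units), not values from the paper.
example_n50 = n50([80, 70, 50, 40, 30, 20])
```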

  9. De novo assembly of a haplotype-resolved human genome

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang

    2015-01-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should...

  10. Galaxy alignments: Theory, modelling and simulations

    CERN Document Server

    Kiessling, Alina; Joachimi, Benjamin; Kirk, Donnacha; Kitching, Thomas D; Leonard, Adrienne; Mandelbaum, Rachel; Schäfer, Björn Malte; Sifón, Cristóbal; Brown, Michael L; Rassat, Anais

    2015-01-01

    The shapes of galaxies are not randomly oriented on the sky. During the galaxy formation and evolution process, environment has a strong influence, as tidal gravitational fields in large-scale structure tend to align the shapes and angular momenta of nearby galaxies. Additionally, events such as galaxy mergers affect the relative alignments of galaxies throughout their history. These "intrinsic galaxy alignments" are known to exist, but are still poorly understood. This review will offer a pedagogical introduction to the current theories that describe intrinsic galaxy alignments, including the apparent difference in intrinsic alignment between early- and late-type galaxies and the latest efforts to model them analytically. It will then describe the ongoing efforts to simulate intrinsic alignments using both $N$-body and hydrodynamic simulations. Due to the relative youth of this field, there is still much to be done to understand intrinsic galaxy alignments and this review summarises the current state of the ...

  11. FOGSAA: Fast Optimal Global Sequence Alignment Algorithm

    Science.gov (United States)

    Chakraborty, Angana; Bandyopadhyay, Sanghamitra

    2013-04-01

    In this article we propose a Fast Optimal Global Sequence Alignment Algorithm, FOGSAA, which aligns a pair of nucleotide/protein sequences faster than any optimal global alignment method including the widely used Needleman-Wunsch (NW) algorithm. FOGSAA is applicable for all types of sequences, with any scoring scheme, and with or without affine gap penalty. Compared to NW, FOGSAA achieves a time gain of (70-90)% for highly similar nucleotide sequences (> 80% similarity), and (54-70)% for sequences having (30-80)% similarity. For other sequences, it terminates with an approximate score. For protein sequences, the average time gain is between (25-40)%. Compared to three heuristic global alignment methods, the quality of alignment is improved by about 23%-53%. FOGSAA is, in general, suitable for aligning any two sequences defined over a finite alphabet set, where the quality of the global alignment is of supreme importance.
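For reference, the Needleman-Wunsch baseline that FOGSAA is compared against can be sketched as below. This is a minimal score-only version with a linear gap penalty; the scoring values are assumptions chosen for illustration, and the code does not implement FOGSAA itself.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score by Needleman-Wunsch dynamic programming."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

The table has (len(a) + 1) x (len(b) + 1) cells and every cell is filled, which is the cost that faster optimal methods try to reduce.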

  12. Aligning Sequences by Minimum Description Length

    Directory of Open Access Journals (Sweden)

    John S. Conery

    2008-01-01

    Full Text Available This paper presents a new information theoretic framework for aligning sequences in bioinformatics. A transmitter compresses a set of sequences by constructing a regular expression that describes the regions of similarity in the sequences. To retrieve the original set of sequences, a receiver generates all strings that match the expression. An alignment algorithm uses minimum description length to encode and explore alternative expressions; the expression with the shortest encoding provides the best overall alignment. When two substrings contain letters that are similar according to a substitution matrix, a code length function based on conditional probabilities defined by the matrix will encode the substrings with fewer bits. In one experiment, alignments produced with this new method were found to be comparable to alignments from CLUSTALW. A second experiment measured the accuracy of the new method on pairwise alignments of sequences from the BAliBASE alignment benchmark.

  13. Pupil Alignment Measuring Technique and Alignment Reference for Instruments or Optical Systems

    Science.gov (United States)

    Hagopian, John G.

    2010-01-01

    A technique was created to measure the pupil alignment of instruments in situ by measuring calibrated pupil alignment references (PARs) in instruments. The PAR can also be measured using an alignment telescope or an imaging system. PAR allows the verification of the science instrument (SI) pupil alignment at the integrated science instrument module (ISIM) level of assembly at ambient and cryogenic operating temperature. This will allow verification of the ISIM+SI alignment, and provide feedback to realign the SI if necessary.

  14. MEANS FOR DETERMINING CENTRIFUGE ALIGNMENT

    Science.gov (United States)

    Smith, W.Q.

    1958-08-26

    An apparatus is presented for remotely determining the alignment of a centrifuge. The centrifuge shaft is provided with a shoulder, upon which two followers ride, one for detecting radial movements, and one upon the shoulder face for determining the axial motion. The followers are attached to separate liquid-filled bellows, and a tube connects each bellows to its respective indicating gage at a remote location. Vibrations produced by misalignment of the centrifuge shaft are transmitted to the bellows, and thence through the tubing to the indicator gage. This apparatus is particularly useful for operation in a hot cell where the materials handled are dangerous to the operating personnel.

  15. Aligned interactions in cosmic rays

    Energy Technology Data Exchange (ETDEWEB)

    Kempa, J., E-mail: kempa@pw.plock.pl [Warsaw University of Technology Branch Plock (Poland)

    2015-12-15

    The first clean Centauro was found in cosmic rays many years ago in the Mt Chacaltaya experiment. Since that time, many people have tried to find this type of interaction, both in cosmic rays and at accelerators, but no one had found a clean case of it. This finally changed with the last exposure of emulsion at Mt Chacaltaya, where a second clean Centauro was found. The experimental data for both the Centauros and STRANA will be presented and discussed in this paper. We also present our comments on the intriguing question of the existence of a type of nuclear interaction at high energy with alignment.

  16. Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space

    Directory of Open Access Journals (Sweden)

    Richard Wilton

    2015-03-01

    When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware. We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found. We then carried out a read-by-read comparison of Arioc’s reported alignments with the alignments found by several leading read aligners. With simulated reads, Arioc has comparable or better accuracy than the other read aligners we tested. With human sequencing reads, Arioc demonstrates significantly greater throughput than the other aligners we evaluated across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.

  17. Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space.

    Science.gov (United States)

    Wilton, Richard; Budavari, Tamas; Langmead, Ben; Wheelan, Sarah J; Salzberg, Steven L; Szalay, Alexander S

    2015-01-01

    When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware. We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments found by several leading read aligners. With simulated reads, Arioc has comparable or better accuracy than the other read aligners we tested. With human sequencing reads, Arioc demonstrates significantly greater throughput than the other aligners we evaluated across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.
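The seed-and-extend idea with prioritized candidate locations can be sketched on a CPU as follows. This is a toy illustration under stated assumptions: a hash index of k-mers, candidate start positions ranked by the number of seed hits, and a Hamming-distance check standing in for the extension step. It is not Arioc's GPU implementation, and all names and sequences are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(ref, k):
    """Map every k-mer in the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        index[ref[pos:pos + k]].append(pos)
    return index

def seed_and_extend(read, ref, index, k, max_mismatches):
    """Rank candidate starts by seed votes, then verify each one by Hamming distance."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(ref) - len(read):
                votes[start] += 1
    hits = []
    for start, _ in votes.most_common():  # most-supported locations first
        window = ref[start:start + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Hypothetical toy data: the read is ref[7:19] with one substitution.
ref = "GATTACAGGCTTACGATCGGATCCA"
read = "GGCTTATGATCG"
hits = seed_and_extend(read, ref, build_index(ref, 4), k=4, max_mismatches=2)
```

Ranking by votes means the most promising locations are verified first, so a real aligner could stop early once a good enough hit is found.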

  18. Multiple alignment analysis on phylogenetic tree of the spread of SARS epidemic using distance method

    Science.gov (United States)

    Amiroch, S.; Pradana, M. S.; Irawan, M. I.; Mukhlash, I.

    2017-09-01

    Multiple alignment (MA) is a particularly important tool for studying viral genomes and determining the evolutionary process of a specific virus. Applying MA to the spread of the severe acute respiratory syndrome (SARS) epidemic is of interest because the epidemic spread so quickly a few years ago that it drew medical attention in many countries. Although much software exists for processing multiple sequences, the pairwise alignment used to build the MA is very important to consider. In previous research, the pairwise alignments used to build the MA were computed with the Super Pairwise Alignment algorithm, whereas this study used the Needleman-Wunsch dynamic programming algorithm, simulated in Matlab. The MA analysis yielded stable and unstable regions, the latter indicating the positions where mutations occur, the network topology of the SARS epidemic phylogenetic tree produced by the distance method, and the mutation area networks.

  19. SOAP3: ultra-fast GPU-based parallel alignment tool for short reads.

    Science.gov (United States)

    Liu, Chi-Man; Wong, Thomas; Wu, Edward; Luo, Ruibang; Yiu, Siu-Ming; Li, Yingrui; Wang, Bingqiang; Yu, Chang; Chu, Xiaowen; Zhao, Kaiyong; Li, Ruiqiang; Lam, Tak-Wah

    2012-03-15

    SOAP3 is the first short read alignment tool that leverages the multi-processors in a graphic processing unit (GPU) to achieve a drastic improvement in speed. We adapted the compressed full-text index (BWT) used by SOAP2 in view of the advantages and disadvantages of GPU. When tested with millions of Illumina Hiseq 2000 length-100 bp reads, SOAP3 takes < 30 s to align a million read pairs onto the human reference genome and is at least 7.5 and 20 times faster than BWA and Bowtie, respectively. For aligning reads with up to four mismatches, SOAP3 aligns slightly more reads than BWA and Bowtie; this is because SOAP3, unlike BWA and Bowtie, is not heuristic-based and always reports all answers.
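The BWT-based index mentioned above supports exact pattern matching by backward search. The sketch below is a deliberately naive CPU version (suffix array built by sorting, occurrence counts recomputed on demand) meant only to show the principle; it is not SOAP3's compressed, GPU-adapted index, it does not handle mismatches, and the function names are hypothetical.

```python
from collections import Counter

def build_bwt_index(text):
    """Return the BWT of text plus the table C of counts of smaller characters."""
    text += "$"  # sentinel that sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    smaller, total = {}, 0
    for ch, n in sorted(Counter(text).items()):
        smaller[ch] = total
        total += n
    return bwt, smaller

def count_occurrences(pattern, bwt, smaller):
    """Count exact occurrences of pattern using backward search over the BWT."""
    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in smaller:
            return 0
        lo = smaller[ch] + bwt[:lo].count(ch)
        hi = smaller[ch] + bwt[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo

bwt, smaller = build_bwt_index("ACGTACGT")
```

Each pattern character narrows the suffix-array interval, so the number of steps depends on the pattern length rather than the reference length; production indexes replace the `count` calls with precomputed rank structures.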

  20. Automated quantification of aligned collagen for human breast carcinoma prognosis

    Directory of Open Access Journals (Sweden)

    Jeremy S Bredfeldt

    2014-01-01

    Background: Mortality in cancer patients is directly attributable to the ability of cancer cells to metastasize to distant sites from the primary tumor. This migration of tumor cells begins with a remodeling of the local tumor microenvironment, including changes to the extracellular matrix and the recruitment of stromal cells, both of which facilitate invasion of tumor cells into the bloodstream. In breast cancer, it has been proposed that the alignment of collagen fibers surrounding tumor epithelial cells can serve as a quantitative image-based biomarker for survival of invasive ductal carcinoma patients. Specific types of collagen alignment have been identified for their prognostic value and now these tumor associated collagen signatures (TACS) are central to several clinical specimen imaging trials. Here, we implement the semi-automated acquisition and analysis of this TACS candidate biomarker and demonstrate a protocol that will allow consistent scoring to be performed throughout large patient cohorts. Methods: Using large field of view high resolution microscopy techniques, image processing and supervised learning methods, we are able to quantify and score features of collagen fiber alignment with respect to adjacent tumor-stromal boundaries. Results: Our semi-automated technique produced scores that have statistically significant correlation with scores generated by a panel of three human observers. In addition, our system generated classification scores that accurately predicted survival in a cohort of 196 breast cancer patients. Feature rank analysis reveals that TACS positive fibers are more well-aligned with each other, are of generally lower density, and terminate within or near groups of epithelial cells at larger angles of interaction. Conclusion: These results demonstrate the utility of a supervised learning protocol for streamlining the analysis of collagen alignment with respect to tumor stromal boundaries.

  1. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed the human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that we could classify the functional regions of genome based on sequence feature and discriminant analysis.
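One of the alignment-free features named above, dinucleotide relative abundance, can be sketched as follows. The code assumes the common single-strand definition, rho_xy = f_xy / (f_x * f_y); whether the study used this form or a strand-symmetrized variant is not stated in the abstract, so treat the formula choice as an assumption.

```python
from collections import Counter

def dinucleotide_relative_abundance(seq):
    """Return rho_xy = f_xy / (f_x * f_y) for every dinucleotide present in seq."""
    seq = seq.upper()
    n = len(seq)
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }

# Hypothetical toy sequence; real use would pass a genomic region.
rho = dinucleotide_relative_abundance("ACAC")
```

Values well above or below 1 indicate dinucleotides that are over- or under-represented relative to the base composition; a vector of such values per region is the kind of feature that could feed principal component or discriminant analysis.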

  2. Velocity-aligned Doppler spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Z.; Koplitz, B.; Wittig, C.

    1989-03-01

    The technique of velocity-aligned Doppler spectroscopy (VADS) is presented and discussed. For photolysis/probe experiments with pulsed initiation, VADS can yield Doppler profiles for nascent photofragments that allow detailed center-of-mass (c.m.) kinetic energy distributions to be extracted. When compared with traditional forms of Doppler spectroscopy, the improvement in kinetic energy resolution is dramatic. Changes in the measured profiles are a consequence of spatial discrimination (i.e., focused and overlapping photolysis and probe beams) and delayed observation. These factors result in the selective detection of species whose velocities are aligned with the wave vector of the probe radiation $k_{\mathrm{pr}}$, thus revealing the speed distribution along $k_{\mathrm{pr}}$ rather than the distribution of nascent velocity components projected upon this direction. Mathematical details of the procedure used to model VADS are given, and experimental illustrations for HI, H$_2$S, and NH$_3$ photodissociation are presented. In these examples, pulsed photodissociation produces H atoms that are detected by sequential two-photon, two-frequency ionization via Lyman-$\alpha$ with a pulsed laser (121.6+364.7 nm), and measuring the Lyman-$\alpha$ Doppler profile as a function of probe delay reveals both internal and c.m. kinetic energy distributions for the photofragments. Strengths and weaknesses of VADS as a tool for investigating photofragmentation phenomena are also discussed.

  3. Cancer genomics

    DEFF Research Database (Denmark)

    Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner

    2007-01-01

    Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...

  4. The oryza map alignment project: the golden path to unlocking the genetic potential of wild rice species.

    Science.gov (United States)

    Wing, Rod A; Ammiraju, Jetty S S; Luo, Meizhong; Kim, Hyeran; Yu, Yeisoo; Kudrna, Dave; Goicoechea, Jose L; Wang, Wenming; Nelson, Will; Rao, Kiran; Brar, Darshan; Mackill, Dave J; Han, Bin; Soderlund, Cari; Stein, Lincoln; SanMiguel, Phillip; Jackson, Scott

    2005-09-01

    The wild species of the genus Oryza offer enormous potential to make a significant impact on agricultural productivity of the cultivated rice species Oryza sativa and Oryza glaberrima. To unlock the genetic potential of wild rice we have initiated a project entitled the 'Oryza Map Alignment Project' (OMAP) with the ultimate goal of constructing and aligning BAC/STC based physical maps of 11 wild and one cultivated rice species to the International Rice Genome Sequencing Project's finished reference genome--O. sativa ssp. japonica cv. Nipponbare. The 11 wild rice species comprise nine different genome types and include six diploid genomes (AA, BB, CC, EE, FF and GG) and four tetraploid genomes (BBCC, CCDD, HHKK and HHJJ) with broad geographical distribution and ecological adaptation. In this paper we describe our strategy to construct robust physical maps of all 12 rice species with an emphasis on the AA diploid O. nivara--thought to be the progenitor of modern cultivated rice.

  5. Automated alignment-based curation of gene models in filamentous fungi

    OpenAIRE

    2014-01-01

    Background Automated gene-calling is still an error-prone process, particularly for the highly plastic genomes of fungal species. Improvement through quality control and manual curation of gene models is a time-consuming process that requires skilled biologists and is only marginally performed. The wealth of available fungal genomes has not yet been exploited by an automated method that applies quality control of gene models in order to obtain more accurate genome annotations. Results We prov...

  6. Genomic taxonomy of vibrios

    Directory of Open Access Journals (Sweden)

    Iida Tetsuya

    2009-10-01

    Abstract Background Vibrio taxonomy has been based on a polyphasic approach. In this study, we retrieve useful taxonomic information (i.e. data that can be used to distinguish different taxonomic levels, such as species and genera) from 32 genome sequences of different vibrio species. We use a variety of tools to explore the taxonomic relationship between the sequenced genomes, including Multilocus Sequence Analysis (MLSA), supertrees, Average Amino Acid Identity (AAI), genomic signatures, and Genome BLAST atlases. Our aim is to analyse the usefulness of these tools for species identification in vibrios. Results We have generated four new genome sequences of three Vibrio species, i.e., V. alginolyticus 40B, V. harveyi-like 1DA3, and V. mimicus strains VM573 and VM603, and present a broad analysis of these genomes along with other sequenced Vibrio species. The genome atlas and pangenome plots provide a tantalizing image of the genomic differences that occur between closely related sister species, e.g. V. cholerae and V. mimicus. The vibrio pangenome contains around 26504 genes. The V. cholerae core genome and pangenome consist of 1520 and 6923 genes, respectively. Pangenomes might allow different strains of V. cholerae to occupy different niches. MLSA and supertree analyses resulted in a similar phylogenetic picture, with a clear distinction of four groups (Vibrio core group, V. cholerae-V. mimicus, Aliivibrio spp., and Photobacterium spp.). A Vibrio species is defined as a group of strains that share > 95% DNA identity in MLSA and supertree analysis, > 96% AAI, ≤ 10 genome signature dissimilarity, and > 61% proteome identity. Strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA and supertree. Conclusion The combination of different analytical and bioinformatics tools will enable the most accurate species identification through genomic computational analysis. This endeavour will culminate in

  7. A More Accurate Fourier Transform

    CERN Document Server

    Courtney, Elya

    2015-01-01

    Fourier transform methods are used to analyze functions and data sets to provide frequencies, amplitudes, and phases of underlying oscillatory components. Fast Fourier transform (FFT) methods offer speed advantages over evaluation of explicit integrals (EI) that define Fourier transforms. This paper compares frequency, amplitude, and phase accuracy of the two methods for well resolved peaks over a wide array of data sets including cosine series with and without random noise and a variety of physical data sets, including atmospheric $\mathrm{CO_2}$ concentrations, tides, temperatures, sound waveforms, and atomic spectra. The FFT uses MIT's FFTW3 library. The EI method uses the rectangle method to compute the areas under the curve via complex math. Results support the hypothesis that EI methods are more accurate than FFT methods. Errors range from 5 to 10 times higher when determining peak frequency by FFT, 1.4 to 60 times higher for peak amplitude, and 6 to 10 times higher for phase under a peak. The ability t...
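The explicit-integral (EI) approach described above can be sketched with the rectangle rule, F(f) ≈ Σ x(t_n) exp(-2πi f t_n) Δt, evaluated at arbitrary trial frequencies. The test signal, sampling step, and frequency grid below are hypothetical choices for illustration and do not reproduce the paper's data sets or its FFTW3 comparison.

```python
import cmath
import math

def explicit_fourier(samples, dt, freq):
    """Rectangle-rule estimate of the Fourier integral at one frequency (Hz)."""
    return sum(x * cmath.exp(-2j * math.pi * freq * n * dt)
               for n, x in enumerate(samples)) * dt

def peak_by_explicit_integral(samples, dt, freqs):
    """Return (frequency, amplitude) at the largest |F(f)| among the trial frequencies."""
    duration = len(samples) * dt
    best = max(freqs, key=lambda f: abs(explicit_fourier(samples, dt, f)))
    amplitude = 2 * abs(explicit_fourier(samples, dt, best)) / duration
    return best, amplitude

# Hypothetical signal: unit-amplitude cosine at 2.34 Hz, 10 s sampled every 0.01 s.
dt = 0.01
signal = [math.cos(2 * math.pi * 2.34 * n * dt) for n in range(1000)]
trial_freqs = [2.0 + 0.01 * i for i in range(61)]  # 2.00 to 2.60 Hz in 0.01 Hz steps
peak_freq, peak_amp = peak_by_explicit_integral(signal, dt, trial_freqs)
```

An FFT of the same 10 s record would place its bins 0.1 Hz apart, whereas the explicit sum can be evaluated on any frequency grid, at the cost of one full pass over the data per trial frequency.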

  8. Fast and accurate marker-based projective registration method for uncalibrated transmission electron microscope tilt series.

    Science.gov (United States)

    Lee, Ho; Lee, Jeongjin; Shin, Yeong Gil; Lee, Rena; Xing, Lei

    2010-06-21

    This paper presents a fast and accurate marker-based automatic registration technique for aligning uncalibrated projections taken from a transmission electron microscope (TEM) with different tilt angles and orientations. Most of the existing TEM image alignment methods estimate the similarity between images using the projection model with least-squares metric and guess alignment parameters by computationally expensive nonlinear optimization schemes. Approaches based on the least-squares metric which is sensitive to outliers may cause misalignment since automatic tracking methods, though reliable, can produce a few incorrect trajectories due to a large number of marker points. To decrease the influence of outliers, we propose a robust similarity measure using the projection model with a Gaussian weighting function. This function is very effective in suppressing outliers that are far from correct trajectories and thus provides a more robust metric. In addition, we suggest a fast search strategy based on the non-gradient Powell's multidimensional optimization scheme to speed up optimization as only meaningful parameters are considered during iterative projection model estimation. Experimental results show that our method brings more accurate alignment with less computational cost compared to conventional automatic alignment methods.
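The contrast between a least-squares metric and a Gaussian-weighted similarity can be sketched as follows. The exact weighting function and projection model used by the authors are not given in the abstract, so the form exp(-r^2 / (2 sigma^2)), the value of sigma, and the residuals below are assumptions for illustration only.

```python
import math

def least_squares_cost(residuals):
    """Sum of squared residuals; a single outlier can dominate this value."""
    return sum(r * r for r in residuals)

def gaussian_similarity(residuals, sigma):
    """Sum of Gaussian weights; residuals far beyond sigma contribute almost nothing."""
    return sum(math.exp(-(r * r) / (2.0 * sigma * sigma)) for r in residuals)

# Hypothetical marker residuals in pixels; the last value is a mistracked marker.
inliers = [0.1, -0.2, 0.05]
with_outlier = inliers + [50.0]

cost_jump = least_squares_cost(with_outlier) - least_squares_cost(inliers)
similarity_jump = (gaussian_similarity(with_outlier, sigma=1.0)
                   - gaussian_similarity(inliers, sigma=1.0))
```

Here the outlier adds 2500 to the least-squares cost but essentially nothing to the Gaussian-weighted score, which is why maximizing such a score is less sensitive to a few incorrect trajectories.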

  9. Galaxy alignment on large and small scales

    Science.gov (United States)

    Kang, X.; Lin, W. P.; Dong, X.; Wang, Y. O.; Dutton, A.; Macciò, A.

    2016-10-01

    Galaxies are not randomly distributed across the universe but show different kinds of alignment on different scales. On small scales, satellite galaxies tend to be distributed along the major axis of the central galaxy, with a dependence on galaxy properties: both red satellites and red centrals show stronger alignment than their blue counterparts. On large scales, the major axes of Luminous Red Galaxies (LRGs) are found to be correlated up to 30 Mpc/h. Using hydrodynamical simulations with star formation, we investigate the origin of galaxy alignment on different scales. We find that most red satellite galaxies stay in the inner region of the dark matter halo, inside which the shape of the central galaxy is well aligned with the dark matter distribution. Red centrals have stronger alignment than blue ones because they live in massive haloes and the central galaxy-halo alignment increases with halo mass. On large scales, the alignment of LRGs also arises from the galaxy-halo shape correlation, but with some degree of misalignment. Massive haloes have stronger alignment than haloes in the filaments that connect massive haloes. This is contrary to the naive expectation that cosmic filaments are the cause of halo alignment.

  10. SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

    Science.gov (United States)

    Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai

    2015-09-18

    Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and lack of access to high-performance computing facilities. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate a consensus set with high confidence. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations, all through a one-line command, therefore allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and allows quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.

  11. BrucellaBase: Genome information resource.

    Science.gov (United States)

    Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2016-09-01

    Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html.

  12. Redshift and luminosity evolution of the intrinsic alignments of galaxies in Horizon-AGN

    Science.gov (United States)

    Chisari, N.; Laigle, C.; Codis, S.; Dubois, Y.; Devriendt, J.; Miller, L.; Benabed, K.; Slyz, A.; Gavazzi, R.; Pichon, C.

    2016-09-01

    Intrinsic galaxy shape and angular momentum alignments can arise in cosmological large-scale structure due to tidal interactions or galaxy formation processes. Cosmological hydrodynamical simulations have recently come of age as a tool to study these alignments and their contamination to weak gravitational lensing. We probe the redshift and luminosity evolution of intrinsic alignments in Horizon-AGN between z = 0 and 3 for galaxies with an r-band absolute magnitude of Mr ≤ -20. Alignments transition from being radial at low redshifts and high luminosities, dominated by the contribution of ellipticals, to being tangential at high redshift and low luminosities, where discs dominate the signal. This cannot be explained by the evolution of the fraction of ellipticals and discs alone: intrinsic evolution in the amplitude of alignments is necessary. The alignment amplitude of elliptical galaxies alone is smaller in amplitude by a factor of ≃2, but has similar luminosity and redshift evolution as in current observations and in the non-linear tidal alignment model at projected separations of ≳1 Mpc. Alignments of discs are null in projection and consistent with current low-redshift observations. The combination of the two populations yields an overall amplitude a factor of ≃4 lower than observed alignments of luminous red galaxies with a steeper luminosity dependence. The restriction on accurate galaxy shapes implies that the galaxy population in the simulation is complete only to Mr ≤ -20. Higher resolution simulations will be necessary to avoid extrapolation of the intrinsic alignment predictions to the range of luminosities probed by future surveys.

  13. MACSIMS : multiple alignment of complete sequences information management system

    Directory of Open Access Journals (Sweden)

    Plewniak Frédéric

    2006-06-01

    Abstract Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at http://bips.u-strasbg.fr/MACSIMS/.

  14. Meeting Report: Hackathon-Workshop on Darwin Core and MIxS Standards Alignment (February 2012).

    Science.gov (United States)

    Tuama, Eamonn Ó; Deck, John; Dröge, Gabriel; Döring, Markus; Field, Dawn; Kottmann, Renzo; Ma, Juncai; Mori, Hiroshi; Morrison, Norman; Sterk, Peter; Sugawara, Hideaki; Wieczorek, John; Wu, Linhuan; Yilmaz, Pelin

    2012-10-10

    The Global Biodiversity Information Facility and the Genomic Standards Consortium convened a joint workshop at the University of Oxford, 27-29 February 2012, with a small group of experts from Europe, USA, China and Japan, to continue the alignment of the Darwin Core with the MIxS and related genomics standards. Several reference mappings were produced as well as test expressions of MIxS in RDF. The use and management of controlled vocabulary terms was considered in relation to both GBIF and the GSC, and tools for working with terms were reviewed. Extensions for publishing genomic biodiversity data to the GBIF network via a Darwin Core Archive were prototyped and work begun on preparing translations of the Darwin Core to Japanese and Chinese. Five genomic repositories were identified for engagement to begin the process of testing the publishing of genomic data to the GBIF network commencing with the SILVA rRNA database.

  15. Galaxy alignments: Observations and impact on cosmology

    CERN Document Server

    Kirk, Donnacha; Hoekstra, Henk; Joachimi, Benjamin; Kitching, Thomas D; Mandelbaum, Rachel; Sifón, Cristóbal; Cacciato, Marcello; Choi, Ami; Kiessling, Alina; Leonard, Adrienne; Rassat, Anais; Schäfer, Björn Malte

    2015-01-01

    Galaxy shapes are not randomly oriented, rather they are statistically aligned in a way that can depend on formation environment, history and galaxy type. Studying the alignment of galaxies can therefore deliver important information about the astrophysics of galaxy formation and evolution as well as the growth of structure in the Universe. In this review paper we summarise key measurements of intrinsic alignments, divided by galaxy type, scale and environment. We also cover the statistics and formalism necessary to understand the observations in the literature. With the emergence of weak gravitational lensing as a precision probe of cosmology, galaxy alignments took on an added importance because they can mimic cosmic shear, the effect of gravitational lensing by large-scale structure on observed galaxy shapes. This makes intrinsic alignments an important systematic effect in weak lensing studies. We quantify the impact of intrinsic alignments on cosmic shear surveys and finish by reviewing practical mitigat...

  16. Magnetic alignment and patterning of cellulose fibers

    Directory of Open Access Journals (Sweden)

    Fumiko Kimura and Tsunehisa Kimura

    2008-01-01

    Full Text Available The alignment and patterning of cellulose fibers under magnetic fields are reported. Static and rotating magnetic fields were used to align cellulose fibers with sizes ranging from millimeter to nanometer sizes. Cellulose fibers of the millimeter order, which were prepared for papermaking, and much smaller fibers with micrometer to nanometer sizes prepared by the acid hydrolysis of larger ones underwent magnetic alignment. Under a rotating field, a uniaxial alignment of fibers was achieved. The alignment was successfully fixed by the photopolymerization of a UV-curable resin precursor used as matrix. A monodomain chiral nematic film was prepared from an aqueous suspension of nanofibers. Using a field modulator inserted in a homogeneous magnetic field, simultaneous alignment and patterning were achieved.

  17. Magnetic alignment and patterning of cellulose fibers

    Energy Technology Data Exchange (ETDEWEB)

    Kimura, Fumiko; Kimura, Tsunehisa [Division of Forest and Biomaterials Science, Graduate School of Agriculture, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502 (Japan)], E-mail: tkimura@kais.kyoto-u.ac.jp

    2008-04-01

    The alignment and patterning of cellulose fibers under magnetic fields are reported. Static and rotating magnetic fields were used to align cellulose fibers with sizes ranging from millimeter to nanometer sizes. Cellulose fibers of the millimeter order, which were prepared for papermaking, and much smaller fibers with micrometer to nanometer sizes prepared by the acid hydrolysis of larger ones underwent magnetic alignment. Under a rotating field, a uniaxial alignment of fibers was achieved. The alignment was successfully fixed by the photopolymerization of a UV-curable resin precursor used as matrix. A monodomain chiral nematic film was prepared from an aqueous suspension of nanofibers. Using a field modulator inserted in a homogeneous magnetic field, simultaneous alignment and patterning were achieved.

  18. MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement

    Directory of Open Access Journals (Sweden)

    Zhang Liqing

    2010-01-01

    Full Text Available Abstract. Background: Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). However, in practice, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model). Results: In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene will be deleted from the concerned genome (because they cannot possibly appear in any one-to-one ortholog pairs), and MSOAR is invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 is able to achieve a better sensitivity and specificity than MSOAR. In comparison with the well-known genome-scale ortholog assignment tool InParanoid, Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments

  19. Planar self-aligned imprint lithography for coplanar plasmonic nanostructures fabrication

    KAUST Repository

    Wan, Weiwei

    2014-03-01

    Nanoimprint lithography (NIL) is a cost-efficient nanopatterning technology because of its promising advantages of high throughput and high resolution. However, accurate multilevel overlay capability of NIL required for integrated circuit manufacturing remains a challenge due to the high cost of achieving mechanical alignment precision. Although self-aligned imprint lithography was developed to avoid the need of alignment for the vertical layered structures, it has limited usage in the manufacture of the coplanar structures, such as integrated plasmonic devices. In this paper, we develop a new process of planar self-alignment imprint lithography (P-SAIL) to fabricate the metallic and dielectric structures on the same plane. P-SAIL transfers the multilevel imprint processes to a single-imprint process which offers higher efficiency and less cost than existing manufacturing methods. Such concept is demonstrated in an example of fabricating planar plasmonic structures consisting of different materials. © 2014 Springer-Verlag Berlin Heidelberg.

  20. Program for PET image alignment: Effects on calculated differences in cerebral metabolic rates for glucose

    Energy Technology Data Exchange (ETDEWEB)

    Phillips, R.L.; London, E.D.; Links, J.M.; Cascella, N.G. (NIDA Addiction Research Center, Baltimore, MD (USA))

    1990-12-01

    A program was developed to align positron emission tomography images from multiple studies on the same subject. The program allowed alignment of two images with a fineness of one-tenth the width of a pixel. The indications and effects of misalignment were assessed in eight subjects from a placebo-controlled double-blind crossover study on the effects of cocaine on regional cerebral metabolic rates for glucose. Visual examination of a difference image provided a sensitive and accurate tool for assessing image alignment. Image alignment within 2.8 mm was essential to reduce variability of measured cerebral metabolic rates for glucose. Misalignment by this amount introduced errors on the order of 20% in the computed metabolic rate for glucose. These errors propagate to the difference between metabolic rates for a subject measured in basal versus perturbed states.

  1. Alignment of in-vessel components by metrology defined adaptive machining

    Energy Technology Data Exchange (ETDEWEB)

    Wilson, David [ITER Organization, Route de Vinon sur Verdon, CS90 046, St Paul-lez-Durance (France); Bernard, Nathanaël [G2Métric, Launaguet 31140 (France); Mariani, Antony [Spatial Alignment Ltd., Witney (United Kingdom)

    2015-10-15

    Highlights: • Advanced metrology techniques developed for large volume high density in-vessel surveys. • Virtual alignment process employed to optimize the alignment of 440 blanket modules. • Auto-geometry construct, from survey data, using CAD proximity detection and orientation logic. • HMI developed to relocate blanket modules if customization limits on interfaces are exceeded. • Data export format derived for Catia parametric models, defining customization requirements. - Abstract: The assembly of ITER will involve the precise and accurate alignment of a large number of components and assemblies in areas where access will often be severely constrained and where process efficiency will be critical. One such area is the inside of the vacuum vessel where several thousand components shall be custom machined to provide the alignment references for in-vessel systems. The paper gives an overview of the process that will be employed; to survey the interfaces for approximately 3500 components then define and execute the customization process.

  2. Velocity-aligned Doppler spectroscopy

    Science.gov (United States)

    Xu, Z.; Koplitz, B.; Wittig, C.

    1989-03-01

    The use of velocity-aligned Doppler spectroscopy (VADS) to measure center-of-mass kinetic-energy distributions of nascent photofragments produced in pulsed-initiation photolysis/probe experiments is described and demonstrated. In VADS, pulsed photolysis and probe laser beams counterpropagate through the ionization region of a time-of-flight mass spectrometer. The theoretical principles of VADS and the mathematical interpretation of VADS data are explained and illustrated with diagrams; the experimental setup is described; and results for the photodissociation of HI, H2S, and NH3 are presented in graphs and characterized in detail. VADS is shown to give much higher kinetic-energy resolution than conventional Doppler spectroscopy.

  3. Microwave Emission from Aligned Dust

    CERN Document Server

    Lazarian, A

    2003-01-01

    Polarized microwave emission from dust is an important foreground that may contaminate polarized CMB studies unless carefully accounted for. We discuss potential difficulties associated with this foreground, namely, the existence of different grain populations with very different emission/polarization properties and variations of the polarization yield with grain temperature. In particular, we discuss observational evidence in favor of rotational emission from tiny PAH particles with dipole moments, i.e. "spinning dust", and also consider magneto-dipole emission from strongly magnetized grains. We argue that in terms of polarization, the magneto-dipole emission may dominate even if its contribution to total emissivity is subdominant. Addressing polarized emission at frequencies larger than approximately 100 GHz, we discuss the complications arising from the existence of dust components with different temperatures and possibly different alignment properties.

  4. Recursions for statistical multiple alignment.

    Science.gov (United States)

    Hein, Jotun; Jensen, Jens Ledet; Pedersen, Christian N S

    2003-12-09

    Algorithms are presented that allow the calculation of the probability of a set of sequences related by a binary tree that have evolved according to the Thorne-Kishino-Felsenstein model for a fixed set of parameters. The algorithms are based on a Markov chain generating sequences and their alignment at nodes in a tree. Depending on whether the complete realization of this Markov chain is decomposed into the first transition and the rest of the realization or the last transition and the first part of the realization, two kinds of recursions are obtained that are computationally similar but probabilistically different. The running time of the algorithms is $O(\prod_{i=1}^{d} L_i)$, where $L_i$ is the length of the $i$th observed sequence and $d$ is the number of sequences. An alternative recursion is also formulated that uses only a Markov chain involving the inner nodes of a tree.
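
    To make the stated running time concrete, the following minimal Python sketch counts the cells of a dynamic-programming table with one axis per sequence, which grows as the product of the sequence lengths. The function name `dp_table_cells` and the example lengths are illustrative assumptions, not taken from the paper, and the count ignores constant factors.

```python
from math import prod


def dp_table_cells(lengths):
    """Order-of-magnitude size of a d-dimensional DP table.

    Illustrative assumption: one table axis per observed sequence,
    so the number of cells scales as the product of the lengths.
    """
    return prod(lengths)


# Hypothetical example: three sequences of modest length.
lengths = [100, 120, 90]
print(dp_table_cells(lengths))  # 100 * 120 * 90 = 1080000
```

    Adding a fourth sequence of length 100 would multiply this count by 100, which is why such recursions are practical only for a small number of sequences.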

  5. Aligned carbon nanotubes for nanoelectronics

    Science.gov (United States)

    Choi, Won Bong; Bae, Eunju; Kang, Donghun; Chae, Soodoo; Cheong, Byung-ho; Ko, Ju-hye; Lee, Eungmin; Park, Wanjun

    2004-10-01

    We discuss the central issues to be addressed for realizing carbon nanotube (CNT) nanoelectronics. We focus on selective growth, electron energy bandgap engineering and device integration. We have introduced a nanotemplate to control the selective growth, length and diameter of CNTs. Vertically aligned CNTs are synthesized for developing a vertical CNT-field effect transistor (FET). The ohmic contact of the CNT/metal interface is formed by rapid thermal annealing. Diameter control, synthesis of Y-shaped CNTs and surface modification of CNTs open up the possibility for energy bandgap modulation. The concepts of an ultra-high density transistor based on the vertical-CNT array and a nonvolatile memory based on the top gate structure with an oxide-nitride-oxide charge trap are also presented. We suggest that the deposited memory film can be used for the quantum dot storage due to the localized electric field created by a nano scale CNT-electron channel.

  6. Multilingual alignments by monolingual string differences

    OpenAIRE

    Lardilleux, Adrien; Lepage, Yves

    2008-01-01

    International audience; We propose a method to obtain subsentential alignments from several languages simultaneously. The method handles several languages at once, and avoids the complexity explosion due to the usual pair-by-pair processing. It can be used for different units (characters, morphemes, words, chunks). An evaluation of word alignments with a trilingual machine translation corpus has been conducted. A comparison of the results with those obtained by state of the art alignment soft...

  7. Distributed Interference Alignment with Low Overhead

    CERN Document Server

    Ma, Yanjun; Chen, Rui

    2011-01-01

    Based on closed-form interference alignment (IA) solutions, a low overhead distributed interference alignment (LOIA) scheme is proposed in this paper for the $K$-user SISO interference channel, and extension to multiple antenna scenario is also considered. Compared with the iterative interference alignment (IIA) algorithm proposed by Gomadam et al., the overhead is greatly reduced. Simulation results show that the IIA algorithm is strictly suboptimal compared with our LOIA algorithm in the overhead-limited scenario.

  8. Spin alignment in superdeformed rotational bands

    Energy Technology Data Exchange (ETDEWEB)

    Stephens, F.S. (Lawrence Berkeley Lab., CA (USA). Nuclear Science Div.)

    1990-12-24

    Many superdeformed bands in different nuclei are found to have virtually identical moments of inertia and alignments that differ from each other by quantized amounts - multiples of ħ/2. Pseudo spins represent the only source of quantized alignment that has been thought of to date. Additional puzzles in these bands are the absence of other larger effects on the moments of inertia, and a surprising number of alignments of 1 ħ. (orig.).

  9. COS to FGS Alignment {NUV}

    Science.gov (United States)

    Hartig, George

    2009-07-01

    DESCRIPTION: In order to determine the location of the COS reference frame with respect to the FGS reference frames, NUV MIRRORA images will be obtained of an astrometric target and field. Astrometric guide stars and targets must be employed for this activity in order to facilitate the alignment with the FGS. Images will be obtained at the initial pointing and at positions offset in V2 and in V3. Starting with the original blind pointing, obtain MIRRORA image exposures in a 5x5 POS-TARG grid centered on initial pointing; repeat the image sequence at two bracketing focus positions in same visit. Following completion of third pattern, return to nominal focus and perform 5x5 ACQ/SEARCH target acquisition and obtain one TIME-TAG MIRRORA image and one ACCUM verification exposure. Next perform an ACQ/IMAGE target acquisition followed by an ACCUM verification exposure. Also obtain ACCUM verification exposure for each of the two alternate focus positions used previously. Using MIRRORB obtain ACCUM confirmation image at nominal focus and ACCUM images at alternate focus positions and then perform an ACQ/IMAGE and confirming image at nominal focus. Analyze imagery, uplink pointing offset as offset 11469A and adjust nominal focus via patchable constant uplinked with subsequent visit of this program; update aperture locations via modified SIAF file uplinked with subsequent SMS. Use updated focus and offset pointing as input for COS 09 {program 11469 - NUV Optics Alignment and Focus} {note the SIAF update is not a prerequisite for COS 09 to proceed, but the pointing offset and focus update are}.

  10. Mitochondrial genome sequences illuminate maternal lineages of conservation concern in a rare carnivore

    Directory of Open Access Journals (Sweden)

    Pilgrim Kristine

    2011-04-01

    Full Text Available Abstract. Background: Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the attention focused on the non-coding displacement ("D") loop. We used massively parallel multiplexed sequencing to sequence complete mitochondrial genomes from 40 fishers, a threatened carnivore that possesses low mitogenomic diversity. This allowed us to test a key assumption of conservation genetics, specifically, that the D-loop accurately reflects genealogical relationships and variation of the larger mitochondrial genome. Results: Overall mitogenomic divergence in fishers is exceedingly low, with 66 segregating sites and an average pairwise distance between genomes of 0.00088 across their aligned length (16,290 bp). Estimates of variation and genealogical relationships from the displacement (D) loop region (299 bp) are contradicted by the complete mitochondrial genome, as well as the protein coding fraction of the mitochondrial genome. The sources of this contradiction trace primarily to the near-absence of mutations marking the D-loop region of one of the most divergent lineages, and secondarily to independent (recurrent) mutations at two nucleotide positions in the D-loop amplicon. Conclusions: Our study has two important implications. First, inferred genealogical reconstructions based on the fisher D-loop region contradict inferences based on the entire mitogenome to the point that the populations of greatest conservation concern cannot be accurately resolved. Whole-genome analysis identifies Californian haplotypes from the northern-most populations as highly distinctive, with a significant excess of amino acid changes that may be indicative of molecular
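
    The two summary statistics quoted in this abstract, segregating sites and average pairwise distance, can be illustrated with a minimal Python sketch. It assumes gap-free aligned sequences of equal length and an uncorrected proportion of differing sites; the abstract does not state how the authors handled gaps or which distance model they used, and the function names and toy sequences below are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_distance(seqs):
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct character."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


# Toy alignment (hypothetical data, not from the study).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(segregating_sites(toy))       # columns 4 and 8 vary
print(mean_pairwise_distance(toy))  # (1/8 + 1/8 + 2/8) / 3
```

    Under this uncorrected definition, the reported distance of 0.00088 over 16,290 aligned bp corresponds to roughly 14 differing sites between an average pair of genomes.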

  11. Self-transport and self-alignment of microchips using microscopic rain

    Science.gov (United States)

    Chang, Bo; Shah, Ali; Zhou, Quan; Ras, Robin H. A.; Hjort, Klas

    2015-10-01

    Alignment of microchips with receptors is an important process step in the construction of integrated micro- and nanosystems for emerging technologies, and facilitating alignment by spontaneous self-assembly processes is highly desired. Previously, capillary self-alignment of microchips driven by surface tension effects on patterned surfaces has been reported, where it was essential for microchips to have sufficient overlap with receptor sites. Here we demonstrate for the first time capillary self-transport and self-alignment of microchips, where microchips are initially placed outside the corresponding receptor sites and can be self-transported by capillary force to the receptor sites followed by self-alignment. The surface consists of hydrophilic silicon receptor sites surrounded by superhydrophobic black silicon. Rain-induced microscopic droplets are used to form the meniscus for the self-transport and self-alignment. The boundary conditions for the self-transport have been explored by modeling and confirmed experimentally. The maximum permitted gap between a microchip and a receptor site is determined by the volume of the liquid and by the wetting contrast between receptor site and substrate. Microscopic rain applied on hydrophilic-superhydrophobic patterned surfaces greatly improves the capability, reliability and error-tolerance of the process, avoiding the need for accurate initial placement of microchips, and thereby greatly simplifying the alignment process.

  12. Self-transport and self-alignment of microchips using microscopic rain.

    Science.gov (United States)

    Chang, Bo; Shah, Ali; Zhou, Quan; Ras, Robin H A; Hjort, Klas

    2015-10-09

    Alignment of microchips with receptors is an important process step in the construction of integrated micro- and nanosystems for emerging technologies, and facilitating alignment by spontaneous self-assembly processes is highly desired. Previously, capillary self-alignment of microchips driven by surface tension effects on patterned surfaces has been reported, where it was essential for microchips to have sufficient overlap with receptor sites. Here we demonstrate for the first time capillary self-transport and self-alignment of microchips, where microchips are initially placed outside the corresponding receptor sites and can be self-transported by capillary force to the receptor sites followed by self-alignment. The surface consists of hydrophilic silicon receptor sites surrounded by superhydrophobic black silicon. Rain-induced microscopic droplets are used to form the meniscus for the self-transport and self-alignment. The boundary conditions for the self-transport have been explored by modeling and confirmed experimentally. The maximum permitted gap between a microchip and a receptor site is determined by the volume of the liquid and by the wetting contrast between receptor site and substrate. Microscopic rain applied on hydrophilic-superhydrophobic patterned surfaces greatly improves the capability, reliability and error-tolerance of the process, avoiding the need for accurate initial placement of microchips, and thereby greatly simplifying the alignment process.

  13. Choosing the best heuristic for seeded alignment of DNA sequences

    Directory of Open Access Journals (Sweden)

    Buhler Jeremy

    2006-03-01

    Full Text Available Abstract. Background: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions. Results: We compare seeds of several types, including those allowing transition mutations rather than matches at fixed positions, those allowing transitions at arbitrary positions ("BLASTZ" seeds), and those using a more general scoring matrix. For each seed type, we use an extended version of our Mandala seed design software to choose seeds with optimized sensitivity for various levels of specificity. Our results show that, on a test set biased toward alignments of noncoding DNA, transition information significantly improves seed performance, while finer distinctions between different types of mismatches do not. BLASTZ seeds perform especially well. These results depend on properties of our test set that are not shared by EST-based test sets with a strong bias toward coding DNA. Conclusion: Practical seed design requires careful attention to the properties of the alignments being sought. For noncoding DNA sequences, seeds that use transition information, especially BLASTZ-style seeds, are particularly useful. The Mandala seed design software can be found at http://www.cse.wustl.edu/~yanni/mandala/.
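
    The seed types compared in this abstract can be sketched in a few lines of Python. The pattern alphabet used here is an illustrative assumption and not the Mandala input format: `1` requires a match, `0` is a don't-care position, and `T` accepts a match or a transition (A<->G, C<->T). The hypothetical `max_free_transitions` parameter loosely imitates BLASTZ-style seeds by tolerating transitions at arbitrary `1` positions.

```python
# Transition substitutions: purine<->purine or pyrimidine<->pyrimidine.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hit(x, y, pattern, max_free_transitions=0):
    """Return True if windows x and y satisfy the seed pattern.

    pattern characters (illustrative notation):
      '1' -> residues must match
      '0' -> position is ignored
      'T' -> residues must match or differ by a transition
    max_free_transitions: number of '1' positions at which a
      transition is tolerated anyway (BLASTZ-style relaxation).
    """
    if not (len(x) == len(y) == len(pattern)):
        raise ValueError("x, y and pattern must have equal length")
    used = 0
    for a, b, p in zip(x, y, pattern):
        if p == "0" or a == b:
            continue
        if (a, b) not in TRANSITIONS:
            return False  # a transversion is never accepted at '1' or 'T'
        if p == "T":
            continue  # transition allowed at this fixed position
        used += 1  # transition at a '1' position consumes the allowance
        if used > max_free_transitions:
            return False
    return True


print(seed_hit("ACGT", "ACAT", "1111"))                          # False
print(seed_hit("ACGT", "ACAT", "11T1"))                          # True
print(seed_hit("ACGT", "ACAT", "1111", max_free_transitions=1))  # True
```

    In a real aligner the pattern would be slid along both sequences via an index rather than tested pair by pair; this sketch shows only the acceptance rule for a single window.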

  14. The UCSC Archaeal Genome Browser: 2012 update.

    Science.gov (United States)

    Chan, Patricia P; Holmes, Andrew D; Smith, Andrew M; Tran, Danny; Lowe, Todd M

    2012-01-01

    The UCSC Archaeal Genome Browser (http://archaea.ucsc.edu) offers a graphical web-based resource for exploration and discovery within archaeal and other selected microbial genomes. By bringing together existing gene annotations, gene expression data, multiple-genome alignments, pre-computed sequence comparisons and other specialized analysis tracks, the genome browser is a powerful aggregator of varied genomic information. The genome browser environment maintains the current look-and-feel of the vertebrate UCSC Genome Browser, but also integrates archaeal and bacterial-specific tracks with a few graphic display enhancements. The browser currently contains 115 archaeal genomes, plus 31 genomes of viruses known to infect archaea. Some of the recently developed or enhanced tracks visualize data from published high-throughput RNA-sequencing studies, the NCBI Conserved Domain Database, sequences from pre-genome sequencing studies, predicted gene boundaries from three different protein gene prediction algorithms, tRNAscan-SE gene predictions with RNA secondary structures and CRISPR locus predictions. We have also developed a companion resource, the Archaeal COG Browser, to provide better search and display of arCOG gene function classifications, including their phylogenetic distribution among available archaeal genomes.

  15. The art of editing RNA structural alignments

    DEFF Research Database (Denmark)

    Andersen, Ebbe Sloth

    2014-01-01

    Manual editing of RNA structural alignments may be considered more art than science, since it still requires an expert biologist to take multiple levels of information into account and be slightly creative when constructing high-quality alignments. Even though the task is rather tedious, it is re...

  16. The art of editing RNA structural alignments

    DEFF Research Database (Denmark)

    Andersen, Ebbe Sloth

    2014-01-01

    Manual editing of RNA structural alignments may be considered more art than science, since it still requires an expert biologist to take multiple levels of information into account and be slightly creative when constructing high-quality alignments. Even though the task is rather tedious, it is re...

  17. The alignment between spatial planning, transportation planning ...

    African Journals Online (AJOL)

    engagement processes, support, ... Carel Schoeman • The alignment between spatial planning, transportation planning and environmental ..... NDOT: Public Transport Strategy (2007) .... Community Land Reform Act 28 of 1996 (CLARA).

  18. Nova alignment and laser diagnostics systems - 1

    Energy Technology Data Exchange (ETDEWEB)

    Bliss, E.S.; Ozarski, R.G.; Myers, D.W.; Richards, J.B.; Swift, C.D.; Boyd, R.D.; Hugenberger, R.E.; Seppala, L.G.; Parker, J.; Dryden, E.H.

    1981-01-01

    The alignment and laser diagnostic systems guide laser pulses through the separate amplifier chains to the target, measure their temporal, spatial and energy characteristics, and ensure simultaneous arrival at the target to within 5 picoseconds. Alignment tasks accomplished prior to each target shot involve automatic or remote-manual adjustments of approximately 2000 stepper motors and other actuators for the full 20 beam, 3 wavelength system. The primary detectors for alignment functions are CCD cameras with both digital and standard video output. Diagnostic data handling and processing is accomplished digitally, and both the alignment and diagnostic systems are integrated into the facility-wide digital control network.

  19. VIRUS spectrograph assembly and alignment procedures

    Science.gov (United States)

    Prochaska, Travis; Allen, Richard D.; Boster, Emily; DePoy, D. L.; Herbig, Benjamin; Hill, Gary J.; Lee, Hanshin; Marshall, Jennifer L.; Martin, Emily C.; Meador, William; Rheault, Jean-Philippe; Tuttle, Sarah E.; Vattiat, Brian L.

    2012-09-01

    We describe the mechanical assembly and optical alignment processes used to construct the Visual Integral-Field Replicable Unit Spectrograph (VIRUS) instrument. VIRUS is a set of 150+ optical spectrographs designed to support observations for the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX). To meet the instrument's manufacturing constraints, a production line will be set up to build subassemblies in parallel. To aid in the instrument's assembly and alignment, specialized fixtures and adjustment apparatuses have been developed. We describe the design and operations of the various optics alignment apparatuses, as well as the mirrors' alignment and bonding fixtures.

  20. Some aspects of SR beamline alignment

    Energy Technology Data Exchange (ETDEWEB)

    Gaponov, Yu.A., E-mail: Yury.Gaponov@maxlab.lu.se [MAX-lab, Lund University, P.O.B. 118, SE-221 00 Lund (Sweden); Cerenius, Y. [MAX-lab, Lund University, P.O.B. 118, SE-221 00 Lund (Sweden); Nygaard, J. [Faculty of Life Sciences, University of Copenhagen, DK-1871 Frederiksberg C (Denmark); Ursby, T.; Larsson, K. [MAX-lab, Lund University, P.O.B. 118, SE-221 00 Lund (Sweden)

    2011-09-01

    Based on the Synchrotron Radiation (SR) beamline optical element-by-element alignment with analysis of the alignment results an optimized beamline alignment algorithm has been designed and developed. The alignment procedures have been designed and developed for the MAX-lab I911-4 fixed energy beamline. It has been shown that the intermediate information received during the monochromator alignment stage can be used for the correction of both monochromator and mirror without the next stages of alignment of mirror, slits, sample holder, etc. Such an optimization of the beamline alignment procedures decreases the time necessary for the alignment and becomes useful and helpful in the case of any instability of the beamline optical elements, storage ring electron orbit or the wiggler insertion device, which could result in the instability of angular and positional parameters of the SR beam. A general purpose software package for manual, semi-automatic and automatic SR beamline alignment has been designed and developed using the developed algorithm. The TANGO control system is used as the middle-ware between the stand-alone beamline control applications BLTools, BPMonitor and the beamline equipment.