WorldWideScience

Sample records for deopdd sequence improvement

  1. Transformed composite sequences for improved qubit addressing

    Science.gov (United States)

    Merrill, J. True; Doret, S. Charles; Vittorini, Grahame; Addison, J. P.; Brown, Kenneth R.

    2014-10-01

    Selective laser addressing of a single atom or atomic ion qubit can be improved using narrow-band composite pulse sequences. We describe a Lie-algebraic technique to generalize known narrow-band sequences and introduce sequences related by dilation and rotation of sequence generators. Our method improves known narrow-band sequences by decreasing both the pulse time and the residual error. Finally, we experimentally demonstrate these composite sequences using 40Ca+ ions trapped in a surface-electrode ion trap.

  2. Improved motor sequence retention by motionless listening.

    Science.gov (United States)

    Lahav, Amir; Katz, Tal; Chess, Roxanne; Saltzman, Elliot

    2013-05-01

    This study examined the effect of listening to a newly learned musical piece on subsequent motor retention of the piece. Thirty-six non-musicians were trained to play an unfamiliar melody on a piano keyboard. Next, they were randomly assigned to participate in three follow-up listening sessions over 1 week. Subjects who, during their listening sessions, listened to the same initial piece showed significant improvements in motor memory and retention of the piece despite the absence of physical practice. These improvements included increased pitch accuracy, time accuracy, and dynamic intensity of key pressing. Similar improvements, though to a lesser degree, were observed in subjects who, during their listening sessions, were distracted by another task. Control subjects, who after learning the piece had listened to nonmusical sounds, showed impaired motoric retention of the piece at 1 week from the initial acquisition day. These results imply that motor sequences can be established in motor memory without direct access to motor-related information. In addition, the study revealed that the listening-induced improvements did not generalize to the learning of a new musical piece composed of the same notes as the initial piece learned, limiting the effects to musical motor sequences that are already part of the individual's motor repertoire.

  3. Improving sequence segmentation learning by predicting trigrams

    NARCIS (Netherlands)

    van den Bosch, A.; Daelemans, W.; Dagan, I.; Gildea, D.

    2005-01-01

    Symbolic machine-learning classifiers are known to suffer from near-sightedness when performing sequence segmentation (chunking) tasks in natural language processing: without special architectural additions they are oblivious of the decisions they made earlier when making new ones. We introduce a

  4. Musical Scales in Tone Sequences Improve Temporal Accuracy.

    Science.gov (United States)

    Li, Min S; Di Luca, Massimiliano

    2018-01-01

    Predicting the time of stimulus onset is a key component in perception. Previous investigations of perceived timing have focused on the effect of stimulus properties such as rhythm and temporal irregularity, but the influence of non-temporal properties and their role in predicting stimulus timing has not been exhaustively considered. The present study aims to understand how a non-temporal pattern in a sequence of regularly timed stimuli could improve or bias the detection of temporal deviations. We presented interspersed sequences of 3, 4, 5, and 6 auditory tones where only the timing of the last stimulus could slightly deviate from isochrony. Participants reported whether the last tone was 'earlier' or 'later' relative to the expected regular timing. In two conditions, the tones composing the sequence were either organized into musical scales or they were random tones. In one experiment, all sequences ended with the same tone; in the other experiment, each sequence ended with a different tone. Results indicate higher discriminability of anisochrony with musical scales and with longer sequences, irrespective of the knowledge of the final tone. Such an outcome suggests that the predictability of non-temporal properties, as enabled by the musical scale pattern, can be a factor in determining the sensitivity of time judgments.

  5. Improved nonparametric inference for multiple correlated periodic sequences

    KAUST Repository

    Sun, Ying

    2013-08-26

    This paper proposes a cross-validation method for estimating the period as well as the values of multiple correlated periodic sequences when data are observed at evenly spaced time points. The period of interest is estimated conditional on the other correlated sequences. An alternative method for period estimation based on Akaike\\'s information criterion is also discussed. The improvement of the period estimation performance is investigated both theoretically and by simulation. We apply the multivariate cross-validation method to the temperature data obtained from multiple ice cores, investigating the periodicity of the El Niño effect. Our methodology is also illustrated by estimating patients\\' cardiac cycle from different physiological signals, including arterial blood pressure, electrocardiography, and fingertip plethysmograph.

  6. Phylo: a citizen science approach for improving multiple sequence alignment.

    Directory of Open Access Journals (Sweden)

    Alexander Kawrykow

    Full Text Available BACKGROUND: Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. METHODOLOGY/PRINCIPAL FINDINGS: We introduce Phylo, a human-based computing framework applying "crowd sourcing" techniques to solve the Multiple Sequence Alignment (MSA problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with a minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered. CONCLUSIONS/SIGNIFICANCE: We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improving the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of "human-brain peta-flops" of computation that are spent every day playing games

  7. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus

  8. An improved high throughput sequencing method for studying oomycete communities

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete communities using ITS1 as the barcode sequence with well-known primers for oomycete DNA amplification....... communities. Thewell-known primer sets ITS4, ITS6 and ITS7were used in the study in a semi-nested PCR approach to target the internal transcribed spacer (ITS) 1 of ribosomal DNA in a next generation sequencing protocol. These primers have been used in similar studies before, butwith limited success.......Wewere able to increase the proportion of retrieved oomycete sequences dramaticallymainly by increasing the annealing temperature during PCR. The optimized protocol was validated using three mock communities and the method was further evaluated using total DNA from 26 soil samples collected from different...

  9. A Probabilistic Approach for Improved Sequence Mapping in Metatranscriptomic Studies

    Science.gov (United States)

    Mapping millions of short DNA sequences a reference genome is a necessary step in many experiments designed to investigate the expression of genes involved in disease resistance. This is a difficult task in which several challenges often arise resulting in a suboptimal mapping. This mapping process ...

  10. The Accident Sequence Precursor program: Methods improvements and current results

    International Nuclear Information System (INIS)

    Minarick, J.W.; Manning, F.M.; Harris, J.D.

    1987-01-01

    Changes in the US NRC Accident Sequence Precursor program methods since the initial program evaluations of 1969-81 operational events are described, along with insights from the review of 1984-85 events. For 1984-85, the number of significant precursors was consistent with the number observed in 1980-81, dominant sequences associated with significant events were reasonably consistent with PRA estimates for BWRs, but lacked the contribution due to small-break LOCAs previously observed and predicted in PWRs, and the frequency of initiating events and non-recoverable system failures exhibited some reduction compared to 1980-81. Operational events which provide information concerning additional PRA modeling needs are also described

  11. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    Science.gov (United States)

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  12. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement.

    Science.gov (United States)

    Zhang, Tianzhen; Hu, Yan; Jiang, Wenkai; Fang, Lei; Guan, Xueying; Chen, Jiedan; Zhang, Jinbo; Saski, Christopher A; Scheffler, Brian E; Stelly, David M; Hulse-Kemp, Amanda M; Wan, Qun; Liu, Bingliang; Liu, Chunxiao; Wang, Sen; Pan, Mengqiao; Wang, Yangkun; Wang, Dawei; Ye, Wenxue; Chang, Lijing; Zhang, Wenpan; Song, Qingxin; Kirkbride, Ryan C; Chen, Xiaoya; Dennis, Elizabeth; Llewellyn, Danny J; Peterson, Daniel G; Thaxton, Peggy; Jones, Don C; Wang, Qiong; Xu, Xiaoyang; Zhang, Hua; Wu, Huaitong; Zhou, Lei; Mei, Gaofu; Chen, Shuqi; Tian, Yue; Xiang, Dan; Li, Xinghe; Ding, Jian; Zuo, Qiyang; Tao, Linna; Liu, Yunchao; Li, Ji; Lin, Yu; Hui, Yuanyuan; Cao, Zhisheng; Cai, Caiping; Zhu, Xiefei; Jiang, Zhi; Zhou, Baoliang; Guo, Wangzhen; Li, Ruiqiang; Chen, Z Jeffrey

    2015-05-01

    Upland cotton is a model for polyploid crop domestication and transgenic improvement. Here we sequenced the allotetraploid Gossypium hirsutum L. acc. TM-1 genome by integrating whole-genome shotgun reads, bacterial artificial chromosome (BAC)-end sequences and genotype-by-sequencing genetic maps. We assembled and annotated 32,032 A-subgenome genes and 34,402 D-subgenome genes. Structural rearrangements, gene loss, disrupted genes and sequence divergence were more common in the A subgenome than in the D subgenome, suggesting asymmetric evolution. However, no genome-wide expression dominance was found between the subgenomes. Genomic signatures of selection and domestication are associated with positively selected genes (PSGs) for fiber improvement in the A subgenome and for stress tolerance in the D subgenome. This draft genome sequence provides a resource for engineering superior cotton lines.

  13. Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation

    Directory of Open Access Journals (Sweden)

    Wang Yong

    2011-10-01

    Full Text Available Abstract Background With the development of genome-sequencing technologies, protein sequences are readily obtained by translating the measured mRNAs. Therefore predicting protein-protein interactions from the sequences is of great demand. The reason lies in the fact that identifying protein-protein interactions is becoming a bottleneck for eventually understanding the functions of proteins, especially for those organisms barely characterized. Although a few methods have been proposed, the converse problem, if the features used extract sufficient and unbiased information from protein sequences, is almost untouched. Results In this study, we interrogate this problem theoretically by an optimization scheme. Motivated by the theoretical investigation, we find novel encoding methods for both protein sequences and protein pairs. Our new methods exploit sufficiently the information of protein sequences and reduce artificial bias and computational cost. Thus, it significantly outperforms the available methods regarding sensitivity, specificity, precision, and recall with cross-validation evaluation and reaches ~80% and ~90% accuracy in Escherichia coli and Saccharomyces cerevisiae respectively. Our findings here hold important implication for other sequence-based prediction tasks because representation of biological sequence is always the first step in computational biology. Conclusions By considering the converse problem, we propose new representation methods for both protein sequences and protein pairs. The results show that our method significantly improves the accuracy of protein-protein interaction predictions.

  14. Improving Auditory Sequencing Skills in the Kindergarten-Age Child through the Increased Instruction of Music.

    Science.gov (United States)

    Brandt, Millicent Hume

    A music teacher specialist at an elementary school with a typical kindergarten music program implemented a 10-week practicum intervention designed to improve the auditory sequencing skills of kindergarten children through increased instruction in music. Test scores on screening measures indicated the need to improve the children's auditory memory…

  15. Improvement of methods for large scale sequencing; application to human Xq28

    Energy Technology Data Exchange (ETDEWEB)

    Gibbs, R.A.; Andersson, B.; Wentland, M.A. [Baylor College of Medicine, Houston, TX (United States)] [and others

    1994-09-01

    Sequencing of a one-metabase region of Xq28, spanning the FRAXA and IDS loci has been undertaken in order to investigate the practicality of the shotgun approach for large scale sequencing and as a platform to develop improved methods. The efficiency of several steps in the shotgun sequencing strategy has been increased using PCR-based approaches. An improved method for preparation of M13 libraries has been developed. This protocol combines a previously described adaptor-based protocol with the uracil DNA glycosylase (UDG)-cloning procedure. The efficiency of this procedure has been found to be up to 100-fold higher than that of previously used protocols. In addition the novel protocol is more reliable and thus easy to establish in a laboratory. The method has also been adapted for the simultaneous shotgun sequencing of multiple short fragments by concentrating them before library construction is presented. This protocol is suitable for rapid characterization of cDNA clones. A library was constructed from 15 PCR-amplified and concentrated human cDNA inserts, and the insert sequences could easily be identified as separate contigs during the assembly process and the sequence coverage was even along each fragment. Using this strategy, the fine structures of the FraxA and IDS loci have been revealed and several EST homologies indicating novel expressed sequences have been identified. Use of PCR to close repetitive regions that are difficult to clone was tested by determination of the sequence of a cosmid mapping DXS455 in Xq28, containing a polymorphic VNTR. The region containing the VNTR was not represented in the shotgun library, but by designing PCR primers in the sequences flanking the gap and by cloning and sequencing the PCR product, the fine structure of the VNTR has been determined. It was found to be an AT-rich VNTR with a repeated 25-mer at the center.

  16. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    Science.gov (United States)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.

  17. Optimized, unequal pulse spacing in multiple echo sequences improves refocusing in magnetic resonance.

    Science.gov (United States)

    Jenista, Elizabeth R; Stokes, Ashley M; Branca, Rosa Tamara; Warren, Warren S

    2009-11-28

    A recent quantum computing paper (G. S. Uhrig, Phys. Rev. Lett. 98, 100504 (2007)) analytically derived optimal pulse spacings for a multiple spin echo sequence designed to remove decoherence in a two-level system coupled to a bath. The spacings in what has been called a "Uhrig dynamic decoupling (UDD) sequence" differ dramatically from the conventional, equal pulse spacing of a Carr-Purcell-Meiboom-Gill (CPMG) multiple spin echo sequence. The UDD sequence was derived for a model that is unrelated to magnetic resonance, but was recently shown theoretically to be more general. Here we show that the UDD sequence has theoretical advantages for magnetic resonance imaging of structured materials such as tissue, where diffusion in compartmentalized and microstructured environments leads to fluctuating fields on a range of different time scales. We also show experimentally, both in excised tissue and in a live mouse tumor model, that optimal UDD sequences produce different T(2)-weighted contrast than do CPMG sequences with the same number of pulses and total delay, with substantial enhancements in most regions. This permits improved characterization of low-frequency spectral density functions in a wide range of applications.

  18. IMPROVING THE SKILL AND THE INTEREST OF WRITING ADVERTISEMENTS AND POSTERS THROUGH ESA SEQUENCE

    Directory of Open Access Journals (Sweden)

    Fatma Yuniarti

    2016-04-01

    Full Text Available The action reserach aims at improving the students’ writing skill especially to write advertsements and posters. Both are the short functional texts to be learned at the first semester of the ninth grade. According to the data on pre cycle, the students of class IXA Junior High School Swadhipa Natar, South Lampung got difficulties in writing advertisements and posters. A treatment was necessary to help the students overcome their problem. To consider the related literature, the writer decided to implement ESA sequence in the class. The elements of teaching in ESA Sequence are Engage (to arouse the students’ interests, Study (learn the language focus, and Activate (use the language freely and communicatively.The data were taken from the test of the linguistic competence mastery, the students writing, and the questionnaire. The result shows ESA Sequence can improve the students’ ability in writing advertisements and posters.Key words : ESA (Engange Study Activate, advertisement, poster.

  19. Improving probe set selection for microbial community analysis by leveraging taxonomic information of training sequences

    Directory of Open Access Journals (Sweden)

    Jiang Tao

    2011-10-01

    Full Text Available Abstract Background Population levels of microbial phylotypes can be examined using a hybridization-based method that utilizes a small set of computationally-designed DNA probes targeted to a gene common to all. Our previous algorithm attempts to select a set of probes such that each training sequence manifests a unique theoretical hybridization pattern (a binary fingerprint to a probe set. It does so without taking into account similarity between training gene sequences or their putative taxonomic classifications, however. We present an improved algorithm for probe set selection that utilizes the available taxonomic information of training gene sequences and attempts to choose probes such that the resultant binary fingerprints cluster into real taxonomic groups. Results Gene sequences manifesting identical fingerprints with probes chosen by the new algorithm are more likely to be from the same taxonomic group than probes chosen by the previous algorithm. In cases where they are from different taxonomic groups, underlying DNA sequences of identical fingerprints are more similar to each other in probe sets made with the new versus the previous algorithm. Complete removal of large taxonomic groups from training data does not greatly decrease the ability of probe sets to distinguish those groups. Conclusions Probe sets made from the new algorithm create fingerprints that more reliably cluster into biologically meaningful groups. The method can readily distinguish microbial phylotypes that were excluded from the training sequences, suggesting novel microbes can also be detected.

  20. Improving probe set selection for microbial community analysis by leveraging taxonomic information of training sequences.

    Science.gov (United States)

    Ruegger, Paul M; Della Vedova, Gianluca; Jiang, Tao; Borneman, James

    2011-10-10

    Population levels of microbial phylotypes can be examined using a hybridization-based method that utilizes a small set of computationally-designed DNA probes targeted to a gene common to all. Our previous algorithm attempts to select a set of probes such that each training sequence manifests a unique theoretical hybridization pattern (a binary fingerprint) to a probe set. It does so without taking into account similarity between training gene sequences or their putative taxonomic classifications, however. We present an improved algorithm for probe set selection that utilizes the available taxonomic information of training gene sequences and attempts to choose probes such that the resultant binary fingerprints cluster into real taxonomic groups. Gene sequences manifesting identical fingerprints with probes chosen by the new algorithm are more likely to be from the same taxonomic group than probes chosen by the previous algorithm. In cases where they are from different taxonomic groups, underlying DNA sequences of identical fingerprints are more similar to each other in probe sets made with the new versus the previous algorithm. Complete removal of large taxonomic groups from training data does not greatly decrease the ability of probe sets to distinguish those groups. Probe sets made from the new algorithm create fingerprints that more reliably cluster into biologically meaningful groups. The method can readily distinguish microbial phylotypes that were excluded from the training sequences, suggesting novel microbes can also be detected.

  1. When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.

    Science.gov (United States)

    Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J

    2015-09-15

    As the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs stelo@cs.ucr.edu or timothy.close@ucr.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Yaron Orenstein

    2017-10-01

    Full Text Available With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.

  3. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis.

    Science.gov (United States)

    Guo, Yan; Dai, Yulin; Yu, Hui; Zhao, Shilin; Samuels, David C; Shyr, Yu

    2017-03-01

    Analyses of high throughput sequencing data starts with alignment against a reference genome, which is the foundation for all re-sequencing data analyses. Each new release of the human reference genome has been augmented with improved accuracy and completeness. It is presumed that the latest release of human reference genome, GRCh38 will contribute more to high throughput sequencing data analysis by providing more accuracy. But the amount of improvement has not yet been quantified. We conducted a study to compare the genomic analysis results between the GRCh38 reference and its predecessor GRCh37. Through analyses of alignment, single nucleotide polymorphisms, small insertion/deletions, copy number and structural variants, we show that GRCh38 offers overall more accurate analysis of human sequencing data. More importantly, GRCh38 produced fewer false positive structural variants. In conclusion, GRCh38 is an improvement over GRCh37 not only from the genome assembly aspect, but also yields more reliable genomic analysis results. Copyright © 2017. Published by Elsevier Inc.

  4. AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.

    Science.gov (United States)

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16 S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.

  5. Locus Reference Genomic sequences: An improved basis for describing human DNA variants

    KAUST Repository

    Dalgleish, Raymond; Flicek, Paul; Cunningham, Fiona; Astashyn, Alex; Tully, Raymond E; Proctor, Glenn; Chen, Yuan; McLaren, William M; Larsson, Pontus; Vaughan, Brendan W; Bé roud, Christophe; Dobson, Glen; Lehvä slaiho, Heikki; Taschner, Peter EM; den Dunnen, Johan T; Devereau, Andrew; Birney, Ewan; Brookes, Anthony J; Maglott, Donna R

    2010-01-01

    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specifi c purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-fi le record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)- approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants aff ecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). 2010 Dalgleish et al.; licensee BioMed Central Ltd.

  6. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

    Science.gov (United States)

    Hong, Jungeui; Gresham, David

    2017-11-01

    Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.

  7. Locus Reference Genomic sequences: An improved basis for describing human DNA variants

    KAUST Repository

    Dalgleish, Raymond

    2010-04-15

    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specifi c purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-fi le record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)- approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants aff ecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). 2010 Dalgleish et al.; licensee BioMed Central Ltd.

  8. Coval: improving alignment quality and variant calling accuracy for next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Shunichi Kosugi

    Full Text Available Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in 'targeted' alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.

  9. Improving High-Throughput Sequencing Approaches for Reconstructing the Evolutionary Dynamics of Upper Paleolithic Human Groups

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine

    the development and testing of innovative molecular approaches aiming at improving the amount of informative HTS data one can recover from ancient DNA extracts. We have characterized important ligation and amplification biases in the sequencing library building and enrichment steps, which can impede further...... been mainly driven by the development of High-Throughput DNA Sequencing (HTS) technologies but also by the implementation of novel molecular tools tailored to the manipulation of ultra short and damaged DNA molecules. Our ability to retrieve traces of genetic material has tremendously improved, pushing......, that impact on the overall efficacy of the method. In a second part, we implemented some of these molecular tools to the processing of five Upper Paleolithic human samples from the Kostenki and Sunghir sites in Western Eurasia, in order to reconstruct the deep genomic history of European populations...

  10. IMPROVING THE SKILL AND THE INTEREST OF WRITING ADVERTISEMENTS AND POSTERS THROUGH ESA SEQUENCE

    OpenAIRE

    author Fatma Yuniarti

    2015-01-01

    The action reserach aims at improving the students’ writing skill especially to write advertsements and posters. Both are the short functional texts to be learned at the second semester. According to the data on pre cycle, the students of second semester got difficulties to write advertisements and posters. A treatment was necessary to help the students overcome their problem. To consider the related literature, the writer decided to implement ESA sequence (Harmer 2001) in the class. The elem...

  11. IMPROVING THE SKILL AND THE INTEREST OF WRITING ADVERTISEMENTS AND POSTERS THROUGH ESA SEQUENCE

    Directory of Open Access Journals (Sweden)

    author Fatma Yuniarti

    2015-01-01

    Full Text Available The action reserach aims at improving the students’ writing skill especially to write advertsements and posters. Both are the short functional texts to be learned at the second semester. According to the data on pre cycle, the students of second semester got difficulties to write advertisements and posters. A treatment was necessary to help the students overcome their problem. To consider the related literature, the writer decided to implement ESA sequence (Harmer 2001 in the class. The elements of teaching in ESA Sequence are Engage (to arouse the students’ interests, Study (learn the language focus, and Activate (use the language freely and communicatively. The data were taken from the test of the linguistic competence mastery, the students writing, and the questionnaire. The students’ linguistic competence got increased as shown by the score (58 in pre-cycle, 66 in cycle 1, and 70 in cycle 2. The students’ ability to write the short functional texts also get improved as indicated by the average score on writing tasks (53 in pre-cycle, 63 in cycle 1, 72 in cycle 2. The interest also gets better as shown by the score of the questionnaire (22,3 in pre-cycle, 33,5 in cycle 1, and 37 in cycle 2. It means ESA Sequence can improve the studets’ ability to write advertisements and posters.Key words : advertisement, ESA (Engange Study Activate, poster

  12. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin

    2013-01-01

    and genetic improvement were identified.CONCLUSIONS:Given the uniqueness of the soybean germplasm sequenced, this study drew a clear picture of human-mediated evolution of the soybean genomes. The genomic resources and information provided by this study would also facilitate the discovery of genes......BACKGROUND:Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re...

  13. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    Science.gov (United States)

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http

  14. Partial Transmit Sequence Optimization Using Improved Harmony Search Algorithm for PAPR Reduction in OFDM

    Directory of Open Access Journals (Sweden)

    Mangal Singh

    2017-12-01

    Full Text Available This paper considers the use of the Partial Transmit Sequence (PTS technique to reduce the Peak‐to‐Average Power Ratio (PAPR of an Orthogonal Frequency Division Multiplexing signal in wireless communication systems. Search complexity is very high in the traditional PTS scheme because it involves an extensive random search over all combinations of allowed phase vectors, and it increases exponentially with the number of phase vectors. In this paper, a suboptimal metaheuristic algorithm for phase optimization based on an improved harmony search (IHS is applied to explore the optimal combination of phase vectors that provides improved performance compared with existing evolutionary algorithms such as the harmony search algorithm and firefly algorithm. IHS enhances the accuracy and convergence rate of the conventional algorithms with very few parameters to adjust. Simulation results show that an improved harmony search‐based PTS algorithm can achieve a significant reduction in PAPR using a simple network structure compared with conventional algorithms.

  15. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    DEFF Research Database (Denmark)

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.

    2015-01-01

    There is an increasing demand for biotech-based production of recombinant proteins for use as pharmaceuticals in the food and feed industry and in industrial applications. Yeast Saccharomyces cerevisiae is among preferred cell factories for recombinant protein production, and there is increasing...... interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries, generated by UV...... mutagenesis. Several screening and sorting rounds resulted in the selection of eight yeast clones with significantly improved secretion of recombinant a-amylase. Efficient secretion was genetically stable in the selected clones. We performed whole-genome sequencing of the eight clones and identified 330...

  16. Revisiting the phylogeny of Zoanthidea (Cnidaria: Anthozoa): Staggered alignment of hypervariable sequences improves species tree inference.

    Science.gov (United States)

    Swain, Timothy D

    2018-01-01

    The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Improved analyses of human mtDNA sequences support a recent African origin for Homo sapiens.

    Science.gov (United States)

    Penny, D; Steel, M; Waddell, P J; Hendy, M D

    1995-09-01

    New quantitative methods are applied to the 135 human mitochondrial sequences from the Vigilant et al. data set. General problems in analyzing large numbers of short sequences are discussed, and an improved strategy is suggested. A key feature is to focus not on individual trees but on the general "landscape" of trees. Over 1,000 searches were made from random starting trees with only one tree (a local optimum) being retained each time, thereby ensuring optima were found independently. A new tree comparison metric was developed that is unaffected by rearrangements of trees around many very short internal edges. Use of this metric showed that downweighting hypervariable sites revealed more evolutionary structure than studies that weighted all sites equally. Our results are consistent with convergence toward a global optimum. Crucial features are that the best optima show very strong regional differentiation, a common group of 49 African sequences is found in all the best optima, and the best optima contain the 16 !Kung sequences in a separate group of San people. The other 86 sequences form a heterogeneous mixture of Africans, Europeans, Australopapuans, and Asians. Thus all major human lineages occur in Africa, but only a subset occurs in the rest of the world. The existence of these African-only groups strongly contradicts multiregional theories for the origin of Homo sapiens that require widespread migration and interbreeding over the entire range of H. erectus. Only when the multiregional model is rejected is it appropriate to consider the root, based on a single locus, to be the center of origin of a population (otherwise different loci could give alternative geographic positions for the root). For this data, several methods locate the root within the group of 49 African sequences and are thus consistent with the recent African origin of H. sapiens. We demonstrate that the time of the last common ancestor cannot be the time of major expansion in human numbers

  18. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    Science.gov (United States)

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  19. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    Energy Technology Data Exchange (ETDEWEB)

    Ovacik, Meric A. [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States)

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  20. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    International Nuclear Information System (INIS)

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-01-01

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy

  1. Alternative Enzymes Lead to Improvements in Sequence Coverage and PTM Analysis

    Science.gov (United States)

    Hooper, Kyle; Rosenblatt, Michael; Urh, Marjeta; Saveliev, Sergei; Hosfield, Chris; Kobs, Gary; Ford, Michael; Jones, Richard; Amunugama, Ravi; Allen, David; Brazas, Robert

    2013-01-01

    The profiling of proteins using biological mass spectrometry (bottom up proteomics) most commonly requires trypsin. Trypsin is advantageous in that it produces peptides of optimal charge and size. However, for applications in which the proteins under investigation are part of a complex mixture or not isolated at high levels (i.e. low ng from an immunoprecipitation), sequence coverage is rarely complete. In addition, we have found that in several cases, like phosphorylation, acetylation, and methylation, alternative proteases are required to prepare peptides suitable for MS detection. This poster will provide specific examples which demonstrate this observation. For example, the application of a combined Trypsin/ Lys-C mixture reduces the number of missed cleavages by more than 3-fold producing samples with lower CV's (for biological replicates). The mixture is also well-suited for the complete proteolysis of hydrophobic, compact proteins. The addition of chymotrypsin and elastase has been found to be useful for identifying phosphorylation sites on proteins, especially on sequences where the site of phosphorylation inhibits trypsin (i.e. proximal to K or R). Many epigenetic applications have focused on histone modifications, like lysine acetylation and arginine methylation. Alternative proteases like Asp-N, Glu-C, and chymotrypsin have been especially useful given the fact that the modified K and R residues are resistant to c-terminal cleavage by trypsin. Finally, in the case of serum profiling, the addition of the endoglycosidase, PNGase F has been found to improve sequence coverage due to the removal of N-linked glycans.

  2. Complete exon sequencing of all known Usher syndrome genes greatly improves molecular diagnosis.

    Science.gov (United States)

    Bonnet, Crystel; Grati, M'hamed; Marlin, Sandrine; Levilliers, Jacqueline; Hardelin, Jean-Pierre; Parodi, Marine; Niasme-Grare, Magali; Zelenika, Diana; Délépine, Marc; Feldmann, Delphine; Jonard, Laurence; El-Amraoui, Aziz; Weil, Dominique; Delobel, Bruno; Vincent, Christophe; Dollfus, Hélène; Eliot, Marie-Madeleine; David, Albert; Calais, Catherine; Vigneron, Jacqueline; Montaut-Verient, Bettina; Bonneau, Dominique; Dubin, Jacques; Thauvin, Christel; Duvillard, Alain; Francannet, Christine; Mom, Thierry; Lacombe, Didier; Duriez, Françoise; Drouin-Garraud, Valérie; Thuillier-Obstoy, Marie-Françoise; Sigaudy, Sabine; Frances, Anne-Marie; Collignon, Patrick; Challe, Georges; Couderc, Rémy; Lathrop, Mark; Sahel, José-Alain; Weissenbach, Jean; Petit, Christine; Denoyelle, Françoise

    2011-05-11

    Usher syndrome (USH) combines sensorineural deafness with blindness. It is inherited in an autosomal recessive mode. Early diagnosis is critical for adapted educational and patient management choices, and for genetic counseling. To date, nine causative genes have been identified for the three clinical subtypes (USH1, USH2 and USH3). Current diagnostic strategies make use of a genotyping microarray that is based on the previously reported mutations. The purpose of this study was to design a more accurate molecular diagnosis tool. We sequenced the 366 coding exons and flanking regions of the nine known USH genes, in 54 USH patients (27 USH1, 21 USH2 and 6 USH3). Biallelic mutations were detected in 39 patients (72%) and monoallelic mutations in an additional 10 patients (18.5%). In addition to biallelic mutations in one of the USH genes, presumably pathogenic mutations in another USH gene were detected in seven patients (13%), and another patient carried monoallelic mutations in three different USH genes. Notably, none of the USH3 patients carried detectable mutations in the only known USH3 gene, whereas they all carried mutations in USH2 genes. Most importantly, the currently used microarray would have detected only 30 of the 81 different mutations that we found, of which 39 (48%) were novel. Based on these results, complete exon sequencing of the currently known USH genes stands as a definite improvement for molecular diagnosis of this disease, which is of utmost importance in the perspective of gene therapy.

  3. Team-based learning to improve learning outcomes in a therapeutics course sequence.

    Science.gov (United States)

    Bleske, Barry E; Remington, Tami L; Wells, Trisha D; Dorsch, Michael P; Guthrie, Sally K; Stumpf, Janice L; Alaniz, Marissa C; Ellingrod, Vicki L; Tingen, Jeffrey M

    2014-02-12

    To compare the effectiveness of team-based learning (TBL) to that of traditional lectures on learning outcomes in a therapeutics course sequence. A revised TBL curriculum was implemented in a therapeutic course sequence. Multiple choice and essay questions identical to those used to test third-year students (P3) taught using a traditional lecture format were administered to the second-year pharmacy students (P2) taught using the new TBL format. One hundred thirty-one multiple-choice questions were evaluated; 79 tested recall of knowledge and 52 tested higher level, application of knowledge. For the recall questions, students taught through traditional lectures scored significantly higher compared to the TBL students (88%±12% vs. 82%±16%, p=0.01). For the questions assessing application of knowledge, no differences were seen between teaching pedagogies (81%±16% vs. 77%±20%, p=0.24). Scores on essay questions and the number of students who achieved 100% were also similar between groups. Transition to a TBL format from a traditional lecture-based pedagogy allowed P2 students to perform at a similar level as students with an additional year of pharmacy education on application of knowledge type questions. However, P3 students outperformed P2 students regarding recall type questions and overall. Further assessment of long-term learning outcomes is needed to determine if TBL produces more persistent learning and improved application in clinical settings.

  4. Improving transcriptome assembly through error correction of high-throughput sequence reads

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2013-07-01

    Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  5. Method for improving accident sequence recognition in nuclear power plant control rooms

    International Nuclear Information System (INIS)

    Heising, C.D.; Dinsmore, S.C.

    1983-01-01

    This work adapts fault trees from plant-specific probabilistic risk analyses (PRAs) to construct and quantitatively evaluate an alarm analysis system for the engineered safety features (ESFs). The purpose is to help improve reactor operator recognition and identification of potential accident sequences. The PRA system fault trees provide system failure mode information which can be used to construct alarm trees. These alarm trees provide a framework for assessing the plant indicators so that the plant conditions are made more readily apparent to plant personnel. In the alarm tree, possible states of each instrumented alarem are identified as true or false. In addition, a warning status is defined and integrated into an alarm analysis routine. The impact of this additional status conditioned on the Boolean laws used to evaluate the alarm trees is examined. An application is described for BWR high pressure coolant injection system (HPCI) that would be utilized during many severe reactor accidents

  6. Complete exon sequencing of all known Usher syndrome genes greatly improves molecular diagnosis

    Directory of Open Access Journals (Sweden)

    Lacombe Didier

    2011-05-01

    Full Text Available Abstract Background Usher syndrome (USH combines sensorineural deafness with blindness. It is inherited in an autosomal recessive mode. Early diagnosis is critical for adapted educational and patient management choices, and for genetic counseling. To date, nine causative genes have been identified for the three clinical subtypes (USH1, USH2 and USH3. Current diagnostic strategies make use of a genotyping microarray that is based on the previously reported mutations. The purpose of this study was to design a more accurate molecular diagnosis tool. Methods We sequenced the 366 coding exons and flanking regions of the nine known USH genes, in 54 USH patients (27 USH1, 21 USH2 and 6 USH3. Results Biallelic mutations were detected in 39 patients (72% and monoallelic mutations in an additional 10 patients (18.5%. In addition to biallelic mutations in one of the USH genes, presumably pathogenic mutations in another USH gene were detected in seven patients (13%, and another patient carried monoallelic mutations in three different USH genes. Notably, none of the USH3 patients carried detectable mutations in the only known USH3 gene, whereas they all carried mutations in USH2 genes. Most importantly, the currently used microarray would have detected only 30 of the 81 different mutations that we found, of which 39 (48% were novel. Conclusions Based on these results, complete exon sequencing of the currently known USH genes stands as a definite improvement for molecular diagnosis of this disease, which is of utmost importance in the perspective of gene therapy.

  7. Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing

    Science.gov (United States)

    Chan, Chon-Kit Kenneth; Hsu, Arthur L.; Tang, Sen-Lin; Halgamuge, Saman K.

    2008-01-01

    Metagenomic projects using whole-genome shotgun (WGS) sequencing produces many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of each of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences, compared to the other three. Furthermore, we observed that increased order of oligonucleotide frequency may deteriorate the assignment result in some cases, which indicates the possible existence of optimal species-specific oligonucleotide frequency. We replaced SOM with growing self-organising map (GSOM) where comparable results are obtained while gaining 7%–15% speed improvement. PMID:18288261

  8. Combining sequence-based prediction methods and circular dichroism and infrared spectroscopic data to improve protein secondary structure determinations

    Directory of Open Access Journals (Sweden)

    Lees Jonathan G

    2008-01-01

    Full Text Available Abstract Background A number of sequence-based methods exist for protein secondary structure prediction. Protein secondary structures can also be determined experimentally from circular dichroism, and infrared spectroscopic data using empirical analysis methods. It has been proposed that comparable accuracy can be obtained from sequence-based predictions as from these biophysical measurements. Here we have examined the secondary structure determination accuracies of sequence prediction methods with the empirically determined values from the spectroscopic data on datasets of proteins for which both crystal structures and spectroscopic data are available. Results In this study we show that the sequence prediction methods have accuracies nearly comparable to those of spectroscopic methods. However, we also demonstrate that combining the spectroscopic and sequences techniques produces significant overall improvements in secondary structure determinations. In addition, combining the extra information content available from synchrotron radiation circular dichroism data with sequence methods also shows improvements. Conclusion Combining sequence prediction with experimentally determined spectroscopic methods for protein secondary structure content significantly enhances the accuracy of the overall results obtained.

  9. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    Directory of Open Access Journals (Sweden)

    von Reumont Björn M

    2010-03-01

    Full Text Available Abstract Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold which choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the most decrease in conflict. Conclusions Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment

  10. Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.

    Science.gov (United States)

    Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D'Hont, Angélique

    2016-03-16

    Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in

  11. Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data

    DEFF Research Database (Denmark)

    Rask, Thomas Salhøj; Petersen, Bent; Chen, Donald S.

    2016-01-01

    . The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant Multipass calculates the likelihood and nucleotide sequence of several most likely sequences given the flowgram data....... This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene...... family, where Multipass generates 20 % more error-free sequences than current state of the art methods, and provides sequence characteristics that allow generation of a set of high confidence error-free sequences. This novel method can be used to increase accuracy of existing and future amplicon...

  12. Improving ESL Writing Using an Online Formulaic Sequence Word-Combination Checker

    Science.gov (United States)

    Grami, G. M. A.; Alkazemi, B. Y.

    2016-01-01

    Writing correct English sentences can be challenging. Furthermore, writing correct formulaic sequences can be especially difficult because accepted combinations do not follow clear rules governing which words appear together in a sequence. One solution is to provide examples of correct usage accompanied by statistical feedback from web-based…

  13. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  14. The Implementation of "The n-term" Formula to Improve Student Ability in Determining the Rules of a Numeric Sequence

    Science.gov (United States)

    In'am, Akhsanul; Hajar, Siti

    2013-01-01

    A good-quality teacher may determines a good-quality learning, thus good-quality students will be the results. In order to have a good-quality learning, a lot of strategies and methods can be adopted. The objective of this research is to improve students' ability in determining the rules of a numeric sequence and analysing the effectiveness of the…

  15. Improving irrigation efficiency : the need for a relevant sequence of the management tools

    Science.gov (United States)

    Fayolle, Y.

    2009-04-01

    With 70 % of worldwide withdrawals, irrigation efficiency is a key issue in the overall problem of water resources. Management of water dedicated to agriculture should be improved to secure food production and save water to deal with increasing domestic and industrial demands. This paper is based on the results of a collaborative research project conducted in India with a local NGO (the Aga Khan Rural Support Programme, AKRSP(I)) during which GIS were tested. It is aimed at analyzing the efficiency of water usage in a water development programme conducted by the partner NGO in the semi-arid margins of Gujarat state. The analysis raises the question of the articulation of legal, institutional, economical, and technical tools to improve water efficiency. The NGO supervises the construction of surface water harvesting structures for irrigation purposes. Following a participatory approach, it creates and trains user groups to which the management of dams would then be devolved. User group membership depends on financial contribution to the building costs. A legal vacuum regarding surface water management combined with unequal investment capacities favor the concentration of water resources in the hands of a limited number of farmers. This causes low water use efficiency, irrigation choices being mostly oriented to high water consumptive crops and recipient farmers showing no interest in investing in water saving techniques. Our observations favor equality of access and paying more attention to the sequence in which management tools are articulated. On a national scale, as a prerequisite, water user rights as well as NGO's intervention legal framework should be clarified. On a project scale, before construction, information systems could help to identify all potential beneficiaries and optimize equality of access. It aims at reducing the volume of water per farmer to encourage them to irrigate low water consumptive crops and invest in water saving techniques. Depending

  16. Recoding method that removes inhibitory sequences and improves HIV gene expression

    Energy Technology Data Exchange (ETDEWEB)

    Rabadan, Raul; Krasnitz, Michael; Robins, Harlan; Witten, Daniela; Levine, Arnold

    2016-08-23

    The invention relates to inhibitory nucleotide signal sequences or "INS" sequences in the genomes of lentiviruses. In particular the invention relates to the AGG motif present in all viral genomes. The AGG motif may have an inhibitory effect on a virus, for example by reducing the levels of, or maintaining low steady-state levels of, viral RNAs in host cells, and inducing and/or maintaining in viral latency. In one aspect, the invention provides vaccines that contain, or are produced from, viral nucleic acids in which the AGG sequences have been mutated. In another aspect, the invention provides methods and compositions for affecting the function of the AGG motif, and methods for identifying other INS sequences in viral genomes.

  17. Finding and Improving the Key-Frames of Long Video Sequences for Face Recognition

    DEFF Research Database (Denmark)

    Nasrollahi, Kamal; Moeslund, Thomas B.

    2010-01-01

    Face recognition systems are very sensitive to the quality and resolution of their input face images. This makes such systems unreliable when working with long surveillance video sequences without employing some selection and enhancement algorithms. On the other hand, processing all the frames...... of such video sequences by any enhancement or even face recognition algorithm is demanding. Thus, there is a need for a mechanism to summarize the input video sequence to a set of key-frames and then applying an enhancement algorithm to this subset. This paper presents a system doing exactly this. The system...... uses face quality assessment to select the key-frames and a hybrid super-resolution to enhance the face image quality. The suggested system that employs a linear associator face recognizer to evaluate the enhanced results has been tested on real surveillance video sequences and the experimental results...

  18. cDNA sequencing improves the detection of P53 missense mutations in colorectal cancer

    International Nuclear Information System (INIS)

    Szybka, Malgorzata; Kordek, Radzislaw; Zakrzewska, Magdalena; Rieske, Piotr; Pasz-Walczak, Grazyna; Kulczycka-Wojdala, Dominika; Zawlik, Izabela; Stawski, Robert; Jesionek-Kupnicka, Dorota; Liberski, Pawel P

    2009-01-01

    Recently published data showed discrepancies beteween P53 cDNA and DNA sequencing in glioblastomas. We hypothesised that similar discrepancies may be observed in other human cancers. To this end, we analyzed 23 colorectal cancers for P53 mutations and gene expression using both DNA and cDNA sequencing, real-time PCR and immunohistochemistry. We found P53 gene mutations in 16 cases (15 missense and 1 nonsense). Two of the 15 cases with missense mutations showed alterations based only on cDNA, and not DNA sequencing. Moreover, in 6 of the 15 cases with a cDNA mutation those mutations were difficult to detect in the DNA sequencing, so the results of DNA analysis alone could be misinterpreted if the cDNA sequencing results had not also been available. In all those 15 cases, we observed a higher ratio of the mutated to the wild type template by cDNA analysis, but not by the DNA analysis. Interestingly, a similar overexpression of P53 mRNA was present in samples with and without P53 mutations. In terms of colorectal cancer, those discrepancies might be explained under three conditions: 1, overexpression of mutated P53 mRNA in cancer cells as compared with normal cells; 2, a higher content of cells without P53 mutation (normal cells and cells showing K-RAS and/or APC but not P53 mutation) in samples presenting P53 mutation; 3, heterozygous or hemizygous mutations of P53 gene. Additionally, for heterozygous mutations unknown mechanism(s) causing selective overproduction of mutated allele should also be considered. Our data offer new clues for studying discrepancy in P53 cDNA and DNA sequencing analysis

  19. Implementation of Amplicon Parallel Sequencing Leads to Improvement of Diagnosis and Therapy of Lung Cancer Patients.

    Science.gov (United States)

    König, Katharina; Peifer, Martin; Fassunke, Jana; Ihle, Michaela A; Künstlinger, Helen; Heydt, Carina; Stamm, Katrin; Ueckeroth, Frank; Vollbrecht, Claudia; Bos, Marc; Gardizi, Masyar; Scheffler, Matthias; Nogova, Lucia; Leenders, Frauke; Albus, Kerstin; Meder, Lydia; Becker, Kerstin; Florin, Alexandra; Rommerscheidt-Fuss, Ursula; Altmüller, Janine; Kloth, Michael; Nürnberg, Peter; Henkel, Thomas; Bikár, Sven-Ernö; Sos, Martin L; Geese, William J; Strauss, Lewis; Ko, Yon-Dschun; Gerigk, Ulrich; Odenthal, Margarete; Zander, Thomas; Wolf, Jürgen; Merkelbach-Bruse, Sabine; Buettner, Reinhard; Heukamp, Lukas C

    2015-07-01

    The Network Genomic Medicine Lung Cancer was set up to rapidly translate scientific advances into early clinical trials of targeted therapies in lung cancer performing molecular analyses of more than 3500 patients annually. Because sequential analysis of the relevant driver mutations on fixated samples is challenging in terms of workload, tissue availability, and cost, we established multiplex parallel sequencing in routine diagnostics. The aim was to analyze all therapeutically relevant mutations in lung cancer samples in a high-throughput fashion while significantly reducing turnaround time and amount of input DNA compared with conventional dideoxy sequencing of single polymerase chain reaction amplicons. In this study, we demonstrate the feasibility of a 102 amplicon multiplex polymerase chain reaction followed by sequencing on an Illumina sequencer on formalin-fixed paraffin-embedded tissue in routine diagnostics. Analysis of a validation cohort of 180 samples showed this approach to require significantly less input material and to be more reliable, robust, and cost-effective than conventional dideoxy sequencing. Subsequently, 2657 lung cancer patients were analyzed. We observed that comprehensive biomarker testing provided novel information in addition to histological diagnosis and clinical staging. In 2657 consecutively analyzed lung cancer samples, we identified driver mutations at the expected prevalence. Furthermore we found potentially targetable DDR2 mutations at a frequency of 3% in both adenocarcinomas and squamous cell carcinomas. Overall, our data demonstrate the utility of systematic sequencing analysis in a clinical routine setting and highlight the dramatic impact of such an approach on the availability of therapeutic strategies for the targeted treatment of individual cancer patients.

  20. Quality standards for DNA sequence variation databases to improve clinical management under development in Australia

    Directory of Open Access Journals (Sweden)

    B. Bennetts

    2014-09-01

    Full Text Available Despite the routine nature of comparing sequence variations identified during clinical testing to database records, few databases meet quality requirements for clinical diagnostics. To address this issue, The Royal College of Pathologists of Australasia (RCPA in collaboration with the Human Genetics Society of Australasia (HGSA, and the Human Variome Project (HVP is developing standards for DNA sequence variation databases intended for use in the Australian clinical environment. The outputs of this project will be promoted to other health systems and accreditation bodies by the Human Variome Project to support the development of similar frameworks in other jurisdictions.

  1. Use of amplicon sequencing to improve sensitivity in PCR-based detection of microbial pathogen in environmental samples.

    Science.gov (United States)

    Saingam, Prakit; Li, Bo; Yan, Tao

    2018-06-01

    DNA-based molecular detection of microbial pathogens in complex environments is still plagued by sensitivity, specificity and robustness issues. We propose to address these issues by viewing them as inadvertent consequences of requiring specific and adequate amplification (SAA) of target DNA molecules by current PCR methods. Using the invA gene of Salmonella as the model system, we investigated if next generation sequencing (NGS) can be used to directly detect target sequences in false-negative PCR reaction (PCR-NGS) in order to remove the SAA requirement from PCR. False-negative PCR and qPCR reactions were first created using serial dilutions of laboratory-prepared Salmonella genomic DNA and then analyzed directly by NGS. Target invA sequences were detected in all false-negative PCR and qPCR reactions, which lowered the method detection limits near the theoretical minimum of single gene copy detection. The capability of the PCR-NGS approach in correcting false negativity was further tested and confirmed under more environmentally relevant conditions using Salmonella-spiked stream water and sediment samples. Finally, the PCR-NGS approach was applied to ten urban stream water samples and detected invA sequences in eight samples that would be otherwise deemed Salmonella negative. Analysis of the non-target sequences in the false-negative reactions helped to identify primer dime-like short sequences as the main cause of the false negativity. Together, the results demonstrated that the PCR-NGS approach can significantly improve method sensitivity, correct false-negative detections, and enable sequence-based analysis for failure diagnostics in complex environmental samples. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. Increased sequence diversity coverage improves detection of HIV-Specific T cell responses

    DEFF Research Database (Denmark)

    Frahm, N.; Kaufmann, D.E.; Yusim, K.

    2007-01-01

    The accurate identification of HIV-specific T cell responses is important for determining the relationship between immune response, viral control, and disease progression. HIV-specific immune responses are usually measured using peptide sets based on consensus sequences, which frequently miss res...

  3. A ChIP-Seq benchmark shows that sequence conservation mainly improves detection of strong transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Tony Håndstad

    Full Text Available BACKGROUND: Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. RESULTS: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. CONCLUSIONS: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.

  4. Targeted Sequencing of Venom Genes from Cone Snail Genomes Improves Understanding of Conotoxin Molecular Evolution.

    Science.gov (United States)

    Phuong, Mark A; Mahardika, Gusti N

    2018-05-01

    To expand our capacity to discover venom sequences from the genomes of venomous organisms, we applied targeted sequencing techniques to selectively recover venom gene superfamilies and nontoxin loci from the genomes of 32 cone snail species (family, Conidae), a diverse group of marine gastropods that capture their prey using a cocktail of neurotoxic peptides (conotoxins). We were able to successfully recover conotoxin gene superfamilies across all species with high confidence (> 100× coverage) and used these data to provide new insights into conotoxin evolution. First, we found that conotoxin gene superfamilies are composed of one to six exons and are typically short in length (mean = ∼85 bp). Second, we expanded our understanding of the following genetic features of conotoxin evolution: 1) positive selection, where exons coding the mature toxin region were often three times more divergent than their adjacent noncoding regions, 2) expression regulation, with comparisons to transcriptome data showing that cone snails only express a fraction of the genes available in their genome (24-63%), and 3) extensive gene turnover, where Conidae species varied from 120 to 859 conotoxin gene copies. Finally, using comparative phylogenetic methods, we found that while diet specificity did not predict patterns of conotoxin evolution, dietary breadth was positively correlated with total conotoxin gene diversity. Overall, the targeted sequencing technique demonstrated here has the potential to radically increase the pace at which venom gene families are sequenced and studied, reshaping our ability to understand the impact of genetic changes on ecologically relevant phenotypes and subsequent diversification.

  5. Improving pairwise comparison of protein sequences with domain co-occurrence

    Science.gov (United States)

    Gascuel, Olivier

    2018-01-01

    Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is then to integrate additional contextual information to the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed an increase of 14% of the number of significant BLAST hits and an increase of 25% of the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties show that they are close to that of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence PMID:29293498

  6. Environmental Enrichment Expedites Acquisition and Improves Flexibility on a Temporal Sequencing Task in Mice

    Directory of Open Access Journals (Sweden)

    Darius Rountree-Harrison

    2018-03-01

    Full Text Available Environmental enrichment (EE via increased opportunities for voluntary exercise, sensory stimulation and social interaction, can enhance the function of and behaviours regulated by cognitive circuits. Little is known, however, as to how this intervention affects performance on complex tasks that engage multiple, definable learning and memory systems. Accordingly, we utilised the Olfactory Temporal Order Discrimination (OTOD task which requires animals to recall and report sequence information about a series of recently encountered olfactory stimuli. This approach allowed us to compare animals raised in either enriched or standard laboratory housing conditions on a number of measures, including the acquisition of a complex discrimination task, temporal sequence recall accuracy (i.e., the ability to accurately recall a sequences of events and acuity (i.e., the ability to resolve past events that occurred in close temporal proximity, as well as cognitive flexibility tested in the style of a rule reversal and an Intra-Dimensional Shift (IDS. We found that enrichment accelerated the acquisition of the temporal order discrimination task, although neither accuracy nor acuity was affected at asymptotic performance levels. Further, while a subtle enhancement of overall performance was detected for both rule reversal and IDS versions of the task, accelerated performance recovery could only be attributed to the shift-like contingency change. These findings suggest that EE can affect specific elements of complex, multi-faceted cognitive processes.

  7. Improvement of Hyperemic Myocardial Oxygen Extraction Fraction Estimation By A Diffusion Prepared Sequence

    Science.gov (United States)

    McCommis, Kyle S.; Koktzoglou, Ioannis; Zhang, Haosen; Goldstein, Thomas A.; Northrup, Benjamin E.; Li, Debiao; Gropler, Robert J.; Zheng, Jie

    2010-01-01

    Myocardial oxygen extraction fraction (OEF) during hyperemia can be estimated using a double-inversion-recovery (DIR) prepared T2-weighted black-blood sequence. Severe irregular ECG-triggering due to elevated heart rate and/or arrhythmias may render it difficult to adequately suppress the flowing left ventricle blood signal and thus potentially cause errors in the estimates of myocardial OEF. Thus, the goal of this study was to evaluate another black-blood technique, a diffusion-weighted (DW)-prepared TSE sequence for its ability to determine regional myocardial OEF during hyperemia. Control dogs and dogs with acute coronary artery stenosis were imaged with both the DIR- and DW-prepared TSE sequences at rest and during either dipyridamole or dobutamine hyperemia. Validation of MRI OEF estimates was performed using blood sampling from the artery and coronary sinus in control dogs. The two methods showed comparable correlations with blood sampling results (R2 = 0.9). Similar OEF estimations for all dogs were observed except for the group of dogs with severe coronary stenosis during dobutamine stress. In these dogs, the DW method provided more physiologically reasonable OEF (hyperemic OEF = 0.75 ± 0.08 vs resting OEF of 0.6) than the DIR method (hyperemic OEF = 0.56 ± 0.10). DW-preparation may be a valuable alternative for more accurate oxygenation measurements during irregular ECG-triggering. PMID:20512871

  8. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data.

    Directory of Open Access Journals (Sweden)

    Alberto Ferrarini

    Full Text Available Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES and whole genome sequencing (WGS viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling of sequence variants respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated to disease by GWAS. It's thus critical, for clinical applications of WGS and WES, to interpret whether non-variant sites are homozygous for the reference allele or if the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data allows to genotype more than 92% of sites corresponding to known SNPs compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary to an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF file format which allows to store genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software.

  9. An improved classification of G-protein-coupled receptors using sequence-derived features

    Directory of Open Access Journals (Sweden)

    Peng Zhen-Ling

    2010-08-01

    Full Text Available Abstract Background G-protein-coupled receptors (GPCRs play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3 D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and characterization of novel receptors, it is therefore very valuable to develop a computational method to accurately predict GPCRs from the protein primary sequences. Results We propose a new method called PCA-GPCR, to predict GPCRs using a comprehensive set of 1497 sequence-derived features. The principal component analysis is first employed to reduce the dimension of the feature space to 32. Then, the resulting 32-dimensional feature vectors are fed into a simple yet powerful classification algorithm, called intimate sorting, to predict GPCRs at five levels. The prediction at the first level determines whether a protein is a GPCR or a non-GPCR. If it is predicted to be a GPCR, then it will be further predicted into certain family, subfamily, sub-subfamily and subtype by the classifiers at the second, third, fourth, and fifth levels, respectively. To train the classifiers applied at five levels, a non-redundant dataset is carefully constructed, which contains 3178, 1589, 4772, 4924, and 2741 protein sequences at the respective levels. Jackknife tests on this training dataset show that the overall accuracies of PCA-GPCR at five levels (from the first to the fifth can achieve up to 99.5%, 88.8%, 80.47%, 80.3%, and 92.34%, respectively. We further perform predictions on a dataset of 1238 GPCRs at the second level, and on another two datasets of 167 and 566 GPCRs respectively at the fourth level. The overall prediction accuracies of our method are consistently higher than those of the existing methods to be compared. Conclusions The comprehensive set of 1497 features is believed to be capable of capturing information about amino acid

  10. Improved therapy-success prediction with GSS estimated from clinical HIV-1 sequences.

    Science.gov (United States)

    Pironti, Alejandro; Pfeifer, Nico; Kaiser, Rolf; Walter, Hauke; Lengauer, Thomas

    2014-01-01

    Rules-based HIV-1 drug-resistance interpretation (DRI) systems disregard many amino-acid positions of the drug's target protein. The aims of this study are (1) the development of a drug-resistance interpretation system that is based on HIV-1 sequences from clinical practice rather than hard-to-get phenotypes, and (2) the assessment of the benefit of taking all available amino-acid positions into account for DRI. A dataset containing 34,934 therapy-naïve and 30,520 drug-exposed HIV-1 pol sequences with treatment history was extracted from the EuResist database and the Los Alamos National Laboratory database. 2,550 therapy-change-episode baseline sequences (TCEB) were assigned to test set A. Test set B contains 1,084 TCEB from the HIVdb TCE repository. Sequences from patients absent in the test sets were used to train three linear support vector machines to produce scores that predict drug exposure pertaining to each of 20 antiretrovirals: the first one uses the full amino-acid sequences (DEfull), the second one only considers IAS drug-resistance positions (DEonlyIAS), and the third one disregards IAS drug-resistance positions (DEnoIAS). For performance comparison, test sets A and B were evaluated with DEfull, DEnoIAS, DEonlyIAS, geno2pheno[resistance], HIVdb, ANRS, HIV-GRADE, and REGA. Clinically-validated cut-offs were used to convert the continuous output of the first four methods into susceptible-intermediate-resistant (SIR) predictions. With each method, a genetic susceptibility score (GSS) was calculated for each therapy episode in each test set by converting the SIR prediction for its compounds to integer: S=2, I=1, and R=0. The GSS were used to predict therapy success as defined by the EuResist standard datum definition. Statistical significance was assessed using a Wilcoxon signed-rank test. A comparison of the therapy-success prediction performances among the different interpretation systems for test set A can be found in Table 1, while those for test set

  11. Draft genome sequence of Cicer reticulatum L., the wild progenitor of chickpea provides a resource for agronomic trait improvement.

    Science.gov (United States)

    Gupta, Sonal; Nawaz, Kashif; Parween, Sabiha; Roy, Riti; Sahu, Kamlesh; Kumar Pole, Anil; Khandal, Hitaishi; Srivastava, Rishi; Kumar Parida, Swarup; Chattopadhyay, Debasis

    2017-02-01

    Cicer reticulatum L. is the wild progenitor of the fourth most important legume crop chickpea (C. arietinum L.). We assembled short-read sequences into 416 Mb draft genome of C. reticulatum and anchored 78% (327 Mb) of this assembly to eight linkage groups. Genome annotation predicted 25,680 protein-coding genes covering more than 90% of predicted gene space. The genome assembly shared a substantial synteny and conservation of gene orders with the genome of the model legume Medicago truncatula. Resistance gene homologs of wild and domesticated chickpeas showed high sequence homology and conserved synteny. Comparison of gene sequences and nucleotide diversity using 66 wild and domesticated chickpea accessions suggested that the desi type chickpea was genetically closer to the wild species than the kabuli type. Comparative analyses predicted gene flow between the wild and the cultivated species during domestication. Molecular diversity and population genetic structure determination using 15,096 genome-wide single nucleotide polymorphisms revealed an admixed domestication pattern among cultivated (desi and kabuli) and wild chickpea accessions belonging to three population groups reflecting significant influence of parentage or geographical origin for their cultivar-specific population classification. The assembly and the polymorphic sequence resources presented here would facilitate the study of chickpea domestication and targeted use of wild Cicer germplasms for agronomic trait improvement in chickpea. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  12. Surface coil imaging of the spine using fast sequences: Improvement of intensity profile and contrast behavior

    International Nuclear Information System (INIS)

    Requardt, H.; Deimling, M.; Weber, H.

    1986-01-01

    Sagittal and axial images obtained using a surface coil suffer from the extreme intensity profile caused by physical properties of the coil and the anatomic entity of subcutaneous fat. The authors present a measuring device that reduces these disadvantages by means of Helmholtz-type coils, and sequences that reduce the fat signal by dephasing its signal part. The extremely short repetition time (<30 msec) allows acquisition times shorter than 10 sec. Breath-holding for this short period to avoid movement artifacts is possible. Images are presented that illustrate the enhanced contrast of spinal tissue and surrounding structures. Comparisons are made with spin-echo and CHESS images

  13. Some reasoning on the improvement of the ETAS modeling at the occurrence of the 2016 central Italy seismic sequence

    Directory of Open Access Journals (Sweden)

    Anna Maria Lombardi

    2016-12-01

    Full Text Available This study presents an application of the ETAS model to the first 20 days of the 2016 central Italy sequence. Despite of the provisional nature of data, the model is able to describe the occurrence rate, but for the first hours after the mainshock occurrence. A sensitivity analysis of the model to two uncertainty sources, the model parameters and the occurrence history, shows that the second has a main role in controlling the performance of the ETAS model, more than the uncertainty on parameters. Previous results, together with the clear inability of ETAS to forecast the occurrence of a sequence before its starting time, give important suggestions about possible improvements. Here, a very preliminary attempt in this sense is presented.

  14. Prehospital rapid sequence intubation improves functional outcome for patients with severe traumatic brain injury: a randomized controlled trial.

    Science.gov (United States)

    Bernard, Stephen A; Nguyen, Vina; Cameron, Peter; Masci, Kevin; Fitzgerald, Mark; Cooper, David J; Walker, Tony; Std, B Paramed; Myles, Paul; Murray, Lynne; David; Taylor; Smith, Karen; Patrick, Ian; Edington, John; Bacon, Andrew; Rosenfeld, Jeffrey V; Judson, Rodney

    2010-12-01

    To determine whether paramedic rapid sequence intubation in patients with severe traumatic brain injury (TBI) improves neurologic outcomes at 6 months compared with intubation in the hospital. Severe TBI is associated with a high rate of mortality and long-term morbidity. Comatose patients with TBI routinely undergo endo-tracheal intubation to protect the airway, prevent hypoxia, and control ventilation. In many places, paramedics perform intubation prior to hospital arrival. However, it is unknown whether this approach improves outcomes. In a prospective, randomized, controlled trial, we assigned adults with severe TBI in an urban setting to either prehospital rapid sequence intubation by paramedics or transport to a hospital emergency department for intubation by physicians. The primary outcome measure was the median extended Glasgow Outcome Scale (GOSe) score at 6 months. Secondary end-points were favorable versus unfavorable outcome at 6 months, length of intensive care and hospital stay, and survival to hospital discharge. A total of 312 patients with severe TBI were randomly assigned to paramedic rapid sequence intubation or hospital intubation. The success rate for paramedic intubation was 97%. At 6 months, the median GOSe score was 5 (interquartile range, 1-6) in patients intubated by paramedics compared with 3 (interquartile range, 1-6) in the patients intubated at hospital (P = 0.28).The proportion of patients with favorable outcome (GOSe, 5-8) was 80 of 157 patients (51%) in the paramedic intubation group compared with 56 of 142 patients (39%) in the hospital intubation group (risk ratio, 1.28; 95% confidence interval, 1.00-1.64; P = 0.046). There were no differences in intensive care or hospital length of stay, or in survival to hospital discharge. In adults with severe TBI, prehospital rapid sequence intubation by paramedics increases the rate of favorable neurologic outcome at 6 months compared with intubation in the hospital.

  15. Instances or Sequences? Improving the State of the Art of Qualitative Research

    Directory of Open Access Journals (Sweden)

    David Silverman

    2005-09-01

    Full Text Available Numbers apparently talk. With few numbers, qualitative researchers appear to rely on examples or instances to support their analysis. Hence research reports routinely display data extracts which serve as telling instances of some claimed phenomenon. However, the use of such an evidential base rightly provokes the charge of (possible anecdotalism, i.e. choosing just those extracts which support your argument. I suggest that this methodological problem is best addressed by returning to those features of our theoretical roots which tend to distinguish what we do from the work of quantitative social scientists. Although SAUSSURE is most cited in linguistics and structural anthropology, he provides a simple rule that applies to us all. In a rebuke to our reli­ance on instances, SAUSSURE tells us "no mean­ing exists in a single item". Everything depends upon how single items (elements are articulated. One everyday activity in which the social world is articulated is through the construction of se­quences. Just as participants attend to the se­quential placing of interactional "events", so should social scientists. Using examples drawn from focus groups, fieldnotes and audiotapes, I argue that the identification of such sequences rather than the citing of instances should constitute a prime test for the adequacy of any claim about qualitative data. URN: urn:nbn:de:0114-fqs0503301

  16. Increasing Genome Sampling and Improving SNP Genotyping for Genotyping-by-Sequencing with New Combinations of Restriction Enzymes.

    Science.gov (United States)

    Fu, Yong-Bi; Peterson, Gregory W; Dong, Yibo

    2016-04-07

    Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features would limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of genomic coverage through GBS with an individual or pair of restriction enzymes on one sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed through an application of IgCoverage on 22 plant, animal, and fungus species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations consisted of pairing four- or five-cutter restriction enzymes than the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than monocots. Also, the SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be utilized to increase genome sampling and improve SNP genotyping in various GBS applications. Copyright © 2016 Fu et al.

  17. An Improved Particle Swarm Optimization for Selective Single Machine Scheduling with Sequence Dependent Setup Costs and Downstream Demands

    Directory of Open Access Journals (Sweden)

    Kun Li

    2015-01-01

    Full Text Available This paper investigates a special single machine scheduling problem derived from practical industries, namely, the selective single machine scheduling with sequence dependent setup costs and downstream demands. Different from traditional single machine scheduling, this problem further takes into account the selection of jobs and the demands of downstream lines. This problem is formulated as a mixed integer linear programming model and an improved particle swarm optimization (PSO is proposed to solve it. To enhance the exploitation ability of the PSO, an adaptive neighborhood search with different search depth is developed based on the decision characteristics of the problem. To improve the search diversity and make the proposed PSO algorithm capable of getting out of local optimum, an elite solution pool is introduced into the PSO. Computational results based on extensive test instances show that the proposed PSO can obtain optimal solutions for small size problems and outperform the CPLEX and some other powerful algorithms for large size problems.

  18. Can baroreflex measurements with spontaneous sequence analysis be improved by also measuring breathing and by standardization of filtering strategies?

    International Nuclear Information System (INIS)

    Hollow, M R; Parkes, M J; Clutton-Brock, T H

    2011-01-01

    Baroreflex sensitivity (BRS) is known to be attenuated by inspiration and all the original BRS methodologies took this into account by measuring only in expiration. Spontaneous sequence analysis (SSA) is a non-invasive clinical tool widely used to estimate BRS in Man but does not take breathing into account. We have therefore modified it to test whether it too can detect inspiratory attenuation. Traditional SSA is also entangled with issues of distinguishing causal from random relationships between blood pressure and heart period and of the optimum choice of data filter settings. We have also tested whether the sequences our modified SSA rejects do behave as random relationships and show the limitations of the absence of filter standardization. SSA was performed on eupneic data from 1 h periods in 20 healthy subjects. Applying SSA traditionally produced a mean BRS of 23 ± 3 ms mmHg −1 . After modification to measure breathing, SSA detected significant inspiratory attenuation (11 ± 1 ms mmHg −1 ), and the mean expiratory BRS was significantly higher (26 ± 5 ms mmHg −1 ). Traditional SSA therefore underestimates BRS by an amount (3 ms mmHg −1 ) as big as the major physiological and clinical factors known to alter BRS. We show that the sequences rejected by SSA do behave like random associations between pressure and period. We also show the minimal effect of the r 2 filter and the biases that some pressure and heart period filters can introduce. We discuss whether SSA might be improved by standardization of filter settings and by also measuring breathing

  19. The Use of Next Generation Sequencing and Junction Sequence Analysis Bioinformatics to Achieve Molecular Characterization of Crops Improved Through Modern Biotechnology

    Directory of Open Access Journals (Sweden)

    David Kovalic

    2012-11-01

    Full Text Available The assessment of genetically modified (GM crops for regulatory approval currently requires a detailed molecular characterization of the DNA sequence and integrity of the transgene locus. In addition, molecular characterization is a critical component of event selection and advancement during product development. Typically, molecular characterization has relied on Southern blot analysis to establish locus and copy number along with targeted sequencing of polymerase chain reaction products spanning any inserted DNA to complete the characterization process. Here we describe the use of next generation (NexGen sequencing and junction sequence analysis bioinformatics in a new method for achieving full molecular characterization of a GM event without the need for Southern blot analysis. In this study, we examine a typical GM soybean [ (L. Merr.] line and demonstrate that this new method provides molecular characterization equivalent to the current Southern blot-based method. We also examine an event containing in vivo DNA rearrangement of multiple transfer DNA inserts to demonstrate that the new method is effective at identifying complex cases. Next generation sequencing and bioinformatics offers certain advantages over current approaches, most notably the simplicity, efficiency, and consistency of the method, and provides a viable alternative for efficiently and robustly achieving molecular characterization of GM crops.

  20. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Science.gov (United States)

    Ober, Ulrike; Huang, Wen; Magwire, Michael; Schlather, Martin; Simianer, Henner; Mackay, Trudy F C

    2015-01-01

    The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  1. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17% of the genetic variance among lines in females (males, the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  2. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.

    Science.gov (United States)

    Chang, Jia-Ming; Di Tommaso, Paolo; Notredame, Cedric

    2014-06-01

    Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  3. Next-generation sequencing approaches for improvement of lactic acid bacteria-fermented plant-based beverages

    Directory of Open Access Journals (Sweden)

    Jordyn Bergsveinson

    2017-01-01

    Full Text Available Plant-based beverages and milk alternatives produced from cereals and legumes have grown in popularity in recent years due to a range of consumer concerns over dairy products. These plant-based products can often have undesirable physiochemical properties related to flavour, texture, and nutrient availability and/or deficiencies. Lactic acid bacteria (LAB fermentation offers potential remediation for many of these issues, and allows consumers to retain their perception of the resultant products as natural and additive-free. Using next-generation sequencing (NGS or omics approaches to characterize LAB isolates to find those that will improve properties of plant-based beverages is the most direct way to product improvement. Although NGS/omics approaches have been extensively used for selection of LAB for use in the dairy industry, a comparable effort has not occurred for selecting LAB for fermenting plant raw substrates, save those used in producing wine and certain types of beer. Here we review the few and recent applications of NGS/omics to profile and improve LAB fermentation of various plant-based substrates for beverage production. We also identify specific issues in the production of various LAB fermented plant-based beverages that such NGS/omics applications have the power to resolve.

  4. Genetic diversity analysis of cyanogenic potential (CNp) of root among improved genotypes of cassava using simple sequence repeat markers.

    Science.gov (United States)

    Moyib, O K; Mkumbira, J; Odunola, O A; Dixon, A G

    2012-12-01

    Cyanogenic potential (CNp) of cassava constitutes a serious problem for over 500 million people who rely on the crop as their main source of calories. Genetic diversity is a key to successful crop improvement for breeding new improved variability for target traits. Forty-three improved genotypes of cassava developed by International Institute of Tropical Agriculture (ITA), Ibadan, were characterized for CNp trait using 35 Simple Sequence.Repeat (SSR) markers. Essential colorimetry picric test was used for evaluation of CNp on a color scale of 1 to 14. The CNp scores obtained ranged from 3 to 9, with a mean score of 5.48 (+/- 0.09) based on Statistical Analysis System (SAS) package. TMS M98/ 0068 (4.0 +/- 0.25) was identified as the best genotype with low CNp while TMS M98/0028 (7.75 +/- 0.25) was the worst. The 43 genotypes were assigned into 7 phenotypic groups based on rank-sum analysis in SAS. Dissimilarity analysis representatives for windows generated a phylogenetic tree with 5 clusters which represented hybridizing groups. Each of the clusters (except 4) contained low CNp genotypes that could be used for improving the high CNp genotypes in the same or near cluster. The scatter plot of the genotypes showed that there was little or no demarcation for phenotypic CNp groupings in the molecular groupings. The result of this study demonstrated that SSR markers are powerful tools for the assessment of genetic variability, and proper identification and selection of parents for genetic improvement of low CNp trait among the IITA cassava collection.

  5. HMM-ModE – Improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences

    Directory of Open Access Journals (Sweden)

    Nandi Soumyadeep

    2007-03-01

    Full Text Available Abstract Background Profile Hidden Markov Models (HMM are statistical representations of protein families derived from patterns of sequence conservation in multiple alignments and have been used in identifying remote homologues with considerable success. These conservation patterns arise from fold specific signals, shared across multiple families, and function specific signals unique to the families. The availability of sequences pre-classified according to their function permits the use of negative training sequences to improve the specificity of the HMM, both by optimizing the threshold cutoff and by modifying emission probabilities to minimize the influence of fold-specific signals. A protocol to generate family specific HMMs is described that first constructs a profile HMM from an alignment of the family's sequences and then uses this model to identify sequences belonging to other classes that score above the default threshold (false positives. Ten-fold cross validation is used to optimise the discrimination threshold score for the model. The advent of fast multiple alignment methods enables the use of the profile alignments to align the true and false positive sequences, and the resulting alignments are used to modify the emission probabilities in the original model. Results The protocol, called HMM-ModE, was validated on a set of sequences belonging to six sub-families of the AGC family of kinases. These sequences have an average sequence similarity of 63% among the group though each sub-group has a different substrate specificity. The optimisation of discrimination threshold, by using negative sequences scored against the model improves specificity in test cases from an average of 21% to 98%. Further discrimination by the HMM after modifying model probabilities using negative training sequences is provided in a few cases, the average specificity rising to 99%. Similar improvements were obtained with a sample of G-Protein coupled receptors

  6. Improved protection system for phase faults on marine vessels based on ratio between negative sequence and positive sequence of the fault current

    DEFF Research Database (Denmark)

    Ciontea, Catalin-Iosif; Hong, Qiteng; Booth, Campbell

    2018-01-01

    algorithm is implemented in a programmable digital relay embedded in a hardware-in-the-loop (HIL) test set-up that emulates a typical maritime feeder using a real-time digital simulator. The HIL set-up allows testing of the new protection method under a wide range of faults and network conditions......This study presents a new method to protect the radial feeders on marine vessels. The proposed protection method is effective against phase–phase (PP) faults and is based on evaluation of the ratio between the negative sequence and positive sequence of the fault currents. It is shown...... that the magnitude of the introduced ratio increases significantly during the PP fault, hence indicating the fault presence in an electric network. Here, the theoretical background of the new method of protection is firstly discussed, based on which the new protection algorithm is described afterwards. The proposed...

  7. Jump-and-return sandwiches: A new family of binomial-like selective inversion sequences with improved performance

    Science.gov (United States)

    Brenner, Tom; Chen, Johnny; Stait-Gardner, Tim; Zheng, Gang; Matsukawa, Shingo; Price, William S.

    2018-03-01

    A new family of binomial-like inversion sequences, named jump-and-return sandwiches (JRS), has been developed by inserting a binomial-like sequence into a standard jump-and-return sequence, discovered through use of a stochastic Genetic Algorithm optimisation. Compared to currently used binomial-like inversion sequences (e.g., 3-9-19 and W5), the new sequences afford wider inversion bands and narrower non-inversion bands with an equal number of pulses. As an example, two jump-and-return sandwich 10-pulse sequences achieved 95% inversion at offsets corresponding to 9.4% and 10.3% of the non-inversion band spacing, compared to 14.7% for the binomial-like W5 inversion sequence, i.e., they afforded non-inversion bands about two thirds the width of the W5 non-inversion band.

  8. Genetic diversity analysis in Malaysian giant prawns using expressed sequence tag microsatellite markers for stock improvement program.

    Science.gov (United States)

    Atin, K H; Christianus, A; Fatin, N; Lutas, A C; Shabanimofrad, M; Subha, B

    2017-08-17

    The Malaysian giant prawn is among the most commonly cultured species of the genus Macrobrachium. Stocks of giant prawns from four rivers in Peninsular Malaysia have been used for aquaculture over the past 25 years, which has led to repeated harvesting, restocking, and transplantation between rivers. Consequently, a stock improvement program is now important to avoid the depletion of wild stocks and the loss of genetic diversity. However, the success of such an improvement program depends on our knowledge of the genetic variation of these base populations. The aim of the current study was to estimate genetic variation and differentiation of these riverine sources using novel expressed sequence tag-microsatellite (EST-SSR) markers, which not only are informative on genetic diversity but also provide information on immune and metabolic traits. Our findings indicated that the tested stocks have inbreeding depression due to a significant deficiency in heterozygotes, and F IS was estimated as 0.15538 to 0.31938. An F-statistics analysis suggested that the stocks are composed of one large panmictic population. Among the four locations, stocks from Johor, in the southern region of the peninsular, showed higher allelic and genetic diversity than the other stocks. To overcome inbreeding problems, the Johor population could be used as a base population in a stock improvement program by crossing to the other populations. The study demonstrated that EST-SSR markers can be incorporated in future marker assisted breeding to aid the proper management of the stocks by breeders and stakeholders in Malaysia.

  9. A simple and accurate two-step long DNA sequences synthesis strategy to improve heterologous gene expression in pichia.

    Directory of Open Access Journals (Sweden)

    Jiang-Ke Yang

    Full Text Available In vitro gene chemical synthesis is a powerful tool to improve the expression of gene in heterologous system. In this study, a two-step gene synthesis strategy that combines an assembly PCR and an overlap extension PCR (AOE was developed. In this strategy, the chemically synthesized oligonucleotides were assembled into several 200-500 bp fragments with 20-25 bp overlap at each end by assembly PCR, and then an overlap extension PCR was conducted to assemble all these fragments into a full length DNA sequence. Using this method, we de novo designed and optimized the codon of Rhizopus oryzae lipase gene ROL (810 bp and Aspergillus niger phytase gene phyA (1404 bp. Compared with the original ROL gene and phyA gene, the codon-optimized genes expressed at a significantly higher level in yeasts after methanol induction. We believe this AOE method to be of special interest as it is simple, accurate and has no limitation with respect to the size of the gene to be synthesized. Combined with de novo design, this method allows the rapid synthesis of a gene optimized for expression in the system of choice and production of sufficient biological material for molecular characterization and biotechnological application.

  10. Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database

    NARCIS (Netherlands)

    Ritari, Jarmo; Salojärvi, Jarkko; Lahti, Leo; Vos, de Willem M.

    2015-01-01

    Background: Current sequencing technology enables taxonomic profiling of microbial ecosystems at high resolution and depth by using the 16S rRNA gene as a phylogenetic marker. Taxonomic assignation of newly acquired data is based on sequence comparisons with comprehensive reference databases to

  11. Sequencing the threat and recommendation components of persuasive messages differentially improves the effectiveness of high- and low-distressing imagery in an anti-alcohol message in students.

    Science.gov (United States)

    Brown, Stephen L; West, Charlotte

    2015-05-01

    Distressing imagery is often used to improve the persuasiveness of mass-reach health promotion messages, but its effectiveness may be limited because audiences avoid attending to content. Prior self-affirmation or self-efficacy inductions have been shown to reduce avoidance and improve audience responsiveness to distressing messages, but these are difficult to introduce into a mass-reach context. Reasoning that a behavioural recommendation may have a similar effect, we reversed the traditional threat-behavioural recommendation health promotion message sequence. 2 × 2 experimental design: Factor 1, high- and low-distress images; Factor 2, threat-recommendation and recommendation-threat sequences. Ninety-one students were exposed to an identical text message accompanied by high- or low-distress imagery presented in threat-recommendation and recommendation-threat sequences. For the high-distress message, greater persuasion was observed for the recommendation-threat than the threat-recommendation sequence. This was partially mediated by participants' greater self-exposure to the threat component of the message, which we attribute to the effect of sequence in reducing attentional avoidance. For the low-distress message, greater persuasion was observed for the threat-recommendation sequence, which was not mediated by reading time allocated to the threat. Tailoring message sequence to suit the degree of distress that message developers wish to induce provides a tool that could improve persuasive messages. These findings provide a first step in this process and discuss further steps needed to consolidate and expand these findings. Statement of contribution What is already known on this subject? Health promotion messages accompanied by distressing imagery might, under some circumstances, persuade individuals to engage in healthier behaviour. Audiences can respond defensively to distressing imagery, but may be less inclined to do so when an easily followed behavioural

  12. Improved Efficiency and Reliability of NGS Amplicon Sequencing Data Analysis for Genetic Diagnostic Procedures Using AGSA Software

    Directory of Open Access Journals (Sweden)

    Axel Poulet

    2016-01-01

    Full Text Available Screening for BRCA mutations in women with familial risk of breast or ovarian cancer is an ideal situation for high-throughput sequencing, providing large amounts of low cost data. However, 454, Roche, and Ion Torrent, Thermo Fisher, technologies produce homopolymer-associated indel errors, complicating their use in routine diagnostics. We developed software, named AGSA, which helps to detect false positive mutations in homopolymeric sequences. Seventy-two familial breast cancer cases were analysed in parallel by amplicon 454 pyrosequencing and Sanger dideoxy sequencing for genetic variations of the BRCA genes. All 565 variants detected by dideoxy sequencing were also detected by pyrosequencing. Furthermore, pyrosequencing detected 42 variants that were missed with Sanger technique. Six amplicons contained homopolymer tracts in the coding sequence that were systematically misread by the software supplied by Roche. Read data plotted as histograms by AGSA software aided the analysis considerably and allowed validation of the majority of homopolymers. As an optimisation, additional 250 patients were analysed using microfluidic amplification of regions of interest (Access Array Fluidigm of the BRCA genes, followed by 454 sequencing and AGSA analysis. AGSA complements a complete line of high-throughput diagnostic sequence analysis, reducing time and costs while increasing reliability, notably for homopolymer tracts.

  13. Identification of transcriptional biomarkers by RNA-sequencing for improved detection of β2-agonists abuse in goat skeletal muscle.

    Directory of Open Access Journals (Sweden)

    Luyao Zhao

    Full Text Available In this paper, high-throughput RNA-sequencing (RNA-seq was used to search for transcriptional biomarkers for β2-agonists. In combination with drug mechanisms, a smaller group of genes with higher detection accuracy was screened out. Unknown samples were first predicted by this group of genes, and liquid chromatograph tandem mass spectrometer (LC-MS/MS was applied to positive samples to validate the biomarkers. The results of principal component analysis (PCA, hierarchical cluster analysis (HCA and discriminant analysis (DA indicated that the eight genes screened by high-throughput RNA-seq were able to distinguish samples in the experimental group and control group. Compared with the nine genes selected from an earlier literature, 17 genes including these nine genes were proven to have a more satisfactory effect, which validated the accuracy of gene selection by RNA-seq. Then, six key genes were selected from the 17 genes according to the variable importance in projection (VIP value of greater than 1. The test results using the six genes and 17 genes were similar, revealing that the six genes were critical genes. By using the six genes, three positive samples possibly treated with drugs were screened out from 25 unknown samples through DA and partial least squares discriminant analysis (PLS-DA. Then, the three samples were verified by a standard method, and mapenterol was detected in a sample. Therefore, the six genes can be used as biomarkers to detect β2-agonists. Compared with the previous study, accurate detection of β2-agonists abuse using six key genes is an improvement method, which show great significance in the monitoring of β2-agonists abuse in animal husbandry.

  14. Centrically reordered inversion recovery half-Fourier single-shot turbo spin-echo sequence: improvement of the image quality of oxygen-enhanced MRI

    International Nuclear Information System (INIS)

    Ohno, Yoshiharu; Hatabu, Hiroto; Higashino, Takanori; Kawamitsu, Hideaki; Watanabe, Hirokazu; Takenaka, Daisuke; Cauteren, Marc van; Sugimura, Kazuro

    2004-01-01

    Purpose: The purpose of the study presented here was to determine the improvement in image quality of oxygen-enhanced magnetic resonance (MR) subtraction imaging obtained with a centrically reordered inversion recovery half-Fourier single-shot turbo spin-echo (c-IR-HASTE) sequence compared with that obtained with a conventional sequentially reordered inversion recovery single-shot HASTE (s-IR-HASTE) sequence for pulmonary imaging. Materials and methods: Oxygen-enhanced MR imaging using a 1.5 T whole body scanner was performed on 12 healthy, non-smoking volunteers. Oxygen-enhanced MR images were obtained with the coronal two-dimensional (2D) c-IR-HASTE sequence and 2D s-IR-HASTE sequence combined with respiratory triggering. For a 256x256 matrix, 132 phase-encoding steps were acquired including four steps for phase correction. Inter-echo spacing for each sequence was 4.0 ms. The effective echo time (TE) for c-IR-HASTE was 4.0 ms, and 16 ms for s-IR-HASTE. The inversion time (TI) was 900 ms. To determine the improvement in oxygen-enhanced MR subtraction imaging by c-IR-HASTE, CNRs of subtraction image, overall image quality, and image degradation of the c-IR-HASTE and s-IR-HASTE techniques were statistically compared. Results: CNR, overall image quality, and image degradation of c-IR-HASTE images showed significant improvement compared to those s-IR-HASTE images (P<0.05). Conclusion: Centrically reordered inversion recovery half-Fourier single-shot turbo spin-echo (c-IR-HASTE) sequence enhanced the signal from the lung and improved the image quality of oxygen-enhanced MR subtraction imaging

  15. Cloning and sequencing of an alkaline protease gene from Bacillus lentus and amplification of the gene on the B. lentus chromosome by an improved technique.

    Science.gov (United States)

    Jørgensen, P L; Tangney, M; Pedersen, P E; Hastrup, S; Diderichsen, B; Jørgensen, S T

    2000-02-01

    A gene encoding an alkaline protease was cloned from an alkalophilic bacillus, and its nucleotide sequence was determined. The cloned gene was used to increase the copy number of the protease gene on the chromosome by an improved gene amplification technique.

  16. Improved primer sequences for the mitochondrial ND1, ND3/4 and ND5/6 segments in salmonid fishes : application to RFLP analysis of Atlantic salmon

    DEFF Research Database (Denmark)

    Eg Nielsen, Einar; Hansen, Michael Møller; Mensberg, Karen-Lise Dons

    1998-01-01

    New specific primers for the mtDNA segments ND1, ND3/4 and ND5/6 designed from the rainbow trout sequence, improved PCR amplification for salmonid fishes. RFLP analysis revealed restriction site variation for all three segments in Atlantic salmon. Eleven haplotypes were detected in a screening...

  17. Depletion of Human DNA in Spiked Clinical Specimens for Improvement of Sensitivity of Pathogen Detection by Next-Generation Sequencing

    OpenAIRE

    Hasan, Mohammad R.; Rawat, Arun; Tang, Patrick; Jithesh, Puthen V.; Thomas, Eva; Tan, Rusung; Tilley, Peter

    2016-01-01

    Next-generation sequencing (NGS) technology has shown promise for the detection of human pathogens from clinical samples. However, one of the major obstacles to the use of NGS in diagnostic microbiology is the low ratio of pathogen DNA to human DNA in most clinical specimens. In this study, we aimed to develop a specimen-processing protocol to remove human DNA and enrich specimens for bacterial and viral DNA for shotgun metagenomic sequencing. Cerebrospinal fluid (CSF) and nasopharyngeal aspi...

  18. MR imaging of the knee: Improvement of signal and contrast efficiency of T1-weighted turbo spin echo sequences by applying a driven equilibrium (DRIVE) pulse

    Energy Technology Data Exchange (ETDEWEB)

    Radlbauer, Rudolf, E-mail: rudolf.radlbauer@stpoelten.lknoe.a [MR Physics Group, Department of Radiology, Landesklinikum St. Poelten, Propst Fuehrer Strasse 4, 3100 St. Poelten (Austria); Lomoschitz, Friedrich, E-mail: friedrich.lomoschitz@stpoelten.lknoe.a [MR Physics Group, Department of Radiology, Landesklinikum St. Poelten, Propst Fuehrer Strasse 4, 3100 St. Poelten (Austria); Salomonowitz, Erich, E-mail: erich.salomonowitz@stpoelten.lknoe.a [MR Physics Group, Department of Radiology, Landesklinikum St. Poelten, Propst Fuehrer Strasse 4, 3100 St. Poelten (Austria); Eberhardt, Knut E., E-mail: info@mrt-kompetenzzentrum.d [MRT Competence Center Schloss Werneck, Balthasar-Neumann-Platz 2, 97440 Werneck (Germany); Stadlbauer, Andreas, E-mail: andi@nmr.a [MR Physics Group, Department of Radiology, Landesklinikum St. Poelten, Propst Fuehrer Strasse 4, 3100 St. Poelten (Austria); Department of Neurosurgery, University of Erlangen-Nuremberg, Schwabachanlage 6, 91054 Erlangen (Germany)

    2010-08-15

    The purpose of this study was to assess the effect of a driven equilibrium (DRIVE) pulse incorporated in a standard T1-weighted turbo spin echo (TSE) sequence as used in our routine MRI protocol for examination of pathologies of the knee. Sixteen consecutive patients with knee disorders were examined using the routine MRI protocol, including T1-weighted TSE-sequences with and without a DRIVE pulse. Signal-to-noise ratios (SNRs) and contrast-to-noise ratio (CNR) of anatomical structures and pathologies were calculated and compared for both sequences. The differences in diagnostic value of the T1-weighted images with and without DRIVE pulse were assessed. SNR was significantly higher on images acquired with DRIVE pulse for fluid, effusion, cartilage and bone. Differences in the SNR of meniscus and muscle between the two sequences were not statistically significant. CNR was significantly increased between muscle and effusion, fluid and cartilage, fluid and meniscus, cartilage and meniscus, bone and cartilage on images acquired using the DRIVE pulse. Diagnostic value of the T1-weighted images was found to be improved for delineation of anatomic structures and for diagnosing a variety of pathologies when a DRIVE pulse is incorporated in the sequence. Incorporation of a DRIVE pulse into a standard T1-weighted TSE-sequence leads to significant increase of SNR and CNR of both, anatomical structures and pathologies, and consequently to an increase in diagnostic value within the same acquisition time.

  19. MR imaging of the knee: Improvement of signal and contrast efficiency of T1-weighted turbo spin echo sequences by applying a driven equilibrium (DRIVE) pulse

    International Nuclear Information System (INIS)

    Radlbauer, Rudolf; Lomoschitz, Friedrich; Salomonowitz, Erich; Eberhardt, Knut E.; Stadlbauer, Andreas

    2010-01-01

    The purpose of this study was to assess the effect of a driven equilibrium (DRIVE) pulse incorporated in a standard T1-weighted turbo spin echo (TSE) sequence as used in our routine MRI protocol for examination of pathologies of the knee. Sixteen consecutive patients with knee disorders were examined using the routine MRI protocol, including T1-weighted TSE-sequences with and without a DRIVE pulse. Signal-to-noise ratios (SNRs) and contrast-to-noise ratio (CNR) of anatomical structures and pathologies were calculated and compared for both sequences. The differences in diagnostic value of the T1-weighted images with and without DRIVE pulse were assessed. SNR was significantly higher on images acquired with DRIVE pulse for fluid, effusion, cartilage and bone. Differences in the SNR of meniscus and muscle between the two sequences were not statistically significant. CNR was significantly increased between muscle and effusion, fluid and cartilage, fluid and meniscus, cartilage and meniscus, bone and cartilage on images acquired using the DRIVE pulse. Diagnostic value of the T1-weighted images was found to be improved for delineation of anatomic structures and for diagnosing a variety of pathologies when a DRIVE pulse is incorporated in the sequence. Incorporation of a DRIVE pulse into a standard T1-weighted TSE-sequence leads to significant increase of SNR and CNR of both, anatomical structures and pathologies, and consequently to an increase in diagnostic value within the same acquisition time.

  20. Quantification of sequence exchange events between PMS2 and PMS2CL provides a basis for improved mutation scanning of Lynch syndrome patients.

    Science.gov (United States)

    van der Klift, Heleen M; Tops, Carli M J; Bik, Elsa C; Boogaard, Merel W; Borgstein, Anne-Marijke; Hansson, Kerstin B M; Ausems, Margreet G E M; Gomez Garcia, Encarna; Green, Andrew; Hes, Frederik J; Izatt, Louise; van Hest, Liselotte P; Alonso, Angel M; Vriends, Annette H J T; Wagner, Anja; van Zelst-Stams, Wendy A G; Vasen, Hans F A; Morreau, Hans; Devilee, Peter; Wijnen, Juul T

    2010-05-01

    Heterozygous mutations in PMS2 are involved in Lynch syndrome, whereas biallelic mutations are found in Constitutional mismatch repair-deficiency syndrome patients. Mutation detection is complicated by the occurrence of sequence exchange events between the duplicated regions of PMS2 and PMS2CL. We investigated the frequency of such events with a nonspecific polymerase chain reaction (PCR) strategy, co-amplifying both PMS2 and PMS2CL sequences. This allowed us to score ratios between gene and pseudogene-specific nucleotides at 29 PSV sites from exon 11 to the end of the gene. We found sequence transfer at all investigated PSVs from intron 12 to the 3' end of the gene in 4 to 52% of DNA samples. Overall, sequence exchange between PMS2 and PMS2CL was observed in 69% (83/120) of individuals. We demonstrate that mutation scanning with PMS2-specific PCR primers and MLPA probes, designed on PSVs, in the 3' duplicated region is unreliable, and present an RNA-based mutation detection strategy to improve reliability. Using this strategy, we found 19 different putative pathogenic PMS2 mutations. Four of these (21%) are lying in the region with frequent sequence transfer and are missed or called incorrectly as homozygous with several PSV-based mutation detection methods. (c) 2010 Wiley-Liss, Inc.

  1. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes

    DEFF Research Database (Denmark)

    Jespersen, Martin Closter; Peters, Bjoern; Nielsen, Morten

    2017-01-01

    Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex t...

  2. Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies.

    NARCIS (Netherlands)

    G.P. Patrinos (George); B. Giardine (Belinda); C. Riemer (Cathy); W. Miller (Webb); D.H. Chui (David); N.P. Anagnou (Nicholas); H. Wajcman (Henri); R.C. Hardison (Ross)

    2004-01-01

    textabstractHbVar (http://globin.cse.psu.edu/globin/hbvar/) is a relational database developed by a multi-center academic effort to provide up-to-date and high quality information on the genomic sequence changes leading to hemoglobin variants and all types of thalassemia and

  3. Construction of high resolution genetic linkage maps to improve the soybean genome sequence assembly Glyma1.01

    Science.gov (United States)

    A landmark in soybean research, Glyma1.01, the first whole genome sequence of variety Williams 82 (Glycine max L. Merr.) was completed in 2010 and is widely used. However, because the assembly was primarily built based on the linkage maps constructed with a limited number of markers and recombinant...

  4. A Proteomic Workflow Using High-Throughput De Novo Sequencing Towards Complementation of Genome Information for Improved Comparative Crop Science.

    Science.gov (United States)

    Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie

    2016-01-01

    The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations, inherent to cultivars, hamper protein identification and thus considerably complicate the qualitative and quantitative comparison in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses, by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for confidential mutations. Subsequently, these polymorphisms complement the initially used database, which is ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved an increased number (17 %) of peptide spectrum matches.

  5. Phytoplasma phylogenetics based on analysis of secA and 23S rRNA gene sequences for improved resolution of candidate species of 'Candidatus Phytoplasma'.

    Science.gov (United States)

    Hodgetts, Jennifer; Boonham, Neil; Mumford, Rick; Harrison, Nigel; Dickinson, Matthew

    2008-08-01

    Phytoplasma phylogenetics has focused primarily on sequences of the non-coding 16S rRNA gene and the 16S-23S rRNA intergenic spacer region (16-23S ISR), and primers that enable amplification of these regions from all phytoplasmas by PCR are well established. In this study, primers based on the secA gene have been developed into a semi-nested PCR assay that results in a sequence of the expected size (about 480 bp) from all 34 phytoplasmas examined, including strains representative of 12 16Sr groups. Phylogenetic analysis of secA gene sequences showed similar clustering of phytoplasmas when compared with clusters resolved by similar sequence analyses of a 16-23S ISR-23S rRNA gene contig or of the 16S rRNA gene alone. The main differences between trees were in the branch lengths, which were elongated in the 16-23S ISR-23S rRNA gene tree when compared with the 16S rRNA gene tree and elongated still further in the secA gene tree, despite this being a shorter sequence. The improved resolution in the secA gene-derived phylogenetic tree resulted in the 16SrII group splitting into two distinct clusters, while phytoplasmas associated with coconut lethal yellowing-type diseases split into three distinct groups, thereby supporting past proposals that they represent different candidate species within 'Candidatus Phytoplasma'. The ability to differentiate 16Sr groups and subgroups by virtual RFLP analysis of secA gene sequences suggests that this gene may provide an informative alternative molecular marker for pathogen identification and diagnosis of phytoplasma diseases.

  6. Improved imaging of cochlear nerve hypoplasia using a 3-Tesla variable flip-angle turbo spin-echo sequence and a 7-cm surface coil.

    Science.gov (United States)

    Giesemann, Anja M; Raab, Peter; Lyutenski, Stefan; Dettmer, Sabine; Bültmann, Eva; Frömke, Cornelia; Lenarz, Thomas; Lanfermann, Heinrich; Goetz, Friedrich

    2014-03-01

    Magnetic resonance imaging of the temporal bone has an important role in decision making with regard to cochlea implantation, especially in children with cochlear nerve deficiency. The purpose of this study was to evaluate the usefulness of the combination of an advanced high-resolution T2-weighted sequence with a surface coil in a 3-Tesla magnetic resonance imaging scanner in cases of suspected cochlear nerve aplasia. Prospective study. Seven patients with cochlear nerve hypoplasia or aplasia were prospectively examined using a high-resolution three-dimensional variable flip-angle turbo spin-echo sequence using a surface coil, and the images were compared with the same sequence in standard resolution using a standard head coil. Three neuroradiologists evaluated the magnetic resonance images independently, rating the visibility of the nerves in diagnosing hypoplasia or aplasia. Eight ears in seven patients with hypoplasia or aplasia of the cochlear nerve were examined. The average age was 2.7 years (range, 9 months-5 years). Seven ears had accompanying malformations. The inter-rater reliability in diagnosing hypoplasia or aplasia was greater using the high-resolution three-dimensional variable flip-angle turbo spin-echo sequence (fixed-marginal kappa: 0.64) than with the same sequence in lower resolution (fixed-marginal kappa: 0.06). Examining cases of suspected cochlear nerve aplasia using the high-resolution three-dimensional variable flip-angle turbo spin-echo sequence in combination with a surface coil shows significant improvement over standard methods. © 2013 The American Laryngological, Rhinological and Otological Society, Inc.

  7. Ethanol prefermentation of food waste in sequencing batch methane fermentation for improved buffering capacity and microbial community analysis.

    Science.gov (United States)

    Yu, Miao; Wu, Chuanfu; Wang, Qunhui; Sun, Xiaohong; Ren, Yuanyuan; Li, Yu-You

    2018-01-01

    This study investigates the effects of ethanol prefermentation (EP) on methane fermentation. Yeast was added to the substrate for EP in the sequencing batch methane fermentation of food waste. An Illumina MiSeq high-throughput sequencing system was used to analyze changes in the microbial community. Methane production in the EP group (254mL/g VS) was higher than in the control group (35mL/g VS) because EP not only increased the buffering capacity of the system, but also increased hydrolytic acidification. More carbon source was converted to ethanol in the EP group than in the control group, and neutral ethanol could be converted continuously to acetic acid, which promoted the growth of Methanobacterium and Methanosarcina. As a result, the relative abundance of methane-producing bacteria was significantly higher than that of the control group. Kinetic modeling indicated that the EP group had a higher hydrolysis efficiency and shorter lag phase. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences.

    Science.gov (United States)

    Mi, Tian; Merlin, Jerlin Camilus; Deverasetty, Sandeep; Gryk, Michael R; Bill, Travis J; Brooks, Andrew W; Lee, Logan Y; Rathnayake, Viraj; Ross, Christian A; Sargeant, David P; Strong, Christy L; Watts, Paula; Rajasekaran, Sanguthevar; Schiller, Martin R

    2012-01-01

    Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.

  9. An improved method for quantitatively measuring the sequences of total organic carbon and black carbon in marine sediment cores

    Science.gov (United States)

    Xu, Xiaoming; Zhu, Qing; Zhou, Qianzhi; Liu, Jinzhong; Yuan, Jianping; Wang, Jianghai

    2018-01-01

    Understanding global carbon cycle is critical to uncover the mechanisms of global warming and remediate its adverse effects on human activities. Organic carbon in marine sediments is an indispensable part of the global carbon reservoir in global carbon cycling. Evaluating such a reservoir calls for quantitative studies of marine carbon burial, which closely depend on quantifying total organic carbon and black carbon in marine sediment cores and subsequently on obtaining their high-resolution temporal sequences. However, the conventional methods for detecting the contents of total organic carbon or black carbon cannot resolve the following specific difficulties, i.e., (1) a very limited amount of each subsample versus the diverse analytical items, (2) a low and fluctuating recovery rate of total organic carbon or black carbon versus the reproducibility of carbon data, and (3) a large number of subsamples versus the rapid batch measurements. In this work, (i) adopting the customized disposable ceramic crucibles with the microporecontrolled ability, (ii) developing self-made or customized facilities for the procedures of acidification and chemothermal oxidization, and (iii) optimizing procedures and carbon-sulfur analyzer, we have built a novel Wang-Xu-Yuan method (the WXY method) for measuring the contents of total organic carbon or black carbon in marine sediment cores, which includes the procedures of pretreatment, weighing, acidification, chemothermal oxidation and quantification; and can fully meet the requirements of establishing their highresolution temporal sequences, whatever in the recovery, experimental efficiency, accuracy and reliability of the measurements, and homogeneity of samples. In particular, the usage of disposable ceramic crucibles leads to evidently simplify the experimental scenario, which further results in the very high recovery rates for total organic carbon and black carbon. This new technique may provide a significant support for

  10. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    Directory of Open Access Journals (Sweden)

    Arun Prabhu Dhanapal

    2015-01-01

    Full Text Available The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, keeps expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant databases utilization in advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement program is still being achieved widely; otherwise, the end for sequencing is not far away.

  11. The presence of glutamate residues on the PAS sequence of the stimuli-sensitive nano-ferritin improves in vivo biodistribution and mitoxantrone encapsulation homogeneity.

    Science.gov (United States)

    Falvo, Elisabetta; Malagrinò, Francesca; Arcovito, Alessandro; Fazi, Francesco; Colotti, Gianni; Tremante, Elisa; Di Micco, Patrizio; Braca, Aldo; Opri, Roberta; Giuffrè, Alessandro; Fracasso, Giulio; Ceci, Pierpaolo

    2018-04-10

    A genetically engineered human ferritin heavy chain (HFt)-based construct has been recently shown by our group to efficiently entrap and deliver doxorubicin to cancer cells. This construct, named HFt-MP-PAS, contained a tumor-selective sequence (MP) responsive to proteolytic cleavage by tumor proteases (MMPs), located between each HFt subunit and an outer shielding polypeptide sequence rich in proline (P), serine (S) and alanine (A) residues (PAS). HFt-MP-PAS displayed excellent therapeutic efficacy in xenogenic pancreatic and head and neck cancer models in vivo, leading to a significant increase in overall animal survivals. Here we report a new construct obtained by the genetic insertion of two glutamate residues in the PAS sequence of HFt-MP-PAS. Such new construct, named HFt-MP-PASE, is characterized by improved performances as drug biodistribution in a xenogenic pancreatic cancer model in vivo. Moreover, HFt-MP-PASE efficiently encapsulates the anti-cancer drug mitoxantrone (MIT), and the resulting MIT-loaded nanoparticles proved to be more soluble and monodispersed than the HFt-MP-PAS counterparts. Importantly, in vitro MIT-loaded HFt-MP-PASE kills several cancer cell lines of different origin (colon, breast, sarcoma and pancreas) at least as efficiently as the free drug. Finally, our MIT loaded protein nanocages allowed in vivo an impressive incrementing of the drug accumulation in the tumor with respect to the free drug. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Whole exome sequencing of wild-derived inbred strains of mice improves power to link phenotype and genotype.

    Science.gov (United States)

    Chang, Peter L; Kopania, Emily; Keeble, Sara; Sarver, Brice A J; Larson, Erica; Orth, Annie; Belkhir, Khalid; Boursot, Pierre; Bonhomme, François; Good, Jeffrey M; Dean, Matthew D

    2017-10-01

    The house mouse is a powerful model to dissect the genetic basis of phenotypic variation, and serves as a model to study human diseases. Despite a wealth of discoveries, most classical laboratory strains have captured only a small fraction of genetic variation known to segregate in their wild progenitors, and existing strains are often related to each other in complex ways. Inbred strains of mice independently derived from natural populations have the potential to increase power in genetic studies with the addition of novel genetic variation. Here, we perform exome-enrichment and high-throughput sequencing (~8× coverage) of 26 wild-derived strains known in the mouse research community as the "Montpellier strains." We identified 1.46 million SNPs in our dataset, approximately 19% of which have not been detected from other inbred strains. This novel genetic variation is expected to contribute to phenotypic variation, as they include 18,496 nonsynonymous variants and 262 early stop codons. Simulations demonstrate that the higher density of genetic variation in the Montpellier strains provides increased power for quantitative genetic studies. Inasmuch as the power to connect genotype to phenotype depends on genetic variation, it is important to incorporate these additional genetic strains into future research programs.

  13. A randomized clinical trial of a new orthodontic appliance to improve upper airway obstruction in infants with Pierre Robin sequence.

    Science.gov (United States)

    Buchenau, Wolfgang; Urschitz, Michael S; Sautermeister, Judit; Bacher, Margit; Herberts, Tina; Arand, Joerg; Poets, Christian F

    2007-08-01

    To test the hypothesis that a new orthodontic appliance with a velar extension that shifts the tongue anteriorly would reduce upper airway obstruction in infants with Pierre Robin sequence (PRS). Eleven infants with PRS (median age, 3 days) and an apnea index (AI) >3 were studied. The effect of the new appliance on the AI was compared with that of a conventional appliance without a velar extension by using a crossover study design with random allocation. Compared with baseline (mean AI, 13.8), there was a significant decrease in the AI with the new appliance (3.9; P value appliance (14.8; P = .842). Thus, the relative change in AI was -71% (95% CI, -84--49) for the new appliance and +8% (95% CI, -52-142) for the conventional appliance, which was significantly different (P = .004). No severe adverse effects were observed. This new orthodontic appliance appears to be safe and effective in reducing upper airway obstruction in infants with PRS.

  14. Improvement of the Gravitational Search Algorithm by means of Low-Discrepancy Sobol Quasi Random-Number Sequence Based Initialization

    Directory of Open Access Journals (Sweden)

    ALTINOZ, O. T.

    2014-08-01

    Full Text Available Nature-inspired optimization algorithms can obtain the optima by updating the position of each member in the population. At the beginning of the algorithm, the particles of the population are spread into the search space. The initial distribution of particles corresponds to the beginning points of the search process. Hence, the aim is to alter the position for each particle beginning with this initial position until the optimum solution will be found with respect to the pre-determined conditions like maximum iteration, and specific error value for the fitness function. Therefore, initial positions of the population have a direct effect on both accuracy of the optima and the computational cost. If any member in the population is close enough to the optima, this eases the achievement of the exact solution. On the contrary, individuals grouped far away from the optima might yield pointless efforts. In this study, low-discrepancy quasi-random number sequence is preferred for the localization of the population at the initialization phase. By this way, the population is distributed into the search space in a more uniform manner at the initialization phase. The technique is applied to the Gravitational Search Algorithm and compared via the performance on benchmark function solutions.

  15. Improving the secretion of a methyl parathion hydrolase in Pichia pastoris by modifying its N-terminal sequence.

    Directory of Open Access Journals (Sweden)

    Ping Wang

    Full Text Available Pichia pastoris is commonly used to express and secrete target proteins, although not all recombinant proteins can be successfully produced. In this study, we used methyl parathion hydrolase (MPH from Ochrobactrum sp. M231 as a model to study the importance of the N-terminus of the protein for its secretion. While MPH can be efficiently expressed intracellularly in P. pastoris, it is not secreted into the extracellular environment. Three MPH mutants (N66-MPH, D10-MPH, and N9-MPH were constructed through modification of its N-terminus, and the secretion of each by P. pastoris was improved when compared to wild-type MPH. The level of secreted D10-MPH was increased to 0.21 U/mL, while that of N9-MPH was enhanced to 0.16 U/mL. Although N66-MPH was not enzymatically active, it was secreted efficiently, and was identified by SDS-PAGE. These results demonstrate that the secretion of heterologous proteins in P. pastoris may be improved by modifying their N-terminal structures.

  16. An Improved Image Encryption Algorithm Based on Cyclic Rotations and Multiple Chaotic Sequences: Application to Satellite Images

    Directory of Open Access Journals (Sweden)

    MADANI Mohammed

    2017-10-01

    Full Text Available In this paper, a new satellite image encryption algorithm based on the combination of multiple chaotic systems and a random cyclic rotation technique is proposed. Our contribution consists in implementing three different chaotic maps (logistic, sine, and standard combined to improve the security of satellite images. Besides enhancing the encryption, the proposed algorithm also focuses on advanced efficiency of the ciphered images. Compared with classical encryption schemes based on multiple chaotic maps and the Rubik's cube rotation, our approach has not only the same merits of chaos systems like high sensitivity to initial values, unpredictability, and pseudo-randomness, but also other advantages like a higher number of permutations, better performances in Peak Signal to Noise Ratio (PSNR and a Maximum Deviation (MD.

  17. Identifying Active Faults by Improving Earthquake Locations with InSAR Data and Bayesian Estimation: The 2004 Tabuk (Saudi Arabia) Earthquake Sequence

    KAUST Repository

    Xu, Wenbin

    2015-02-03

    A sequence of shallow earthquakes of magnitudes ≤5.1 took place in 2004 on the eastern flank of the Red Sea rift, near the city of Tabuk in northwestern Saudi Arabia. The earthquakes could not be well located due to the sparse distribution of seismic stations in the region, making it difficult to associate the activity with one of the many mapped faults in the area and thus to improve the assessment of seismic hazard in the region. We used Interferometric Synthetic Aperture Radar (InSAR) data from the European Space Agency’s Envisat and ERS‐2 satellites to improve the location and source parameters of the largest event of the sequence (Mw 5.1), which occurred on 22 June 2004. The mainshock caused a small but distinct ∼2.7  cm displacement signal in the InSAR data, which reveals where the earthquake took place and shows that seismic reports mislocated it by 3–16 km. With Bayesian estimation, we modeled the InSAR data using a finite‐fault model in a homogeneous elastic half‐space and found the mainshock activated a normal fault, roughly 70 km southeast of the city of Tabuk. The southwest‐dipping fault has a strike that is roughly parallel to the Red Sea rift, and we estimate the centroid depth of the earthquake to be ∼3.2  km. Projection of the fault model uncertainties to the surface indicates that one of the west‐dipping normal faults located in the area and oriented parallel to the Red Sea is a likely source for the mainshock. The results demonstrate how InSAR can be used to improve locations of moderate‐size earthquakes and thus to identify currently active faults.

  18. Identifying Active Faults by Improving Earthquake Locations with InSAR Data and Bayesian Estimation: The 2004 Tabuk (Saudi Arabia) Earthquake Sequence

    KAUST Repository

    Xu, Wenbin; Dutta, Rishabh; Jonsson, Sigurjon

    2015-01-01

    A sequence of shallow earthquakes of magnitudes ≤5.1 took place in 2004 on the eastern flank of the Red Sea rift, near the city of Tabuk in northwestern Saudi Arabia. The earthquakes could not be well located due to the sparse distribution of seismic stations in the region, making it difficult to associate the activity with one of the many mapped faults in the area and thus to improve the assessment of seismic hazard in the region. We used Interferometric Synthetic Aperture Radar (InSAR) data from the European Space Agency’s Envisat and ERS‐2 satellites to improve the location and source parameters of the largest event of the sequence (Mw 5.1), which occurred on 22 June 2004. The mainshock caused a small but distinct ∼2.7  cm displacement signal in the InSAR data, which reveals where the earthquake took place and shows that seismic reports mislocated it by 3–16 km. With Bayesian estimation, we modeled the InSAR data using a finite‐fault model in a homogeneous elastic half‐space and found the mainshock activated a normal fault, roughly 70 km southeast of the city of Tabuk. The southwest‐dipping fault has a strike that is roughly parallel to the Red Sea rift, and we estimate the centroid depth of the earthquake to be ∼3.2  km. Projection of the fault model uncertainties to the surface indicates that one of the west‐dipping normal faults located in the area and oriented parallel to the Red Sea is a likely source for the mainshock. The results demonstrate how InSAR can be used to improve locations of moderate‐size earthquakes and thus to identify currently active faults.

  19. Identification and characterization of lipid metabolism-related microRNAs in the liver of genetically improved farmed tilapia (GIFT, Oreochromis niloticus) by deep sequencing.

    Science.gov (United States)

    Tao, Yi-Fan; Qiang, Jun; Yin, Guo-Jun; Xu, Pao; Shi, Qiong; Bao, Jing-Wen

    2017-10-01

    MicroRNAs (miRNAs) play vital roles in modulating diverse metabolic processes in the liver, including lipid metabolism. Genetically improved farmed tilapia (GIFT, Oreochromis niloticus), an important aquaculture species in China, is susceptible to hepatic steatosis when reared in intensive culture systems. To investigate the miRNAs involved in GIFT lipid metabolism, two hepatic small RNA libraries from high-fat diet-fed and normal-fat diet-fed GIFT were constructed and sequenced using high-throughput sequencing technology. A total of 204 known and 56 novel miRNAs were identified by aligning the sequencing data with known Danio rerio miRNAs listed in miRBase 21.0. Six known miRNAs (miR-30a-5p, miR-34a, miR-145-5p, miR-29a, miR-205-5p, and miR-23a-3p) that were differentially expressed between the high-fat diet and normal-fat diet groups were validated by quantitative real-time PCR. Bioinformatics tools were used to predict the potential target genes of these differentially expressed miRNAs, and Gene Ontology enrichment analysis indicated that these miRNAs may play important roles in diet-induced hepatic steatosis in GIFT. Our results provide a foundation for further studies of the role of miRNAs in tilapia lipid homeostasis regulation, and may help to identify novel targets for therapeutic interventions to reduce the occurrence of fatty liver disease in farmed tilapia. Copyright © 2017. Published by Elsevier Ltd.

  20. Expression of lycopene biosynthesis genes fused in line with Shine-Dalgarno sequences improves the stress-tolerance of Lactococcus lactis.

    Science.gov (United States)

    Dong, Xiangrong; Wang, Yanping; Yang, Fengyuan; Zhao, Shanshan; Tian, Bing; Li, Tao

    2017-01-01

    Lycopene biosynthetic genes from Deinococcus radiodurans were co-expressed in Lactococcus lactis to produce lycopene and improve its tolerance to stress. Lycopene-related genes from D. radiodurans, DR1395 (crtE), DR0862 (crtB), and DR0861 (crtI), were fused in line with S hine-Dalgarno (SD) sequences and co-expressed in L. lactis. The recombinant strain produced 0.36 mg lycopene g -1  dry cell wt after 48 h fermentation. The survival rate to UV irradiation of the recombinant strain was higher than that of the non-transformed strain. The L. lactis with co-expressed genes responsible for lycopene biosynthesis from D. radiodurans produced lycopene and exhibited increased resistance to UV stress, suggesting that the recombinant strain has important application potential in food industry.

  1. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  2. Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and "deep" sequencing to plasma RNA and proviral DNA.

    Science.gov (United States)

    Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard

    2010-08-01

    Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.

  3. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remain unresolved. Computational assembly is crucial in large genome projects as well for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide a detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  4. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  5. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng

    2016-06-15

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.

  6. Alternating anoxic feast/aerobic famine condition for improving granular sludge formation in sequencing batch airlift reactor at reduced aeration rate.

    Science.gov (United States)

    Wan, Junfeng; Bessière, Yolaine; Spérandio, Mathieu

    2009-12-01

    In this study the influence of a pre-anoxic feast period on granular sludge formation in a sequencing batch airlift reactor is evaluated. Whereas a purely aerobic SBR was operated as a reference (reactor R2), another reactor (R1) was run with a reduced aeration rate and an alternating anoxic-aerobic cycle reinforced by nitrate feeding. The presence of pre-anoxic phase clearly improved the densification of aggregates and allowed granular sludge formation at reduced air flow rate (superficial air velocity (SAV)=0.63cms(-1)). A low sludge volume index (SVI(30)=45mLg(-1)) and a high MLSS concentration (9-10gL(-1)) were obtained in the anoxic/aerobic system compared to more conventional results for the aerobic reactor. A granular sludge was observed in the anoxic/aerobic system whilst only flocs were observed in the aerobic reference even when operated at a high aeration rate (SAV=2.83cms(-1)). Nitrification was maintained efficiently in the anoxic/aerobic system even when organic loading rate (OLR) was increased up to 2.8kgCODm(-3)d(-1). In the contrary nitrification was unstable in the aerobic system and dropped at high OLR due to competition between autotrophic and heterotrophic growth. The presence of a pre-anoxic period positively affected granulation process via different mechanisms: enhancing heterotrophic growth/storage deeper in the internal anoxic layer of granule, reducing the competition between autotrophic and heterotrophic growth. These processes help to develop dense granular sludge at a moderate aeration rate. This tends to confirm that oxygen transfer is the most limiting factor for granulation at reduced aeration. Hence the use of an alternative electron acceptor (nitrate or nitrite) should be encouraged during feast period for reducing energy demand of the granular sludge process.

  7. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    KAUST Repository

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-01-01

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment

  8. Improving accuracy of Tay Sachs carrier screening of the non-Jewish population: analysis of 34 carriers and six late-onset patients with HEXA enzyme and DNA sequence analysis.

    Science.gov (United States)

    Park, Noh Jin; Morgan, Craig; Sharma, Rajesh; Li, Yuanyin; Lobo, Raynah M; Redman, Joy B; Salazar, Denise; Sun, Weimin; Neidich, Julie A; Strom, Charles M

    2010-02-01

    The purpose of this study was to determine whether combining different testing modalities namely beta-hexosaminidase A (HEXA) enzyme analysis, HEXA DNA common mutation assay, and HEXA gene sequencing could improve the sensitivity for carrier detection in non-Ashkenazi (AJ) individuals. We performed a HEXA gene sequencing assay, a HEXA DNA common mutation assay, and a HEXA enzyme assay on 34 self-reported Tay-Sachs disease (TSD) carriers, six late-onset patients with TSD, and one pseudodeficiency allele carrier. Sensitivity of TSD carrier detection was 91% for gene sequencing compared with 91% for the enzyme assay and 52% for the DNA mutation assay. Gene sequencing combined with enzyme testing had the highest sensitivity (100%) for carrier detection. Gene sequencing detected four novel mutations, three of which are predicted to be disease causing [118.delT, 965A-->T (D322V), and 775A-->G (T259A)]. Gene sequencing is useful in identifying rare mutations in patients with TSD and their families, in evaluating spouses of known carriers for TSD who have indeterminate enzyme analysis and negative for common mutation analysis, and in resolving ambiguous enzyme testing results.

  9. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/ content/sequence/NEWALIGN/align.html As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  10. Quantification of sequence exchange events between PMS2 and PMS2CL provides a basis for improved mutation scanning of Lynch syndrome patients.

    NARCIS (Netherlands)

    Klift, H.M. van der; Tops, C.M.; Bik, E.C.; Boogaard, M.W.; Borgstein, A.M.; Hansson, K.B.; Ausems, M.G.E.M.; Gomez Garcia, E.; Green, A.; Hes, F.J.; Izatt, L.; Hest, L.P. van; Alonso, A.M.; Vriends, A.H.; Wagner, A.; Zelst-Stams, W.A.G. van; Vasen, H.F.; Morreau, H.; Devilee, P.; Wijnen, J.T.

    2010-01-01

    Heterozygous mutations in PMS2 are involved in Lynch syndrome, whereas biallelic mutations are found in Constitutional mismatch repair-deficiency syndrome patients. Mutation detection is complicated by the occurrence of sequence exchange events between the duplicated regions of PMS2 and PMS2CL. We

  11. Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-Seq and ESTs.

    Directory of Open Access Journals (Sweden)

    Nicholas J Schurch

    Full Text Available The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1 gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb; (2 disentangling of gene expression in complex regions; (3 clearer interpretation of small RNA expression and (4 identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publically available annotations in the context of their own experimental data.

  12. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  13. Genomic sequencing in clinical trials

    OpenAIRE

    Mestan, Karen K; Ilkhanoff, Leonard; Mouli, Samdeep; Lin, Simon

    2011-01-01

    Abstract Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to fin...

  14. Can value for money be improved by changing the sequence of our donor work-up in the living kidney donor programme?

    DEFF Research Database (Denmark)

    Larsen, J.; Sorensen, S.S.; Feldt-Rasmussen, B.

    2009-01-01

    and December 2006 at our department. The cost in euro (euro) for the programme was estimated using the Danish diagnosis-related group-system (DRG). The donor work-up programme was described. One hundred and thirty-three potential donors were identified; 66 male- and 67 female subjects, median age of 52 years......The aim of the study was to identify procedures of maximum importance for acceptance or rejection of kidney donation from a living donor as well as making the process more cost-effective. We identified all potential living related donors who were examined during the period between January 2002...... was the procedure identifying most subjects who were unsuited for kidney donation. A rearrangement of the present donor work-up programme could potentially reduce the costs from euro6911 to euro5292 per donor--saving 23% of the costs. By changing the sequence of examinations, it might be possible to cut down...

  15. Can value for money be improved by changing the sequence of our donor work-up in the living kidney donor programme?

    DEFF Research Database (Denmark)

    Larsen, Jesper; Sørensen, Søren Schwartz; Feldt-Rasmussen, Bo

    2009-01-01

    (range 22-69). Sixty-four participants were rejected as donors. Abdominal CT-scan with angiography and urography ruled out 22 of the above 64 potential organ donors; thus, 48% of the volunteers for living kidney donation were unsuited for donation. Abdominal CT-scan with angiography and urography......The aim of the study was to identify procedures of maximum importance for acceptance or rejection of kidney donation from a living donor as well as making the process more cost-effective. We identified all potential living related donors who were examined during the period between January 2002...... was the procedure identifying most subjects who were unsuited for kidney donation. A rearrangement of the present donor work-up programme could potentially reduce the costs from euro6911 to euro5292 per donor--saving 23% of the costs. By changing the sequence of examinations, it might be possible to cut down...

  16. HFE gene mutations in patients with primary iron overload: is there a significant improvement in molecular diagnosis yield with HFE sequencing?

    Science.gov (United States)

    Santos, Paulo C J L; Pereira, Alexandre C; Cançado, Rodolfo D; Schettert, Isolmar T; Sobreira, Tiago J P; Oliveira, Paulo S L; Hirata, Rosario D C; Hirata, Mario H; Figueiredo, Maria Stella; Chiattone, Carlos S; Krieger, Jose E; Guerra-Shinohara, Elvira M

    2010-12-15

    Rare HFE variants have been shown to be associated with hereditary hemochromatosis (HH), an iron overload disease. The low frequency of the HFE p.C282Y mutation in HH-affected Brazilian patients may suggest that other HFE-related mutations may also be implicated in the pathogenesis of HH in this population. The main aim was to screen for new HFE mutations in Brazilian individuals with primary iron overload and to investigate their relationship with HH. Fifty Brazilian patients with primary iron overload (transferrin saturation>50% in females and 60% in males) were selected. Subsequent bidirectional sequencing for each HFE exon was performed. The effect of HFE mutations on protein structure were analyzed by molecular dynamics simulation and free binding energy calculations. p.C282Y in homozygosis or in heterozygosis with p.H63D were the most frequent genotypic combinations associated with HH in our sample population (present in 17 individuals, 34%). Thirty-six (72.0%) out of the 50 individuals presented at least one HFE mutation. The most frequent genotype associated with HH was the homozygous p.C282Y mutation (n=11, 22.0%). One novel mutation (p.V256I) was indentified in heterozygosis with the p.H63D mutation. In silico modeling analysis of protein behavior indicated that the p.V256I mutation does not reduce the binding affinity between HFE and β2-microglobulin (β2M) in the same way the p.C282Y mutation does compared with the native HFE protein. In conclusion, screening of HFE through direct sequencing, as compared to p.C282Y/p.H63D genotyping, was not able to increase the molecular diagnosis yield of HH. The novel p.V256I mutation could not be implicated in the molecular basis of the HH phenotype, although its role cannot be completely excluded in HH-phenotype development. Our molecular modeling analysis can help in the analysis of novel, previously undescribed, HFE mutations. Copyright © 2010 Elsevier Inc. All rights reserved.

  17. Can value for money be improved by changing the sequence of our donor work-up in the living kidney donor programme?

    Science.gov (United States)

    Larsen, Jesper; Sørensen, Søren Schwartz; Feldt-Rasmussen, Bo

    2009-08-01

    The aim of the study was to identify procedures of maximum importance for acceptance or rejection of kidney donation from a living donor as well as making the process more cost-effective. We identified all potential living related donors who were examined during the period between January 2002 and December 2006 at our department. The cost in euro (euro) for the programme was estimated using the Danish diagnosis-related group-system (DRG). The donor work-up programme was described. One hundred and thirty-three potential donors were identified; 66 male- and 67 female subjects, median age of 52 years (range 22-69). Sixty-four participants were rejected as donors. Abdominal CT-scan with angiography and urography ruled out 22 of the above 64 potential organ donors; thus, 48% of the volunteers for living kidney donation were unsuited for donation. Abdominal CT-scan with angiography and urography was the procedure identifying most subjects who were unsuited for kidney donation. A rearrangement of the present donor work-up programme could potentially reduce the costs from euro6911 to euro5292 per donor--saving 23% of the costs. By changing the sequence of examinations, it might be possible to cut down on time spent and number of tests needed for approving or rejecting subjects for living kidney donation.

  18. Sequencing bilateral robot-assisted arm therapy and constraint-induced therapy improves reach to press and trunk kinematics in patients with stroke.

    Science.gov (United States)

    Hsieh, Yu-wei; Liing, Rong-jiuan; Lin, Keh-chung; Wu, Ching-yi; Liou, Tsan-hon; Lin, Jui-chi; Hung, Jen-wen

    2016-03-22

    The combination of robot-assisted therapy (RT) and a modified form of constraint-induced therapy (mCIT) shows promise for improving motor function of patients with stroke. However, whether the changes of motor control strategies are concomitant with the improvements in motor function after combination of RT and mCIT (RT + mCIT) is unclear. This study investigated the effects of the sequential combination of RT + mCIT compared with RT alone on the strategies of motor control measured by kinematic analysis and on motor function and daily performance measured by clinical scales. The study enrolled 34 patients with chronic stroke. The data were derived from part of a single-blinded randomized controlled trial. Participants in the RT + mCIT and RT groups received 20 therapy sessions (90 to 105 min/day, 5 days for 4 weeks). Patients in the RT + mCIT group received 10 RT sessions for first 2 weeks and 10 mCIT sessions for the next 2 weeks. The Bi-Manu-Track was used in RT sessions to provide bilateral practice of wrist and forearm movements. The primary outcome was kinematic variables in a task of reaching to press a desk bell. Secondary outcomes included scores on the Wolf Motor Function Test, Functional Independence Measure, and Nottingham Extended Activities of Daily Living. All outcome measures were administered before and after intervention. RT + mCIT and RT demonstrated different benefits on motor control strategies. RT + mCIT uniquely improved motor control strategies by reducing shoulder abduction, increasing elbow extension, and decreasing trunk compensatory movement during the reaching task. Motor function and quality of the affected limb was improved, and patients achieved greater independence in instrumental activities of daily living. Force generation at movement initiation was improved in the patients who received RT. A combination of RT and mCIT could be an effective approach to improve stroke rehabilitation outcomes, achieving

  19. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

    Science.gov (United States)

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-02-01

    A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R 2 ), using R 2 as the primary metric of assay agreement. However, the use of R 2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  20. Digital Recovery Sequencer - Advanced Concept Ejection Seats

    National Research Council Canada - National Science Library

    Ross, David A; Cotter, Lee; Culhane, David; Press, Matthew J

    2005-01-01

    .... Continued usage of the Analog Sequencer is undesirable due to limitations with respect to its installed life, electronic component obsolescence, flexibility to accommodate seat safety improvements...

  1. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

    Directory of Open Access Journals (Sweden)

    Haznedaroglu Berat Z

    2012-07-01

    Full Text Available Abstract Background The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG ortholog identifiers (KOIs, and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63. For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs in the assemblies of individual k-mers (k-19 to k-63 that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions This study demonstrated that different k-mer choices result in various quantities

  2. A real-time PCR assay with improved specificity for detection and discrimination of all clinically relevant Bordetella species by the presence and distribution of three Insertion Sequence elements

    Directory of Open Access Journals (Sweden)

    Ossewaarde Jacobus M

    2011-01-01

    Full Text Available Abstract Background In Dutch laboratories molecular detection of B. pertussis and B. parapertussis is commonly based on insertion sequences IS481 and IS1001, respectively. Both IS elements are more widely spread among Bordetella species. Both Bordetella holmesii, and B. bronchiseptica can harbour IS481. Also, IS1001 is found among B. bronchiseptica. IS481, and IS1001 based PCR thus lacks specificity when used for detection of specific Bordetella spp. Findings We designed a PCR based on IS1002, another IS element that is present among Bordetella species, and exploited it as a template in combination with PCR for IS481, and IS1001. In combining the PCRs for IS481, IS1001, and IS1002, and including an inhibition control, we were able to detect and discriminate all clinically relevant Bordetella species. Conclusions We developed an improved PCR method for specific detection of B. pertussis, B. parapertussis, B. holmesii, and B. bronchiseptica.

  3. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.

  4. Covariance of charged amino acids at positions 322 and 440 of HIV-1 Env contributes to coreceptor specificity of subtype B viruses, and can be used to improve the performance of V3 sequence-based coreceptor usage prediction algorithms.

    Directory of Open Access Journals (Sweden)

    Kieran Cashin

    Full Text Available The ability to determine coreceptor usage of patient-derived human immunodeficiency virus type 1 (HIV-1 strains is clinically important, particularly for the administration of the CCR5 antagonist maraviroc. The envelope glycoprotein (Env determinants of coreceptor specificity lie primarily within the gp120 V3 loop region, although other Env determinants have been shown to influence gp120-coreceptor interactions. Here, we determined whether conserved amino acid alterations outside the V3 loop that contribute to coreceptor usage exist, and whether these alterations improve the performance of V3 sequence-based coreceptor usage prediction algorithms. We demonstrate a significant covariant association between charged amino acids at position 322 in V3 and position 440 in the C4 Env region that contributes to the specificity of HIV-1 subtype B strains for CCR5 or CXCR4. Specifically, positively charged Lys/Arg at position 322 and negatively charged Asp/Glu at position 440 occurred more frequently in CXCR4-using viruses, whereas negatively charged Asp/Glu at position 322 and positively charged Arg at position 440 occurred more frequently in R5 strains. In the context of CD4-bound gp120, structural models suggest that covariation of amino acids at Env positions 322 and 440 has the potential to alter electrostatic interactions that are formed between gp120 and charged amino acids in the CCR5 N-terminus. We further demonstrate that inclusion of a "440 rule" can improve the sensitivity of several V3 sequence-based genotypic algorithms for predicting coreceptor usage of subtype B HIV-1 strains, without compromising specificity, and significantly improves the AUROC of the geno2pheno algorithm when set to its recommended false positive rate of 5.75%. Together, our results provide further mechanistic insights into the intra-molecular interactions within Env that contribute to coreceptor specificity of subtype B HIV-1 strains, and demonstrate that incorporation

  5. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  7. Nonparametric combinatorial sequence models.

    Science.gov (United States)

    Wauthier, Fabian L; Jordan, Michael I; Jojic, Nebojsa

    2011-11-01

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.

  8. Long sequence correlation coprocessor

    Science.gov (United States)

    Gage, Douglas W.

    1994-09-01

    A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.

  9. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  10. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  11. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  12. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  13. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi

    2018-01-01

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  14. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  15. Readout-segmented echo-planar imaging improves the image quality of diffusion-weighted MR imaging in rectal cancer: Comparison with single-shot echo-planar diffusion-weighted sequences

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Chun-chao; Liu, Xi; Peng, Wan-lin; Li, Lei; Zhang, Jin-ge [Department of Radiology, West China Hospital, Sichuan University, 37# Guo Xue Xiang, Chengdu, Sichuan 610041 (China); Meng, Wen-jian; Deng, Xiang-bing [Department of Gastrointestinal Surgery, West China Hospital, Sichuan University, 37# Guo Xue Xiang, Chengdu, Sichuan 610041 (China); Zuo, Pan-li [Siemens Healthcare, MR Collaborations NE Asia, 100010, Beijing (China); Li, Zhen-lin, E-mail: lzlcd01@126.com [Department of Radiology, West China Hospital, Sichuan University, 37# Guo Xue Xiang, Chengdu, Sichuan 610041 (China)

    2016-10-15

    Purpose: To determine whether readout-segmented echo-planar imaging (rs-EPI) diffusion-weighted imaging (DWI) can improve the image quality in patients with rectal cancer compared with single-shot echo-planar imaging (ss-EPI) DWI using 3.0 T magnetic resonance (MR) imaging. Materials and methods: This study was approved by the Institutional Review Board, and informed consent was obtained from all patients. Seventy-one patients with rectal cancer were enrolled in this study. For all patients, both rs-EPI and ss-EPI DWI were performed using a 3T MR scanner. Two radiologists independently assessed the overall image quality, lesion conspicuity, geometric distortion and distinction of anatomical structures. The signal-to-noise ratio (SNR), lesion contrast, contrast-to-noise ratio (CNR), and apparent diffusion coefficient (ADC) were also measured. Comparisons of the quantitative and qualitative parameters between the two sequences were performed using the paired t-test and the Wilcoxon signed rank test. Results: The scores of overall image quality, lesion conspicuity, geometric distortion and distinction of anatomical structures of rs-EPI were all significantly higher than those of ss-EPI (all p < 0.05). The SNR and CNR were higher in rs-EPI than those in ss-EPI (all p < 0.05). There was no significant difference between ss-EPI and rs-EPI with regard to ROI size and mean ADCs of the tumour (p = 0.574 and p = 0.479, respectively), but the mean ADC of the normal tissue was higher in rs-EPI than in ss-EPI (1.73 ± 0.30 × 10{sup −3} mm{sup 2}/s vs. 1.60 ± 0.31 × 10{sup −3} mm{sup 2}/s, p = 0.001). Conclusions: DW imaging based on readout-segmented echo-planar imaging is a clinically useful technique to improve the image quality for the purpose of evaluating lesions in patients with rectal tumours.

  16. Exome sequencing and genetic testing for MODY.

    Directory of Open Access Journals (Sweden)

    Stefan Johansson

    Full Text Available Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive.The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results.We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism.On average, we obtained 45 X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively, thus exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes.Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.

  17. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably less misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  18. Quantitative phenotyping via deep barcode sequencing.

    Science.gov (United States)

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  19. Sequences for Student Investigation

    Science.gov (United States)

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  20. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  1. Improving taxonomic accuracy for fungi in public sequence databases: applying ‘one name one species’ in well-defined genera with Trichoderma/Hypocrea as a test case

    Science.gov (United States)

    Strope, Pooja K; Chaverri, Priscila; Gazis, Romina; Ciufo, Stacy; Domrachev, Michael; Schoch, Conrad L

    2017-01-01

    Abstract The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material were recently proposed and published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353 PMID:29220466

  2. Droplet Digital™ PCR Next-Generation Sequencing Library QC Assay.

    Science.gov (United States)

    Heredia, Nicholas J

    2018-01-01

    Digital PCR is a valuable tool to quantify next-generation sequencing (NGS) libraries precisely and accurately. Accurately quantifying NGS libraries enable accurate loading of the libraries on to the sequencer and thus improve sequencing performance by reducing under and overloading error. Accurate quantification also benefits users by enabling uniform loading of indexed/barcoded libraries which in turn greatly improves sequencing uniformity of the indexed/barcoded samples. The advantages gained by employing the Droplet Digital PCR (ddPCR™) library QC assay includes the precise and accurate quantification in addition to size quality assessment, enabling users to QC their sequencing libraries with confidence.

  3. Mapping sequences by parts

    Directory of Open Access Journals (Sweden)

    Guziolowski Carito

    2007-09-01

    Full Text Available Abstract Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O (|s| × |t| × N using O (|s| × |t| × N memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.

  4. The Colliding Beams Sequencer

    International Nuclear Information System (INIS)

    Johnson, D.E.; Johnson, R.P.

    1989-01-01

    The Colliding Beam Sequencer (CBS) is a computer program used to operate the pbar-p Collider by synchronizing the applications programs and simulating the activities of the accelerator operators during filling and storage. The Sequencer acts as a meta-program, running otherwise stand alone applications programs, to do the set-up, beam transfers, acceleration, low beta turn on, and diagnostics for the transfers and storage. The Sequencer and its operational performance will be described along with its special features which include a periodic scheduler and command logger. 14 refs., 3 figs

  5. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data.We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.

  6. Computer-aided visualization and analysis system for sequence evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  7. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  8. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well- to-read distance) combines high resolution of DNA sequencing fragments with optimized run times yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples X 500 bases/sample X 2 ten hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany; Sequenase (U.S. Biochemical) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of (alpha) -Thio-dNTP's in the labeling reaction increases signal intensity two- to three-fold.

  9. Protecting genomic sequence anonymity with generalization lattices.

    Science.gov (United States)

    Malin, B A

    2005-01-01

    Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k -anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.

  10. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  11. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  12. Clinical evaluation of further-developed MRCP sequences in comparison with standard MRCP sequences

    International Nuclear Information System (INIS)

    Hundt, W.; Scheidler, J.; Reiser, M.; Petsch, R.

    2002-01-01

    The purpose of this study was the comparison of technically improved single-shot magnetic resonance cholangiopancreatography (MRCP) sequences with standard single-shot rapid acquisition with relaxation enhancement (RARE) and half-Fourier acquired single-shot turbo spin-echo (HASTE) sequences in evaluating the normal and abnormal biliary duct system. The bile duct system of 45 patients was prospectively investigated on a 1.5-T MRI system. The investigation was performed with RARE and HASTE MR cholangiography sequences with standard and high spatial resolutions, and with a delayed-echo half-Fourier RARE (HASTE) sequence. Findings of the improved MRCP sequences were compared with the standard MRCP sequences. The level of confidence in assessing the diagnosis was divided into five groups. The Wilcoxon signed-rank test at a level of p<0.05 was applied. In 15 patients no pathology was found. The MRCP showed stenoses of the bile duct system in 10 patients and choledocholithiasis and cholecystolithiasis in 16 patients. In 12 patients a dilatation of the bile duct system was found. Comparison of the low- and high spatial resolution sequences and the short and long TE times of the half-Fourier RARE (HASTE) sequence revealed no statistically significant differences regarding accuracy of the examination. The diagnostic confidence level in assessing normal or pathological findings for the high-resolution RARE and half-Fourier RARE (HASTE) was significantly better than for the standard sequences. For the delayed-echo half-Fourier RARE (HASTE) sequence no statistically significant difference was seen. The high-resolution RARE and half-Fourier RARE (HASTE) sequences had a higher confidence level, but there was no significant difference in diagnosis in terms of detection and assessment of pathological changes in the biliary duct system compared with standard sequences. (orig.)

  13. Dynamic Sequence Assignment.

    Science.gov (United States)

    1983-12-01

    D-136 548 DYNAMIIC SEQUENCE ASSIGNMENT(U) ADVANCED INFORMATION AND 1/2 DECISION SYSTEMS MOUNTAIN YIELW CA C A 0 REILLY ET AL. UNCLSSIIED DEC 83 AI/DS...I ADVANCED INFORMATION & DECISION SYSTEMS Mountain View. CA 94040 84 u ,53 V,..’. Unclassified _____ SCURITY CLASSIFICATION OF THIS PAGE REPORT...reviews some important heuristic algorithms developed for fas- ter solution of the sequence assignment problem. 3.1. DINAMIC MOGRAMUNIG FORMULATION FOR

  14. General LTE Sequence

    OpenAIRE

    Billal, Masum

    2015-01-01

    In this paper,we have characterized sequences which maintain the same property described in Lifting the Exponent Lemma. Lifting the Exponent Lemma is a very powerful tool in olympiad number theory and recently it has become very popular. We generalize it to all sequences that maintain a property like it i.e. if p^{\\alpha}||a_k and p^\\b{eta}||n, then p^{{\\alpha}+\\b{eta}}||a_{nk}.

  15. Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence.

    Science.gov (United States)

    Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E

    2016-01-01

    It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence offline. These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven

  16. Pairwise Sequence Alignment Library

    Energy Technology Data Exchange (ETDEWEB)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.

  17. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  18. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  19. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Full Text Available Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences using the following sequence patterns: (G+C content, Codon Usage (Cd, Trinucleotide Usage (Tn, and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlate with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.

  20. Main sequence mass loss

    International Nuclear Information System (INIS)

    Brunish, W.M.; Guzik, J.A.; Willson, L.A.; Bowen, G.

    1987-01-01

    It has been hypothesized that variable stars may experience mass loss, driven, at least in part, by oscillations. The class of stars we are discussing here are the δ Scuti variables. These are variable stars with masses between about 1.2 and 2.25 M/sub θ/, lying on or very near the main sequence. According to this theory, high rotation rates enhance the rate of mass loss, so main sequence stars born in this mass range would have a range of mass loss rates, depending on their initial rotation velocity and the amplitude of the oscillations. The stars would evolve rapidly down the main sequence until (at about 1.25 M/sub θ/) a surface convection zone began to form. The presence of this convective region would slow the rotation, perhaps allowing magnetic braking to occur, and thus sharply reduce the mass loss rate. 7 refs

  1. High resolution sequence stratigraphy in China

    International Nuclear Information System (INIS)

    Zhang Shangfeng; Zhang Changmin; Yin Yanshi; Yin Taiju

    2008-01-01

    Since high resolution sequence stratigraphy was introduced into China by DENG Hong-wen in 1995, it has been experienced two development stages in China which are the beginning stage of theory research and development of theory research and application, and the stage of theoretical maturity and widely application that is going into. It is proved by practices that high resolution sequence stratigraphy plays more and more important roles in the exploration and development of oil and gas in Chinese continental oil-bearing basin and the research field spreads to the exploration of coal mine, uranium mine and other strata deposits. However, the theory of high resolution sequence stratigraphy still has some shortages, it should be improved in many aspects. The authors point out that high resolution sequence stratigraphy should be characterized quantitatively and modelized by computer techniques. (authors)

  2. Sequence finishing and mapping of Drosophila melanogasterheterochromatin

    Energy Technology Data Exchange (ETDEWEB)

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo,David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri,Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due tothe presence of repeated DNA in the pericentromeric heterochromatin. Theheterochromatic regions of D. melanogaster contain 20 Mb of sequenceamenable to mapping, sequence assembly and finishing. Here we describethe generation of 15 Mb of finished or improved heterochromatic sequenceusing available clone resources and assembly and mapping methods. We alsoconstructed a BAC-based physical map that spans approximately 13 Mb ofthe pericentromeric heterochromatin, and a cytogenetic map that positionsapproximately 11 Mb of BAC contigs and sequence scaffolds in specificchromosomal locations. The integrated sequence assembly and maps greatlyimprove our understanding of the structure and composition of this poorlyunderstood fraction of a metazoan genome and provide a framework forfunctional analyses.

  3. Electricity sequence control

    International Nuclear Information System (INIS)

    Shin, Heung Ryeol

    2010-03-01

    The contents of the book are introduction of control system, like classification and control signal, introduction of electricity power switch, such as push-button and detection switch sensor for induction type and capacitance type machinery for control, solenoid valve, expression of sequence and type of electricity circuit about using diagram, time chart, marking and term, logic circuit like Yes, No, and, or and equivalence logic, basic electricity circuit, electricity sequence control, added condition, special program control about choice and jump of program, motor control, extra circuit on repeat circuit, pause circuit in a conveyer, safety regulations and rule about classification of electricity disaster and protective device for insulation.

  4. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    , Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...... information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity...

  5. The Release 6 reference sequence of the Drosophila melanogaster genome.

    Science.gov (United States)

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. © 2015 Hoskins et al.; Published by Cold Spring Harbor Laboratory Press.

  6. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  7. THE RHIC SEQUENCER

    International Nuclear Information System (INIS)

    VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.; MICHNOFF, R.

    2001-01-01

    The Relativistic Heavy Ion Collider (RHIC) has a high level asynchronous time-line driven by a controlling program called the ''Sequencer''. Most high-level magnet and beam related issues are orchestrated by this system. The system also plays an important task in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience

  8. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak

  9. simple sequence repeat (SSR)

    African Journals Online (AJOL)

    In the present study, 78 mapped simple sequence repeat (SSR) markers representing 11 linkage groups of adzuki bean were evaluated for transferability to mungbean and related Vigna spp. 41 markers amplified characteristic bands in at least one Vigna species. The transferability percentage across the genotypes ranged ...

  10. Improvement of an enzyme-linked immunosorbent assay for equine herpesvirus type 4 by using a synthetic-peptide 24-mer repeat sequence of glycoprotein G as an antigen

    Science.gov (United States)

    BANNAI, Hiroshi; NEMOTO, Manabu; TSUJIMURA, Koji; YAMANAKA, Takashi; MAEDA, Ken; KONDO, Takashi

    2015-01-01

    To increase the sensitivity of an enzyme-linked immunosorbent assay (ELISA) for equine herpesvirus type 4 (EHV-4) that uses a 12-mer peptide of glycoprotein G (gG4-12-mer: MKNNPIYSEGSL) [4], we used a longer peptide consisting of a 24-mer repeat sequence (gG4-24-mer: MKNNPIYSEGSLMLNVQHDDSIHT) as an antigen. Sera of horses experimentally infected with EHV-4 reacted much more strongly to the gG4-24-mer peptide than to the gG4-12-mer peptide. We used peptide ELISAs to test paired sera from horses naturally infected with EHV-4 (n=40). gG4-24-mer ELISA detected 37 positive samples (92.5%), whereas gG4-12-mer ELISA detected only 28 (70.0%). gG4-24-mer ELISA was much more sensitive than gG4-12-mer ELISA. PMID:26424485

  11. Improvement of an enzyme-linked immunosorbent assay for equine herpesvirus type 4 by using a synthetic-peptide 24-mer repeat sequence of glycoprotein G as an antigen.

    Science.gov (United States)

    Bannai, Hiroshi; Nemoto, Manabu; Tsujimura, Koji; Yamanaka, Takashi; Maeda, Ken; Kondo, Takashi

    2016-02-01

    To increase the sensitivity of an enzyme-linked immunosorbent assay (ELISA) for equine herpesvirus type 4 (EHV-4) that uses a 12-mer peptide of glycoprotein G (gG4-12-mer: MKNNPIYSEGSL) [4], we used a longer peptide consisting of a 24-mer repeat sequence (gG4-24-mer: MKNNPIYSEGSLMLNVQHDDSIHT) as an antigen. Sera of horses experimentally infected with EHV-4 reacted much more strongly to the gG4-24-mer peptide than to the gG4-12-mer peptide. We used peptide ELISAs to test paired sera from horses naturally infected with EHV-4 (n=40). gG4-24-mer ELISA detected 37 positive samples (92.5%), whereas gG4-12-mer ELISA detected only 28 (70.0%). gG4-24-mer ELISA was much more sensitive than gG4-12-mer ELISA.

  12. Targeted sequencing of plant genomes

    Science.gov (United States)

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, wholegenome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  13. Almost convergence of triple sequences

    OpenAIRE

    Ayhan Esi; M.Necdet Catalbas

    2013-01-01

    In this paper we introduce and study the concepts of almost convergence and almost Cauchy for triple sequences. Weshow that the set of almost convergent triple sequences of 0's and 1's is of the first category and also almost everytriple sequence of 0's and 1's is not almost convergent.Keywords: almost convergence, P-convergent, triple sequence.

  14. A few Smarandache Integer Sequences

    OpenAIRE

    Ibstedt, Henry

    2010-01-01

    This paper deals with the analysis of a few Smarandache Integer Sequences which first appeared in Properties or the Numbers, F. Smarandache, University or Craiova Archives, 1975. The first four sequences are recurrence generated sequences while the last three are concatenation sequences.

  15. Complex Sequencing Problems and Local Search Heuristics

    NARCIS (Netherlands)

    Brucker, P.; Hurink, Johann L.; Osman, I.H.; Kelly, J.P.

    1996-01-01

    Many problems can be formulated as complex sequencing problems. We will present problems in flexible manufacturing that have such a formulation and apply local search methods like iterative improvement, simulated annealing and tabu search to solve these problems. Computational results are reported.

  16. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large...... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information....

  17. Variable depth recursion algorithm for leaf sequencing

    International Nuclear Information System (INIS)

    Siochi, R. Alfredo C.

    2007-01-01

    The processes of extraction and sweep are basic segmentation steps that are used in leaf sequencing algorithms. A modified version of a commercial leaf sequencer changed the way that the extracts are selected and expanded the search space, but the modification maintained the basic search paradigm of evaluating multiple solutions, each one consisting of up to 12 extracts and a sweep sequence. While it generated the best solutions compared to other published algorithms, it used more computation time. A new, faster algorithm selects one extract at a time but calls itself as an evaluation function a user-specified number of times, after which it uses the bidirectional sweeping window algorithm as the final evaluation function. To achieve a performance comparable to that of the modified commercial leaf sequencer, 2-3 calls were needed, and in all test cases, there were only slight improvements beyond two calls. For the 13 clinical test maps, computation speeds improved by a factor between 12 and 43, depending on the constraints, namely the ability to interdigitate and the avoidance of the tongue-and-groove under dose. The new algorithm was compared to the original and modified versions of the commercial leaf sequencer. It was also compared to other published algorithms for 1400, random, 15x15, test maps with 3-16 intensity levels. In every single case the new algorithm provided the best solution

  18. Multilocus Sequence Typing

    OpenAIRE

    Belén, Ana; Pavón, Ibarz; Maiden, Martin C.J.

    2009-01-01

    Multilocus sequence typing (MLST) was first proposed in 1998 as a typing approach that enables the unambiguous characterization of bacterial isolates in a standardized, reproducible, and portable manner using the human pathogen Neisseria meningitidis as the exemplar organism. Since then, the approach has been applied to a large and growing number of organisms by public health laboratories and research institutions. MLST data, shared by investigators over the world via the Internet, have been ...

  19. Achalasia Carcinoma Sequence

    OpenAIRE

    Makmun, Dadang

    2001-01-01

    We report a case of carcinoma of the esophagus in a 58 years old woman with achalasia, who has been diagnosed since 30 years ago, which initiated by surgical treatment (myotomy) and the symptoms recurred since 3 years ago. According to the progress of the disease, Malignancy was strongly suspected due to prolonged stasis and mucosal irritation caused by achalasia (achalasia carcinoma sequence). Because of these contributing factors for the development of serious complications such as Malignan...

  20. Sequencing BPS spectra

    Energy Technology Data Exchange (ETDEWEB)

    Gukov, Sergei [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Max-Planck-Institut für Mathematik,Vivatsgasse 7, D-53111 Bonn (Germany); Nawata, Satoshi [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Centre for Quantum Geometry of Moduli Spaces, University of Aarhus,Nordre Ringgade 1, DK-8000 (Denmark); Saberi, Ingmar [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Stošić, Marko [CAMGSD, Departamento de Matemática, Instituto Superior Técnico,Av. Rovisco Pais, 1049-001 Lisbon (Portugal); Mathematical Institute SANU,Knez Mihajlova 36, 11000 Belgrade (Serbia); Sułkowski, Piotr [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Faculty of Physics, University of Warsaw,ul. Pasteura 5, 02-093 Warsaw (Poland)

    2016-03-02

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  1. Sequencing BPS spectra

    International Nuclear Information System (INIS)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  2. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applica­ tions including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Anal­ ysis and Machine Intelligence was published in November 1980. The IEEE Com­ puter magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  3. Complete genome sequence of Ikoma lyssavirus.

    Science.gov (United States)

    Marston, Denise A; Ellis, Richard J; Horton, Daniel L; Kuzmin, Ivan V; Wise, Emma L; McElhinney, Lorraine M; Banyard, Ashley C; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E; Fooks, Anthony R

    2012-09-01

    Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isolated from an African civet in Tanzania displaying clinical signs of rabies. Genetically, this virus is the most divergent within the genus Lyssavirus. Characterization of the genome will help to improve our understanding of lyssavirus diversity and enable investigation into vaccine-induced immunity and protection.

  4. Combined use of susceptibility weighted magnetic resonance imaging sequences and dynamic susceptibility contrast perfusion weighted imaging to improve the accuracy of the differential diagnosis of recurrence and radionecrosis in high-grade glioma patients.

    Science.gov (United States)

    Kim, Tae-Hyung; Yun, Tae Jin; Park, Chul-Kee; Kim, Tae Min; Kim, Ji-Hoon; Sohn, Chul-Ho; Won, Jae Kyung; Park, Sung-Hye; Kim, Il Han; Choi, Seung Hong

    2017-03-21

    Purpose was to assess predictive power for overall survival (OS) and diagnostic performance of combination of susceptibility-weighted MRI sequences (SWMRI) and dynamic susceptibility contrast (DSC) perfusion-weighted imaging (PWI) for differentiation of recurrence and radionecrosis in high-grade glioma (HGG). We enrolled 51 patients who underwent radiation therapy or gamma knife surgeryfollowed by resection for HGG and who developed new measurable enhancement more than six months after complete response. The lesions were confirmed as recurrence (n = 32) or radionecrosis (n = 19). The mean and each percentile value from cumulative histograms of normalized CBV (nCBV) and proportion of dark signal intensity on SWMRI (proSWMRI, %) within enhancement were compared. Multivariate regression was performed for the best differentiator. The cutoff value of best predictor from ROC analysis was evaluated. OS was determined with Kaplan-Meier method and log-rank test. Recurrence showed significantly lower proSWMRI and higher mean nCBV and 90th percentile nCBV (nCBV90) than radionecrosis. Regression analysis revealed both nCBV90 and proSWMRI were independent differentiators. Combination of nCBV90 and proSWMRI achieved 71.9% sensitivity (23/32), 100% specificity (19/19) and 82.3% accuracy (42/51) using best cut-off values (nCBV90 > 2.07 and proSWMRI≤15.76%) from ROC analysis. In subgroup analysis, radionecrosis with nCBV > 2.07 (n = 5) showed obvious hemorrhage (proSWMRI > 32.9%). Patients with nCBV90 > 2.07 and proSWMRI≤15.76% had significantly shorter OS. In conclusion, compared with DSC PWI alone, combination of SWMRI and DSC PWI have potential to be prognosticator for OS and lower false positive rate in differentiation of recurrence and radionecrosis in HGG who develop new measurable enhancement more than six months after complete response.

  5. Foundations of Sequence-to-Sequence Modeling for Time Series

    OpenAIRE

    Kuznetsov, Vitaly; Mariet, Zelda

    2018-01-01

    The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practiti...

  6. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  7. Infinite sequences and series

    CERN Document Server

    Knopp, Konrad

    1956-01-01

    One of the finest expositors in the field of modern mathematics, Dr. Konrad Knopp here concentrates on a topic that is of particular interest to 20th-century mathematicians and students. He develops the theory of infinite sequences and series from its beginnings to a point where the reader will be in a position to investigate more advanced stages on his own. The foundations of the theory are therefore presented with special care, while the developmental aspects are limited by the scope and purpose of the book. All definitions are clearly stated; all theorems are proved with enough detail to ma

  8. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    NARCIS (Netherlands)

    Buschmann, Tilo; Bystrykh, Leonid V.

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called

  9. Single-cell sequencing to quantify genomic integrity in cancer

    NARCIS (Netherlands)

    van den Bos, Hilda; Bakker, Bjorn; Spierings, Diana C J; Lansdorp, Peter M; Foijer, Floris

    The use of single-cell DNA sequencing (sc-seq) techniques for the diagnosis, prognosis and treatment of cancer is a rapidly developing field. Sc-seq research is gaining momentum by decreased sequencing costs and continuous improvements in techniques. In this review, we provide an overview of recent

  10. 10KP: A phylodiverse genome sequencing plan

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun

    2018-01-01

    Abstract Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here. PMID:29618049

  11. 10KP: A phylodiverse genome sequencing plan.

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Smith, Stephen A; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Li, Fay-Wei; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun; Wong, Gane Ka-Shu

    2018-03-01

    Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here.

  12. Memory for sequences of events impaired in typical aging

    Science.gov (United States)

    Allen, Timothy A.; Morris, Andrea M.; Stark, Shauna M.; Fortin, Norbert J.

    2015-01-01

    Typical aging is associated with diminished episodic memory performance. To improve our understanding of the fundamental mechanisms underlying this age-related memory deficit, we previously developed an integrated, cross-species approach to link converging evidence from human and animal research. This novel approach focuses on the ability to remember sequences of events, an important feature of episodic memory. Unlike existing paradigms, this task is nonspatial, nonverbal, and can be used to isolate different cognitive processes that may be differentially affected in aging. Here, we used this task to make a comprehensive comparison of sequence memory performance between younger (18–22 yr) and older adults (62–86 yr). Specifically, participants viewed repeated sequences of six colored, fractal images and indicated whether each item was presented “in sequence” or “out of sequence.” Several out of sequence probe trials were used to provide a detailed assessment of sequence memory, including: (i) repeating an item from earlier in the sequence (“Repeats”; e.g., ABADEF), (ii) skipping ahead in the sequence (“Skips”; e.g., ABDDEF), and (iii) inserting an item from a different sequence into the same ordinal position (“Ordinal Transfers”; e.g., AB3DEF). We found that older adults performed as well as younger controls when tested on well-known and predictable sequences, but were severely impaired when tested using novel sequences. Importantly, overall sequence memory performance in older adults steadily declined with age, a decline not detected with other measures (RAVLT or BPS-O). We further characterized this deficit by showing that performance of older adults was severely impaired on specific probe trials that required detailed knowledge of the sequence (Skips and Ordinal Transfers), and was associated with a shift in their underlying mnemonic representation of the sequences. Collectively, these findings provide unambiguous evidence that the

  13. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the wellknown sequence alignment algorithms, the 'GAP' (which is heuristic) and the 'Needleman

  14. Next-Generation Sequencing Platforms

    Science.gov (United States)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  15. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  16. The advantages of SMRT sequencing

    OpenAIRE

    Roberts, Richard J; Carneiro, Mauricio O; Schatz, Michael C

    2013-01-01

    Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.

  17. Putting instruction sequences into effect

    NARCIS (Netherlands)

    Bergstra, J.A.

    2011-01-01

    An attempt is made to define the concept of execution of an instruction sequence. It is found to be a special case of directly putting into effect of an instruction sequence. Directly putting into effect of an instruction sequences comprises interpretation as well as execution. Directly putting into

  18. Region segmentation along image sequence

    International Nuclear Information System (INIS)

    Monchal, L.; Aubry, P.

    1995-01-01

    A method to extract regions in sequence of images is proposed. Regions are not matched from one image to the following one. The result of a region segmentation is used as an initialization to segment the following and image to track the region along the sequence. The image sequence is exploited as a spatio-temporal event. (authors). 12 refs., 8 figs

  19. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  20. Metal resistance sequences and transgenic plants

    Science.gov (United States)

    Meagher, Richard Brian; Summers, Anne O.; Rugh, Clayton L.

    1999-10-12

    The present invention provides nucleic acid sequences encoding a metal ion resistance protein, which are expressible in plant cells. The metal resistance protein provides for the enzymatic reduction of metal ions including but not limited to divalent Cu, divalent mercury, trivalent gold, divalent cadmium, lead ions and monovalent silver ions. Transgenic plants which express these coding sequences exhibit increased resistance to metal ions in the environment as compared with plants which have not been so genetically modified. Transgenic plants with improved resistance to organometals including alkylmercury compounds, among others, are provided by the further inclusion of plant-expressible organometal lyase coding sequences, as specifically exemplified by the plant-expressible merB coding sequence. Furthermore, these transgenic plants which have been genetically modified to express the metal resistance coding sequences of the present invention can participate in the bioremediation of metal contamination via the enzymatic reduction of metal ions. Transgenic plants resistant to organometals can further mediate remediation of organic metal compounds, for example, alkylmetal compounds including but not limited to methyl mercury, methyl lead compounds, methyl cadmium and methyl arsenic compounds, in the environment by causing the freeing of mercuric or other metal ions and the reduction of the ionic mercury or other metal ions to the less toxic elemental mercury or other metals.

  1. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  2. Log-balanced combinatorial sequences

    Directory of Open Access Journals (Sweden)

    Tomislav Došlic

    2005-01-01

    Full Text Available We consider log-convex sequences that satisfy an additional constraint imposed on their rate of growth. We call such sequences log-balanced. It is shown that all such sequences satisfy a pair of double inequalities. Sufficient conditions for log-balancedness are given for the case when the sequence satisfies a two- (or more- term linear recurrence. It is shown that many combinatorially interesting sequences belong to this class, and, as a consequence, that the above-mentioned double inequalities are valid for all of them.

  3. New MR pulse sequence

    International Nuclear Information System (INIS)

    Harms, S.E.; Flamig, D.P.; Griffey, R.H.

    1990-01-01

    This paper describes a method for fat suppression for three-dimensional MR imaging. The FATS (fat-suppressed acquisition with echo time shortened) sequence employs a pair of opposing adiabatic half-passage RF pulses tuned on fat resonance. The imaging parameters are as follows: TR, 20 msec; TE, 21.7-3.2 msec; 1,024 x 128 x 128 acquired matrix; imaging time, approximately 11 minutes. A series of 54 examinations were performed. Excellent fat suppression with water excitation is achieved in all cases. The orbital images demonstrate superior resolution of small orbital lesions. The high signal-to-noise ratio (SNR) in cranial studies demonstrates excellent petrous bone and internal auditory canal anatomy

  4. Insights from 20 years of bacterial genome sequencing

    DEFF Research Database (Denmark)

    Land, Miriam; Hauser, Loren; Jun, Se-Ran

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along...... the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative...... genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling...

  5. Implicitly Defined Neural Networks for Sequence Labeling

    Science.gov (United States)

    2017-07-31

    ularity has soared for the Long Short - Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) and vari- ants such as Gated Recurrent Unit (GRU) (Cho et...610. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short - term memory . Neural computation 9(8):1735– 1780. Zhiheng Huang, Wei Xu, and Kai Yu. 2015...network are coupled together, in order to improve perfor- mance on complex, long -range dependencies in either direction of a sequence. We contrast our

  6. BLAST and FASTA similarity searching for multiple sequence alignment.

    Science.gov (United States)

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry-homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies target for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.

  7. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  8. Universal sequence map (USM of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Full Text Available Abstract Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist on being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mappingψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM, is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR. The latter enables the representation of 4 unit type sequences (like DNA as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool:http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.

  9. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    Science.gov (United States)

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence

  10. Improving Photosynthesis

    Science.gov (United States)

    Evans, John R.

    2013-01-01

    Photosynthesis is the basis of plant growth, and improving photosynthesis can contribute toward greater food security in the coming decades as world population increases. Multiple targets have been identified that could be manipulated to increase crop photosynthesis. The most important target is Rubisco because it catalyses both carboxylation and oxygenation reactions and the majority of responses of photosynthesis to light, CO2, and temperature are reflected in its kinetic properties. Oxygenase activity can be reduced either by concentrating CO2 around Rubisco or by modifying the kinetic properties of Rubisco. The C4 photosynthetic pathway is a CO2-concentrating mechanism that generally enables C4 plants to achieve greater efficiency in their use of light, nitrogen, and water than C3 plants. To capitalize on these advantages, attempts have been made to engineer the C4 pathway into C3 rice (Oryza sativa). A simpler approach is to transfer bicarbonate transporters from cyanobacteria into chloroplasts and prevent CO2 leakage. Recent technological breakthroughs now allow higher plant Rubisco to be engineered and assembled successfully in planta. Novel amino acid sequences can be introduced that have been impossible to reach via normal evolution, potentially enlarging the range of kinetic properties and breaking free from the constraints associated with covariation that have been observed between certain kinetic parameters. Capturing the promise of improved photosynthesis in greater yield potential will require continued efforts to improve carbon allocation within the plant as well as to maintain grain quality and resistance to disease and lodging. PMID:23812345

  11. Heuristics for multiobjective multiple sequence alignment.

    Science.gov (United States)

    Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

    2016-07-15

    Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameters setting, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives to the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show

  12. Toward a Better Compression for DNA Sequences Using Huffman Encoding.

    Science.gov (United States)

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-04-01

    Due to the significant amount of DNA data that are being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data to a less space and a faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. The research suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016 ).

  13. Transcriptome sequencing of the Microarray Quality Control (MAQC RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more that 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.

  14. Short sequence motifs, overrepresented in mammalian conservednon-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNAsequences of multicellular eukaryotes is under selective constraint. Inparticular, ~;5 percent of the human genome consists of conservednon-coding sequences (CNSs). CNSs differ from other genomic sequences intheir nucleotide composition and must play important functional roles,which mostly remain obscure.Results: We investigated relative abundancesof short sequence motifs in all human CNSs present in the human/mousewhole-genome alignments vs. three background sets of sequences: (i)weakly conserved or unconserved non-coding sequences (non-CNSs); (ii)near-promoter sequences (located between nucleotides -500 and -1500,relative to a start of transcription); and (iii) random sequences withthe same nucleotide composition as that of CNSs. When compared tonon-CNSs and near-promoter sequences, CNSs possess an excess of AT-richmotifs, often containing runs of identical nucleotides. In contrast, whencompared to random sequences, CNSs contain an excess of GC-rich motifswhich, however, lack CpG dinucleotides. Thus, abundance of short sequencemotifs in human CNSs, taken as a whole, is mostly determined by theiroverall compositional properties and not by overrepresentation of anyspecific short motifs. These properties are: (i) high AT-content of CNSs,(ii) a tendency, probably due to context-dependent mutation, of A's andT's to clump, (iii) presence of short GC-rich regions, and (iv) avoidanceof CpG contexts, due to their hypermutability. Only a small number ofshort motifs, overrepresented in all human CNSs are similar to bindingsites of transcription factors from the FOX family.Conclusion: Human CNSsas a whole appear to be too broad a class of sequences to possess strongfootprints of any short sequence-specific functions. Such footprintsshould be studied at the level of functional subclasses of CNSs, such asthose which flank genes with a particular pattern of expression. Overallproperties of CNSs are affected by

  15. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  16. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and considered to be high consumers of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms, the FASTA (which is heuristic) and the \\'Needleman-Wunsch\\' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA Algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  17. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and considered to be high consumers of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms, the FASTA (which is heuristic) and the 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA Algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  18. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics database is growing exponentially in size. Processing these large amount of data may take hours of time even if super computers are used. One of the most important processing tool in Bioinformatics is sequence alignment. We introduce fast alignment algorithm, called \\'Alignment By Scanning\\' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the wellknown sequence alignment algorithms, the \\'GAP\\' (which is heuristic) and the \\'Needleman-Wunsch\\' (which is optimal). The proposed algorithm achieves up to 51% enhancement in alignment score when it is compared with the GAP Algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  19. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Science.gov (United States)

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  20. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  1. Robustness of ancestral sequence reconstruction to phylogenetic uncertainty.

    Science.gov (United States)

    Hanson-Smith, Victor; Kolaczkowski, Bryan; Thornton, Joseph W

    2010-09-01

    Ancestral sequence reconstruction (ASR) is widely used to formulate and test hypotheses about the sequences, functions, and structures of ancient genes. Ancestral sequences are usually inferred from an alignment of extant sequences using a maximum likelihood (ML) phylogenetic algorithm, which calculates the most likely ancestral sequence assuming a probabilistic model of sequence evolution and a specific phylogeny--typically the tree with the ML. The true phylogeny is seldom known with certainty, however. ML methods ignore this uncertainty, whereas Bayesian methods incorporate it by integrating the likelihood of each ancestral state over a distribution of possible trees. It is not known whether Bayesian approaches to phylogenetic uncertainty improve the accuracy of inferred ancestral sequences. Here, we use simulation-based experiments under both simplified and empirically derived conditions to compare the accuracy of ASR carried out using ML and Bayesian approaches. We show that incorporating phylogenetic uncertainty by integrating over topologies very rarely changes the inferred ancestral state and does not improve the accuracy of the reconstructed ancestral sequence. Ancestral state reconstructions are robust to uncertainty about the underlying tree because the conditions that produce phylogenetic uncertainty also make the ancestral state identical across plausible trees; conversely, the conditions under which different phylogenies yield different inferred ancestral states produce little or no ambiguity about the true phylogeny. Our results suggest that ML can produce accurate ASRs, even in the face of phylogenetic uncertainty. Using Bayesian integration to incorporate this uncertainty is neither necessary nor beneficial.

  2. A programmable method for massively parallel targeted sequencing

    Science.gov (United States)

    Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

    2014-01-01

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526

  3. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Full Text Available Abstract Background Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  4. SVX Sequencer Board

    International Nuclear Information System (INIS)

    Utes, M.

    1997-01-01

    The SVX Sequencer boards are 9U by 280mm circuit boards that reside in slots 2 through 21 of each of eight Eurocard crates in the D0 Detector Platform. The basic purpose is to control the SVX chips for data acquisition and when a trigger occurs, to gather the SVX data and relay the data to the VRB boards in the Movable Counting House. Functions and features are as follows: (1) Initialization of eight SVX chip strings using the MIL-STD-1553 data bus; (2) Real time manipulation of the SVX control lines to effect data acquisition, digitization, and readout based on the NRZ/Clock signals from the Controller; (3) Conversion of 8-bit electrical SVX readout data to an optical signal operating at 1.062 Gbit/sec, sent to the VRB. Eight HDIs will be serviced per board; (4) Built-in logic analyzer which can record the most important control and data lines during a data acquisition cycle and put this recorded information onto the 1553 bus; (5) Identification header and end of data trailer tacked onto data stream; (6) 1553 register which can read the current values of the control and data lines; (7) 1553 register which can test the optical link; (8) 1553 registers for crossing pulse width, calibration pulse voltage, and calibration pipeline select; (9) 1553 register for reading the optical drivers status link; (10) 1553 register for power control of SVX chips and ignoring bad SVX strings; (11) Front panel displays and LEDs show the board status at a glance; (12) In-system programmable EPLDs are programmed via 1553 or Altera's 'Bitblaster'; (13) Automatic readout abort after 45us; (14) Supplies BUSY signal back to Trigger Framework; (15) Supports a heartbeat system to prevent excessive SVX current draw; and (16) Supports a SVX power trip feature if heartbeat failure occurs.

  5. Sequence Algebra, Sequence Decision Diagrams and Dynamic Fault Trees

    International Nuclear Information System (INIS)

    Rauzy, Antoine B.

    2011-01-01

    A large attention has been focused on the Dynamic Fault Trees in the past few years. By adding new gates to static (regular) Fault Trees, Dynamic Fault Trees aim to take into account dependencies among events. Merle et al. proposed recently an algebraic framework to give a formal interpretation to these gates. In this article, we extend Merle et al.'s work by adopting a slightly different perspective. We introduce Sequence Algebras that can be seen as Algebras of Basic Events, representing failures of non-repairable components. We show how to interpret Dynamic Fault Trees within this framework. Finally, we propose a new data structure to encode sets of sequences of Basic Events: Sequence Decision Diagrams. Sequence Decision Diagrams are very much inspired from Minato's Zero-Suppressed Binary Decision Diagrams. We show that all operations of Sequence Algebras can be performed on this data structure.

  6. Boosting antibody developability through rational sequence optimization.

    Science.gov (United States)

    Seeliger, Daniel; Schulz, Patrick; Litzenburger, Tobias; Spitz, Julia; Hoerer, Stefan; Blech, Michaela; Enenkel, Barbara; Studts, Joey M; Garidel, Patrick; Karow, Anne R

    2015-01-01

    The application of monoclonal antibodies as commercial therapeutics poses substantial demands on stability and properties of an antibody. Therapeutic molecules that exhibit favorable properties increase the success rate in development. However, it is not yet fully understood how the protein sequences of an antibody translates into favorable in vitro molecule properties. In this work, computational design strategies based on heuristic sequence analysis were used to systematically modify an antibody that exhibited a tendency to precipitation in vitro. The resulting series of closely related antibodies showed improved stability as assessed by biophysical methods and long-term stability experiments. As a notable observation, expression levels also improved in comparison with the wild-type candidate. The methods employed to optimize the protein sequences, as well as the biophysical data used to determine the effect on stability under conditions commonly used in the formulation of therapeutic proteins, are described. Together, the experimental and computational data led to consistent conclusions regarding the effect of the introduced mutations. Our approach exemplifies how computational methods can be used to guide antibody optimization for increased stability.

  7. Pulse sequences and visualization of instruments

    International Nuclear Information System (INIS)

    Merkle, E.M.; Ulm Univ.; Wendt, M.; Chung, Y.C.; Duerk, J.L.; University Hospitals of Cleveland and Case Western Reserve University, OH; Lewin, J.S.

    1998-01-01

    While initially advocated primarily for intrasurgical visualization (e.g., craniotomy), interventional MRI rapidly evolved into roles in image-guided localization for needle-based procedures, minimally invasive neurosurgical procedures, and thermal ablation of cancer. In this contest, MRI pulse sequences and scanning methods serve one of four primary roles: (1) speed improvement, (2) device localization, (3) anatomy/lesion differentiation and (4) temperature sensitivity. The first part of this manuscript deals with passive visualization of MR-compatible needles and the effects of field strength, sequence design, and orientation of the needle relative to the static magnetic field of the scanner. Issues and recommendations are given for low-field as well as high-field scanners. The second part contains methods reported to achieve improved acquisition efficiency over conventional phase encoding (wavelets, locally focused imaging, singular value decomposition and keyhole imaging). Finally, the last part of the manuscrpt reports the current status of thermosensitive sequences and their dependence on spinlattice relaxation time (T1), water diffusion coefficient (D) and proton chemical shift (δ). (orig.) [de

  8. Chameleon sequences in neurodegenerative diseases

    International Nuclear Information System (INIS)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-01-01

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield to an insight on defining peptides and proteins responsible in neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences, but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified to “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the most number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe, are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the most number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFP's can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases.

  9. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32 P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T 1 , T 2 , and S 1 ) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions

  10. Chameleon sequences in neurodegenerative diseases.

    Science.gov (United States)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield to an insight on defining peptides and proteins responsible in neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences, but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified to "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the most number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe, are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the most number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFP's can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Farey sequences and resistor networks

    Indian Academy of Sciences (India)

    Green's function, while the perturbation of a network is investigated in [3]. ... In Theorem 1 below, we employ the Farey sequence to establish a strict .... We next show that the Farey sequence method is applicable for circuits with n or fewer.

  12. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible by the advent of modern sequencing technologies that was a result of step by step advances with a contribution of academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of capillary electrophoresis in DNA sequencing based in part of several of our articles in this journal. PMID:19517496

  13. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  14. Chameleon sequences in neurodegenerative diseases

    Energy Technology Data Exchange (ETDEWEB)

    Bahramali, Golnaz [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Goliaei, Bahram, E-mail: goliaei@ut.ac.ir [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of); Salari, Ali [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of)

    2016-03-25

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield to an insight on defining peptides and proteins responsible in neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences, but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified to “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the most number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe, are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the most number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFP's can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases.

  15. Commercial Art: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a commercial art vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  16. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the Restful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  17. Rowe's Argument from Improvability | Almeida | Philosophical Papers

    African Journals Online (AJOL)

    William Rowe has argued that if there is an infinite sequence of improving worlds then an essentially perfectly good being must actualize some world in the sequence and must not actualize any world in the sequence. Since that is impossible, there exist no perfectly good beings. I show that Rowe's argument assumes that ...

  18. Accident sequence quantification with KIRAP

    International Nuclear Information System (INIS)

    Kim, Tae Un; Han, Sang Hoon; Kim, Kil You; Yang, Jun Eon; Jeong, Won Dae; Chang, Seung Cheol; Sung, Tae Yong; Kang, Dae Il; Park, Jin Hee; Lee, Yoon Hwan; Hwang, Mi Jeong.

    1997-01-01

    The tasks of probabilistic safety assessment(PSA) consists of the identification of initiating events, the construction of event tree for each initiating event, construction of fault trees for event tree logics, the analysis of reliability data and finally the accident sequence quantification. In the PSA, the accident sequence quantification is to calculate the core damage frequency, importance analysis and uncertainty analysis. Accident sequence quantification requires to understand the whole model of the PSA because it has to combine all event tree and fault tree models, and requires the excellent computer code because it takes long computation time. Advanced Research Group of Korea Atomic Energy Research Institute(KAERI) has developed PSA workstation KIRAP(Korea Integrated Reliability Analysis Code Package) for the PSA work. This report describes the procedures to perform accident sequence quantification, the method to use KIRAP's cut set generator, and method to perform the accident sequence quantification with KIRAP. (author). 6 refs

  19. Accident sequence quantification with KIRAP

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Tae Un; Han, Sang Hoon; Kim, Kil You; Yang, Jun Eon; Jeong, Won Dae; Chang, Seung Cheol; Sung, Tae Yong; Kang, Dae Il; Park, Jin Hee; Lee, Yoon Hwan; Hwang, Mi Jeong

    1997-01-01

    The tasks of probabilistic safety assessment(PSA) consists of the identification of initiating events, the construction of event tree for each initiating event, construction of fault trees for event tree logics, the analysis of reliability data and finally the accident sequence quantification. In the PSA, the accident sequence quantification is to calculate the core damage frequency, importance analysis and uncertainty analysis. Accident sequence quantification requires to understand the whole model of the PSA because it has to combine all event tree and fault tree models, and requires the excellent computer code because it takes long computation time. Advanced Research Group of Korea Atomic Energy Research Institute(KAERI) has developed PSA workstation KIRAP(Korea Integrated Reliability Analysis Code Package) for the PSA work. This report describes the procedures to perform accident sequence quantification, the method to use KIRAP`s cut set generator, and method to perform the accident sequence quantification with KIRAP. (author). 6 refs.

  20. Repeated DNA sequences in fungi

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K

    1974-11-01

    Several fungal species, representatives of all broad groups like basidiomycetes, ascomycetes and phycomycetes, were examined for the nature of repeated DNA sequences by DNA:DNA reassociation studies using hydroxyapatite chromatography. All of the fungal species tested contained 10 to 20 percent repeated DNA sequences. There are approximately 100 to 110 copies of repeated DNA sequences of approximately 4 x 10/sup 7/ daltons piece size of each. Repeated DNA sequence homoduplexes showed on average 5/sup 0/C difference of T/sub e/50 (temperature at which 50 percent duplexes dissociate) values from the corresponding homoduplexes of unfractionated whole DNA. It is suggested that a part of repetitive sequences in fungi constitutes mitochondrial DNA and a part of it constitutes nuclear DNA. (auth)

  1. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, with many contigs and scaffolds obtained. There are many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the end of contigs and performed PCR amplification in order to link these contigs and scaffolds. With various enzymes to perform PCR amplification, adjustment of PCR reaction conditions, and combined with clone construction to sequence, all the gaps were finished. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with GC content 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  2. Improving ITS sequence data for identification of plant pathogenic fungi

    Science.gov (United States)

    R. Henrik Nilsson; Kevin D. Hyde; Julia Pawłowska; Martin Ryberg; Leho Tedersoo; Anders Bjørnsgard Aas; Siti A. Alias; Artur Alves; Cajsa Lisa Anderson; Alexandre Antonelli; A. Elizabeth Arnold; Barbara Bahnmann; Mohammad Bahram; Johan Bengtsson-Palme; Anna Berlin; Sara Branco; Putarak Chomnunti; Asha Dissanayake; Rein Drenkhan; Hanna Friberg; Tobias Guldberg Frøslev; Bettina Halwachs; Martin Hartmann; Beatrice Henricot; Ruvishika Jayawardena; Ari Jumpponen; Håvard Kauserud; Sonja Koskela; Tomasz Kulik; Kare Liimatainen; Björn D. Lindahl; Daniel Lindner; Jian-Kui Liu; Sajeewa Maharachchikumbura; Dimuthu Manamgoda; Svante Martinsson; Maria Alice Neves; Tuula Niskanen; Stephan Nylinder; Olinto Liparini Pereira; Danilo Batista Pinho; Teresita M. Porter; Valentin Queloz; Taavi Riit; Marisol Sánchez-García; Filipe de Sousa; Emil Stefańczyk; Mariusz Tadych; Susumu Takamatsu; Qing Tian; Dhanushka Udayanga; Martin Unterseher; Zheng Wang; Saowanee Wikee; Jiye Yan; Ellen Larsson; Karl-Henrik Larsson; Urmas Kõljalg; Kessy Abarenkov

    2014-01-01

    Plant pathogenic fungi are a large and diverse assemblage of eukaryotes with substantial impacts on natural ecosystems and human endeavours. These taxa often have complex and poorly understood life cycles, lack observable, discriminatory morphological characters, and may not be amenable to in vitro culturing. As a result, species identification is frequently difficult...

  3. Improved nonparametric inference for multiple correlated periodic sequences

    KAUST Repository

    Sun, Ying; Hart, Jeffrey D.; Genton, Marc G.

    2013-01-01

    cross-validation method to the temperature data obtained from multiple ice cores, investigating the periodicity of the El Niño effect. Our methodology is also illustrated by estimating patients' cardiac cycle from different physiological signals

  4. BWR severe accident sequence analyses at ORNL - some lessons learned

    International Nuclear Information System (INIS)

    Hodge, S.A.

    1983-01-01

    Boiling water reactor severe accident sequence studies are being carried out using Browns Ferry Unit 1 as the model plant. Four accident studies were completed, resulting in recommendations for improvements in system design, emergency procedures, and operator training. Computer code improvements were an important by-product

  5. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    Science.gov (United States)

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  6. GuiTope: an application for mapping random-sequence peptides to protein sequences

    Directory of Open Access Journals (Sweden)

    Halperin Rebecca F

    2012-01-01

    Full Text Available Abstract Background Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. Results GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. Conclusions GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  7. Tritium pellet injection sequences for TFTR

    International Nuclear Information System (INIS)

    Houlberg, W.A.; Milora, S.L.; Attenberger, S.E.; Singer, C.E.; Schmidt, G.L.

    1983-01-01

    Tritium pellet injection into neutral deuterium, beam heated deuterium plasmas in the Tokamak Fusion Test Reactor (TFTR) is shown to be an attractive means of (1) minimizing tritium use per tritium discharge and over a sequence of tritium discharges; (2) greatly reducing the tritium load in the walls, limiters, getters, and cryopanels; (3) maintaining or improving instantaneous neutron production (Q); (4) reducing or eliminating deuterium-tritium (D-T) neutron production in non-optimized discharges; and (5) generally adding flexibility to the experimental sequences leading to optimal Q operation. Transport analyses of both compression and full-bore TFTR plasmas are used to support the above observations and to provide the basis for a proposed eight-pellet gas gun injector for the 1986 tritium experiments

  8. Dynamics of domain coverage of the protein sequence universe

    Science.gov (United States)

    2012-01-01

    Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439

  9. Fat suppression in MR imaging with binomial pulse sequences

    International Nuclear Information System (INIS)

    Baudovin, C.J.; Bryant, D.J.; Bydder, G.M.; Young, I.R.

    1989-01-01

    This paper reports on a study to develop pulse sequences allowing suppression of fat signal on MR images without eliminating signal from other tissues with short T1. They have developed such a technique involving selective excitation of protons in water, based on a binomial pulse sequence. Imaging is performed at 0.15 T. Careful shimming is performed to maximize separation of fat and water peaks. A spin-echo 1,500/80 sequence is used, employing 90 degrees pulse with transit frequency optimized for water with null excitation of 20 H offset, followed by a section-selective 180 degrees pulse. With use of the binomial sequence for imagining, reduction in fat signal is seen on images of the pelvis and legs of volunteers. Patient studies show dramatic improvement in visualization of prostatic carcinoma compared with standard sequences

  10. Quantifying population genetic differentiation from next-generation sequencing data

    DEFF Research Database (Denmark)

    Fumagalli, Matteo; Garrett Vieira, Filipe Jorge; Korneliussen, Thorfinn Sand

    2013-01-01

    method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy to investigate population structure via Principal Components Analysis. Through extensive simulations, we compare the new method herein proposed to approaches based...... on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled......Over the last few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data...

  11. Dynamics of domain coverage of the protein sequence universe

    Directory of Open Access Journals (Sweden)

    Rekapalli Bhanu

    2012-11-01

    Full Text Available Abstract Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.

  12. Peptide de novo sequencing of mixture tandem mass spectra

    DEFF Research Database (Denmark)

    Gorshkov, Vladimir; Hotta, Stéphanie Yuki Kolbeck; Braga, Thiago Verano

    2016-01-01

    they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co-isolation and thus prone to false identifications. The deconvolution approach matched...... complementary b-, y-ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co-isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced...... peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight...

  13. GROUPING WEB ACCESS SEQUENCES uSING SEQUENCE ALIGNMENT METHOD

    OpenAIRE

    BHUPENDRA S CHORDIA; KRISHNAKANT P ADHIYA

    2011-01-01

    In web usage mining grouping of web access sequences can be used to determine the behavior or intent of a set of users. Grouping websessions is how to measure the similarity between web sessions. There are many shortcomings in traditional measurement methods. The taskof grouping web sessions based on similarity and consists of maximizing the intra-group similarity while minimizing the inter-groupsimilarity is done using sequence alignment method. This paper introduces a new method to group we...

  14. LPTAU, Quasi Random Sequence Generator

    International Nuclear Information System (INIS)

    Sobol, Ilya M.

    1993-01-01

    1 - Description of program or function: LPTAU generates quasi random sequences. These are uniformly distributed sets of L=M N points in the N-dimensional unit cube: I N =[0,1]x...x[0,1]. These sequences are used as nodes for multidimensional integration; as searching points in global optimization; as trial points in multi-criteria decision making; as quasi-random points for quasi Monte Carlo algorithms. 2 - Method of solution: Uses LP-TAU sequence generation (see references). 3 - Restrictions on the complexity of the problem: The number of points that can be generated is L 30 . The dimension of the space cannot exceed 51

  15. Weak disorder in Fibonacci sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ben-Naim, E [Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States); Krapivsky, P L [Department of Physics and Center for Molecular Cybernetics, Boston University, Boston, MA 02215 (United States)

    2006-05-19

    We study how weak disorder affects the growth of the Fibonacci series. We introduce a family of stochastic sequences that grow by the normal Fibonacci recursion with probability 1 - {epsilon}, but follow a different recursion rule with a small probability {epsilon}. We focus on the weak disorder limit and obtain the Lyapunov exponent that characterizes the typical growth of the sequence elements, using perturbation theory. The limiting distribution for the ratio of consecutive sequence elements is obtained as well. A number of variations to the basic Fibonacci recursion including shift, doubling and copying are considered. (letter to the editor)

  16. Implicit sequence learning in people with Parkinson’s disease

    Directory of Open Access Journals (Sweden)

    Katherine R Gamble

    2014-08-01

    Full Text Available Implicit sequence learning involves learning about dependencies in sequences of events without intent to learn or awareness of what has been learned. Sequence learning is related to striatal dopamine levels, striatal activation, and integrity of white matter connections. People with Parkinson’s disease (PD have degeneration of dopamine-producing neurons, leading to dopamine deficiency and therefore striatal deficits, and they have difficulties with sequencing, including complex language comprehension and postural stability. Most research on implicit sequence learning in PD has used motor-based tasks. However, because PD presents with motor deficits, it is difficult to assess whether learning itself is impaired in these tasks. The present study used an implicit sequence learning task with a reduced motor component, the Triplets Learning Task (TLT. People with PD and age- and education-matched healthy older adults completed three sessions (each consisting of 10 blocks of 50 trials of the TLT. Results revealed that the PD group was able to learn the sequence, however, when learning was examined using a Half Blocks analysis (Nemeth et al., 2013, which compared learning in the 1st 25/50 trials of all blocks to that in the 2nd 25/50 trials, the PD group showed significantly less learning than Controls in the 2nd Half Blocks, but not in the 1st. Nemeth et al. hypothesized that the 1st Half Blocks involve recall and reactivation of the sequence learned, thus reflecting hippocampal-dependent learning, while the 2nd Half Blocks involve proceduralized behavior of learned sequences, reflecting striatal-based learning. The present results suggest that the PD group had intact hippocampal-dependent implicit sequence learning, but impaired striatal-dependent learning. Thus, sequencing deficits in PD are likely due to striatal impairments, but other brain systems, such as the hippocampus, may be able to partially compensate for striatal decline to improve

  17. Oxford Nanopore MinION Sequencing and Genome Assembly

    Directory of Open Access Journals (Sweden)

    Hengyun Lu

    2016-10-01

    Full Text Available The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS technology. The third-generation sequencing (TGS technology, led by Pacific Biosciences (PacBio, is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT. MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production makes it suitable for real-time applications, the release of the long read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be easily expanded into using longer sequencing lengths, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  18. Sequencing of individual chromosomes of plant pathogenic Fusarium oxysporum.

    Science.gov (United States)

    Kashiwa, Takeshi; Kozaki, Toshinori; Ishii, Kazuo; Turgeon, B Gillian; Teraoka, Tohru; Komatsu, Ken; Arie, Tsutomu

    2017-01-01

    A small chromosome in reference isolate 4287 of F. oxysporum f. sp. lycopersici (Fol) has been designated as a 'pathogenicity chromosome' because it carries several pathogenicity related genes such as the Secreted In Xylem (SIX) genes. Sequence assembly of small chromosomes in other isolates, based on a reference genome template, is difficult because of karyotype variation among isolates and a high number of sequences associated with transposable elements. These factors often result in misassembly of sequences, making it unclear whether other isolates possess the same pathogenicity chromosome harboring SIX genes as in the reference isolate. To overcome this difficulty, single chromosome sequencing after Contour-clamped Homogeneous Electric Field (CHEF) separation of chromosomes was performed, followed by de novo assembly of sequences. The assembled sequences of individual chromosomes were consistent with results of probing gels of CHEF separated chromosomes with SIX genes. Individual chromosome sequencing revealed that several SIX genes are located on a single small chromosome in two pathogenic forms of F. oxysporum, beyond the reference isolate 4287, and in the cabbage yellows fungus F. oxysporum f. sp. conglutinans. The particular combination of SIX genes on each small chromosome varied. Moreover, not all SIX genes were found on small chromosomes; depending on the isolate, some were on big chromosomes. This suggests that recombination of chromosomes and/or translocation of SIX genes may occur frequently. Our method improves sequence comparison of small chromosomes among isolates. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Exome sequencing for prenatal diagnosis of fetuses with sonographic abnormalities.

    Science.gov (United States)

    Drury, Suzanne; Williams, Hywel; Trump, Natalie; Boustred, Christopher; Lench, Nicholas; Scott, Richard H; Chitty, Lyn S

    2015-10-01

    In the absence of aneuploidy or other pathogenic cytogenetic abnormality, fetuses with increased nuchal translucency (NT ≥ 3.5 mm) and/or other sonographic abnormalities have a greater incidence of genetic syndromes, but defining the underlying pathology can be challenging. Here, we investigate the value of whole exome sequencing in fetuses with sonographic abnormalities but normal microarray analysis. Whole exome sequencing was performed on DNA extracted from chorionic villi or amniocytes in 24 fetuses with unexplained ultrasound findings. In the first 14 cases sequencing was initially performed on fetal DNA only. For the remaining 10, the trio of fetus, mother and father was sequenced simultaneously. In 21% (5/24) cases, exome sequencing provided definitive diagnoses (Milroy disease, hypophosphatasia, achondrogenesis type 2, Freeman-Sheldon syndrome and Baraitser-Winter Syndrome). In a further case, a plausible diagnosis of orofaciodigital syndrome type 6 was made. In two others, a single mutation in an autosomal recessive gene was identified, but incomplete sequencing coverage precluded exclusion of the presence of a second mutation. Whole exome sequencing improves prenatal diagnosis in euploid fetuses with abnormal ultrasound scans. In order to expedite interpretation of results, trio sequencing should be employed, but interpretation can still be compromised by incomplete coverage of relevant genes. © 2015 John Wiley & Sons, Ltd.

  20. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    Science.gov (United States)

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and trouble shooting approaches we implemented to optimize sequencing of multiplexed libraries on an a RRBS background. PMID:23193365

  1. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

    Directory of Open Access Journals (Sweden)

    Daniels Noah M

    2012-10-01

    Full Text Available Abstract Background The quality of multiple protein structure alignments are usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not utilize directly protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. Results We present Formatt, a multiple structure alignment based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. Conclusions Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.

  2. Decrease in gamma-band activity tracks sequence learning

    Science.gov (United States)

    Madhavan, Radhika; Millman, Daniel; Tang, Hanlin; Crone, Nathan E.; Lenz, Fredrick A.; Tierney, Travis S.; Madsen, Joseph R.; Kreiman, Gabriel; Anderson, William S.

    2015-01-01

    Learning novel sequences constitutes an example of declarative memory formation, involving conscious recall of temporal events. Performance in sequence learning tasks improves with repetition and involves forming temporal associations over scales of seconds to minutes. To further understand the neural circuits underlying declarative sequence learning over trials, we tracked changes in intracranial field potentials (IFPs) recorded from 1142 electrodes implanted throughout temporal and frontal cortical areas in 14 human subjects, while they learned the temporal-order of multiple sequences of images over trials through repeated recall. We observed an increase in power in the gamma frequency band (30–100 Hz) in the recall phase, particularly in areas within the temporal lobe including the parahippocampal gyrus. The degree of this gamma power enhancement decreased over trials with improved sequence recall. Modulation of gamma power was directly correlated with the improvement in recall performance. When presenting new sequences, gamma power was reset to high values and decreased again after learning. These observations suggest that signals in the gamma frequency band may play a more prominent role during the early steps of the learning process rather than during the maintenance of memory traces. PMID:25653598

  3. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one of which is leukemia disease or better known as blood cancer. The cancer cell can be detected by taking DNA in laboratory test. This study focused on local alignment of leukemia and non leukemia data resulting from NCBI in the form of DNA sequences by using Smith-Waterman algorithm. SmithWaterman algorithm was invented by TF Smith and MS Waterman in 1981. These algorithms try to find as much as possible similarity of a pair of sequences, by giving a negative value to the unequal base pair (mismatch), and positive values on the same base pair (match). So that will obtain the maximum positive value as the end of the alignment, and the minimum value as the initial alignment. This study will use sequences of leukemia and 3 sequences of non leukemia.

  4. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subprojet 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  5. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Full Text Available Abstract Background Microsatellites, or simple sequence repeats (SSRs, are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDELs mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphism (SNPs. SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions, as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. Findings To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different linage had been observed after applying the new algorithm. Conclusions The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different linages. It reduces overlapping during SSR alignment, which results in a more realistic

  6. Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.).

    Science.gov (United States)

    Cloutier, Sylvie; Miranda, Evelyn; Ward, Kerry; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Datla, Raju; Rowland, Gordon; Duguid, Scott; Ragupathy, Raja

    2012-08-01

    Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs) of which, 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 of all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.

  7. Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

    Science.gov (United States)

    Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

    2010-11-01

    Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.

  8. ADDRESS SEQUENCES FOR MULTI RUN RAM TESTING

    Directory of Open Access Journals (Sweden)

    V. N. Yarmolik

    2014-01-01

    Full Text Available A universal approach for generation of address sequences with specified properties is proposed and analyzed. A modified version of the Antonov and Saleev algorithm for Sobol sequences genera-tion is chosen as a mathematical description of the proposed method. Within the framework of the proposed universal approach, the Sobol sequences form a subset of the address sequences. Other sub-sets are also formed, which are Gray sequences, anti-Gray sequences, counter sequences and sequenc-es with specified properties.

  9. Protein sequence comparison and protein evolution

    Energy Technology Data Exchange (ETDEWEB)

    Pearson, W.R. [Univ. of Virginia, Charlottesville, VA (United States). Dept. of Biochemistry

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to infer reliably homology, and thus a shared proteinfold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can both be used to improve the selectivity of a search and to evaluate the significance of a match. They them examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, the discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.

  10. Collaborative Filtering Recommendation on Users' Interest Sequences.

    Directory of Open Access Journals (Sweden)

    Weijie Cheng

    Full Text Available As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS and the count of users' total common sub-IS (ACSIS. Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.

  11. Collaborative Filtering Recommendation on Users' Interest Sequences.

    Science.gov (United States)

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS) that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.

  12. Collaborative Filtering Recommendation on Users’ Interest Sequences

    Science.gov (United States)

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users’ dynamic preferences in many papers. However, the sequence of users’ behaviour is rarely studied in recommender systems. Due to the users’ unique behavior evolution patterns and personalized interest transitions among items, users’ similarity in sequential dimension should be introduced to further distinguish users’ preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users’ interest sequences (IS) that rank users’ ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users’ longest common sub-IS (LCSIS) and the count of users’ total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users’ IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users’ preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction. PMID:27195787

  13. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  14. Method for priming and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Mugasimangalam, R.C.; Ulanovsky, L.E.

    1997-12-01

    A method is presented for improving the priming specificity of an oligonucleotide primer that is non-unique in a nucleic acid template which includes selecting a continuous stretch of several nucleotides in the template DNA where one of the four bases does not occur in the stretch. This also includes bringing the template DNA in contract with a non-unique primer partially or fully complimentary to the sequence immediately upstream of the selected sequence stretch. This results in polymerase-mediated differential extension of the primer in the presence of a subset of deoxyribonucleotide triphosphates that does not contain the base complementary to the base absent in the selected sequence stretch. These reactions occur at a temperature sufficiently low for allowing the extension of the non-unique primer. The method causes polymerase-mediated extension reactions in the presence of all four natural deoxyribonucleotide triphosphates or modifications. At this high temperature discrimination occurs against priming sites of the non-unique primer where the differential extension has not made the primer sufficiently stable to prime. However, the primer extended at the selected stretch is sufficiently stable to prime.

  15. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  16. Decidability of uniform recurrence of morphic sequences

    OpenAIRE

    Durand , Fabien

    2012-01-01

    We prove that the uniform recurrence of morphic sequences is decidable. For this we show that the number of derived sequences of uniformly recurrent morphic sequences is bounded. As a corollary we obtain that uniformly recurrent morphic sequences are primitive substitutive sequences.

  17. Sequence Factorization with Multiple References.

    Directory of Open Access Journals (Sweden)

    Sebastian Wandelt

    Full Text Available The success of high-throughput sequencing has lead to an increasing number of projects which sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between input sequence and a reference sequence, gained lots of interest in this field. Highly-similar sequences, e.g., Human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that the compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for the referential compression against multiple references: The factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence 1 the size of the factorization, 2 the time for factorization, and 3 the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%, factorization speed (0.01 MB/s to more than 600 MB/s, and main memory usage (few dozen MB to dozens of GB. Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization.

  18. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  19. Draft Genome Sequence of Mycobacterium chimaera Type ...

    Science.gov (United States)

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-0169T possesses unique virulence genes. Evidence suggests that M. avium, M. intracellulare, and M. chimaera are differently virulent and a comparative genomic analysis is critically needed to identify diagnostic targets that reliably differentiate species of MAC. With treatment costs for Mycobacterium infections estimated to be >$1.8 B annually in the U.S., correct species identification will result in improved treatment selection, lower costs, and improved patient outcomes.

  20. A novel constraint for thermodynamically designing DNA sequences.

    Directory of Open Access Journals (Sweden)

    Qiang Zhang

    Full Text Available Biotechnological and biomolecular advances have introduced novel uses for DNA such as DNA computing, storage, and encryption. For these applications, DNA sequence design requires maximal desired (and minimal undesired hybridizations, which are the product of a single new DNA strand from 2 single DNA strands. Here, we propose a novel constraint to design DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, a constraint that does not address the thermodynamic properties of the DNA sequence. Using a unique, improved genetic algorithm, we designed DNA sequence sets which satisfy different distance constraints and employ a free energy gap based on a minimum free energy (MFE to gauge DNA sequences based on set thermodynamic properties. When compared to the best constraints of the Hamming distance, our method yielded better thermodynamic qualities. We then used our improved genetic algorithm to obtain lower-bound DNA sequence sets. Here, we discuss the effects of novel constraint parameters on the free energy gap.

  1. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS, has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA. As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/. This service will improve accessibility to high-quality data from SRA.

  2. High-Throughput Block Optical DNA Sequence Identification.

    Science.gov (United States)

    Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant

    2018-01-01

    Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm -1 . Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Highly multiplexed targeted DNA sequencing from single nuclei.

    Science.gov (United States)

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  4. Use of designed sequences in protein structure recognition.

    Science.gov (United States)

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  5. Sequences, groups, and number theory

    CERN Document Server

    Rigo, Michel

    2018-01-01

    This collaborative book presents recent trends on the study of sequences, including combinatorics on words and symbolic dynamics, and new interdisciplinary links to group theory and number theory. Other chapters branch out from those areas into subfields of theoretical computer science, such as complexity theory and theory of automata. The book is built around four general themes: number theory and sequences, word combinatorics, normal numbers, and group theory. Those topics are rounded out by investigations into automatic and regular sequences, tilings and theory of computation, discrete dynamical systems, ergodic theory, numeration systems, automaton semigroups, and amenable groups.  This volume is intended for use by graduate students or research mathematicians, as well as computer scientists who are working in automata theory and formal language theory. With its organization around unified themes, it would also be appropriate as a supplemental text for graduate level courses.

  6. Explaining the harmonic sequence paradox.

    Science.gov (United States)

    Schmidt, Ulrich; Zimper, Alexander

    2012-05-01

    According to the harmonic sequence paradox, an expected utility decision maker's willingness to pay for a gamble whose expected payoffs evolve according to the harmonic series is finite if and only if his marginal utility of additional income becomes zero for rather low payoff levels. Since the assumption of zero marginal utility is implausible for finite payoff levels, expected utility theory - as well as its standard generalizations such as cumulative prospect theory - are apparently unable to explain a finite willingness to pay. This paper presents first an experimental study of the harmonic sequence paradox. Additionally, it demonstrates that the theoretical argument of the harmonic sequence paradox only applies to time-patient decision makers, whereas the paradox is easily avoided if time-impatience is introduced. ©2011 The British Psychological Society.

  7. Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

    Science.gov (United States)

    Chen, Zitian; Zhou, Wenxiong; Qiao, Shuo; Kang, Li; Duan, Haifeng; Xie, X Sunney; Huang, Yanyi

    2017-12-01

    Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory-based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.

  8. PHARMACOGENETIC TESTING OPPORTUNITIES IN CARDIOLOGY BASED ON EXOME SEQUENCING

    Directory of Open Access Journals (Sweden)

    N. V. Shcherbakova

    2014-01-01

    Full Text Available Aim. To study what cardiac drugs currently have any comments on biomarkers and what information can be obtained by pharmacogenetic testing using data exome sequencing in patients with cardiac diseases.Material and methods. Exome sequencing in random participant of the ATEROGEN IVANOVO study and bioinformatics analysis of the data were performed. Point mutations were annotated using ANNOVAR program, as well as comparison with a number of specialized databases was done on the basis of user protocols.Results. 11 cardiac drugs and 7 genes which variants can influence cardiac drug metabolism were analyzed. According to exome sequencing of the participant we did not reveal allelic variants that require dose regime correction and careful efficacy control.Conclusion. The exome sequencing application is the next step to a wide range of personalized therapy. Future opportunities for improvement of the risk-benefit ratio in each patient are the main purpose of the collection and analysis of pharmacogenetic data.

  9. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Full Text Available Abstract Background Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned regions of the orthopoxvirus alignment. Overall sequence identity increased only

  10. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subprojet 3 `integrated sequence analysis` (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term `methodology` denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA,HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply the in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  11. Matrix transformations and sequence spaces

    International Nuclear Information System (INIS)

    Nanda, S.

    1983-06-01

    In most cases the most general linear operator from one sequence space into another is actually given by an infinite matrix and therefore the theory of matrix transformations has always been of great interest in the study of sequence spaces. The study of general theory of matrix transformations was motivated by the special results in summability theory. This paper is a review article which gives almost all known results on matrix transformations. This also suggests a number of open problems for further study and will be very useful for research workers. (author)

  12. Green's theorem and Gorenstein sequences

    OpenAIRE

    Ahn, Jeaman; Migliore, Juan C.; Shin, Yong-Su

    2016-01-01

    We study consequences, for a standard graded algebra, of extremal behavior in Green's Hyperplane Restriction Theorem. First, we extend his Theorem 4 from the case of a plane curve to the case of a hypersurface in a linear space. Second, assuming a certain Lefschetz condition, we give a connection to extremal behavior in Macaulay's theorem. We apply these results to show that $(1,19,17,19,1)$ is not a Gorenstein sequence, and as a result we classify the sequences of the form $(1,a,a-2,a,1)$ th...

  13. High-Throughput Next-Generation Sequencing of Polioviruses

    Science.gov (United States)

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929

  14. Controversy and debate on clinical genomics sequencing-paper 2: clinical genome-wide sequencing: don't throw out the baby with the bathwater!

    Science.gov (United States)

    Adam, Shelin; Friedman, Jan M

    2017-12-01

    Genome-wide (exome or whole genome) sequencing with appropriate genetic counseling should be considered for any patient with a suspected Mendelian disease that has not been identified by conventional testing. Clinical genome-wide sequencing provides a powerful and effective means of identifying specific genetic causes of serious disease and improving clinical care. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Sequences in language and text

    CERN Document Server

    Mikros, George K

    2015-01-01

    The aim of this volume is to present the diverse but highly interesting area of the quantitative analysis of the sequence of various linguistic structures. The collected articles present a wide spectrum of quantitative analyses of linguistic syntagmatic structures and explore novel sequential linguistic entities. This volume will be interesting to all researchers studying linguistics using quantitative methods.

  16. Probabilistic studies of accident sequences

    International Nuclear Information System (INIS)

    Villemeur, A.; Berger, J.P.

    1986-01-01

    For several years, Electricite de France has carried out probabilistic assessment of accident sequences for nuclear power plants. In the framework of this program many methods were developed. As the interest in these studies was increasing and as adapted methods were developed, Electricite de France has undertaken a probabilistic safety assessment of a nuclear power plant [fr

  17. MRI sequences and their parameters

    International Nuclear Information System (INIS)

    Teissier, J.M.

    1993-01-01

    Listing basic sequences and their present variants makes a synthetic classification of the various acquisition modes possible. The knowledge of the advantages of each of them, as well as of their disadvantages and restraints, seems to be an essential prerequisite to an optimal utilization of each magnetic resonance imaging system. (author)

  18. Degree sequence in message transfer

    Science.gov (United States)

    Yamuna, M.

    2017-11-01

    Message encryption is always an issue in current communication scenario. Methods are being devised using various domains. Graphs satisfy numerous unique properties which can be used for message transfer. In this paper, I propose a message encryption method based on degree sequence of graphs.

  19. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance,and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view)to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  20. On primes in Lucas sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2015-01-01

    Roč. 53, č. 1 (2015), s. 2-23 ISSN 0015-0517 R&D Projects: GA ČR GA14-02067S Institutional support: RVO:67985840 Keywords : Lucas sequence * primes Subject RIV: BA - General Mathematics http://www.fq.math.ca/Abstracts/53-1/somer.pdf

  1. Is sequence awareness mandatory for perceptual sequence learning: An assessment using a pure perceptual sequence learning design.

    Science.gov (United States)

    Deroost, Natacha; Coomans, Daphné

    2018-02-01

    We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. Teaching Task Sequencing via Verbal Mediation.

    Science.gov (United States)

    Rusch, Frank R.; And Others

    1987-01-01

    Verbal sequence training was used to teach a moderately mentally retarded woman to sequence job-related tasks. Learning to say the tasks in the proper sequence resulted in the employee performing her tasks in that sequence, and the employee was capable of mediating her own work behavior when scheduled changes occurred. (Author/JDD)

  3. Repdigits in k-Lucas sequences

    Indian Academy of Sciences (India)

    57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence ( L n ( 2 ) ) n . In this paper, we address a similar problem in the family of -Lucas sequences. We also show that the -Lucas sequences have similar properties to those of -Fibonacci sequences ...

  4. Procedural Memory Consolidation in the Performance of Brief Keyboard Sequences

    Science.gov (United States)

    Duke, Robert A.; Davis, Carla M.

    2006-01-01

    Using two sequential key press sequences, we tested the extent to which subjects' performance on a digital piano keyboard changed between the end of training and retest on subsequent days. We found consistent, significant improvements attributable to sleep-based consolidation effects, indicating that learning continued after the cessation of…

  5. Differentiating Visual from Response Sequencing during Long-term Skill Learning.

    Science.gov (United States)

    Lynch, Brighid; Beukema, Patrick; Verstynen, Timothy

    2017-01-01

    The dual-system model of sequence learning posits that during early learning there is an advantage for encoding sequences in sensory frames; however, it remains unclear whether this advantage extends to long-term consolidation. Using the serial RT task, we set out to distinguish the dynamics of learning sequential orders of visual cues from learning sequential responses. On each day, most participants learned a new mapping between a set of symbolic cues and responses made with one of four fingers, after which they were exposed to trial blocks of either randomly ordered cues or deterministic ordered cues (12-item sequence). Participants were randomly assigned to one of four groups (n = 15 per group): Visual sequences (same sequence of visual cues across training days), Response sequences (same order of key presses across training days), Combined (same serial order of cues and responses on all training days), and a Control group (a novel sequence each training day). Across 5 days of training, sequence-specific measures of response speed and accuracy improved faster in the Visual group than any of the other three groups, despite no group differences in explicit awareness of the sequence. The two groups that were exposed to the same visual sequence across days showed a marginal improvement in response binding that was not found in the other groups. These results indicate that there is an advantage, in terms of rate of consolidation across multiple days of training, for learning sequences of actions in a sensory representational space, rather than as motoric representations.

  6. NEW COOLING SEQUENCES FOR OLD WHITE DWARFS

    International Nuclear Information System (INIS)

    Renedo, I.; Althaus, L. G.; GarcIa-Berro, E.; Miller Bertolami, M. M.; Romero, A. D.; Corsico, A. H.; Rohrmann, R. D.

    2010-01-01

    We present full evolutionary calculations appropriate for the study of hydrogen-rich DA white dwarfs. This is done by evolving white dwarf progenitors from the zero-age main sequence, through the core hydrogen-burning phase, the helium-burning phase, and the thermally pulsing asymptotic giant branch phase to the white dwarf stage. Complete evolutionary sequences are computed for a wide range of stellar masses and for two different metallicities, Z = 0.01, which is representative of the solar neighborhood, and Z = 0.001, which is appropriate for the study of old stellar systems, like globular clusters. During the white dwarf cooling stage, we self-consistently compute the phase in which nuclear reactions are still important, the diffusive evolution of the elements in the outer layers and, finally, we also take into account all the relevant energy sources in the deep interior of the white dwarf, such as the release of latent heat and the release of gravitational energy due to carbon-oxygen phase separation upon crystallization. We also provide colors and magnitudes for these sequences, based on a new set of improved non-gray white dwarf model atmospheres, which include the most up-to-date physical inputs like the Lyα quasi-molecular opacity. The calculations are extended down to an effective temperature of 2500 K. Our calculations provide a homogeneous set of evolutionary cooling tracks appropriate for mass and age determinations of old DA white dwarfs and for white dwarf cosmochronology of the different Galactic populations.

  7. Exploring the relationship between sequence similarity and accurate phylogenetic trees.

    Science.gov (United States)

    Cantarel, Brandi L; Morrison, Hilary G; Pearson, William

    2006-11-01

    significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.

  8. Implicit sequence learning in deaf children with cochlear implants.

    Science.gov (United States)

    Conway, Christopher M; Pisoni, David B; Anaya, Esperanza M; Karpicke, Jennifer; Henning, Shirley C

    2011-01-01

    Deaf children with cochlear implants (CIs) represent an intriguing opportunity to study neurocognitive plasticity and reorganization when sound is introduced following a period of auditory deprivation early in development. Although it is common to consider deafness as affecting hearing alone, it may be the case that auditory deprivation leads to more global changes in neurocognitive function. In this paper, we investigate implicit sequence learning abilities in deaf children with CIs using a novel task that measured learning through improvement to immediate serial recall for statistically consistent visual sequences. The results demonstrated two key findings. First, the deaf children with CIs showed disturbances in their visual sequence learning abilities relative to the typically developing normal-hearing children. Second, sequence learning was significantly correlated with a standardized measure of language outcome in the CI children. These findings suggest that a period of auditory deprivation has secondary effects related to general sequencing deficits, and that disturbances in sequence learning may at least partially explain why some deaf children still struggle with language following cochlear implantation. © 2010 Blackwell Publishing Ltd.

  9. Genome sequence of Aspergillus luchuensis NBRC 4314

    Science.gov (United States)

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-01-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, local kuro (black) koji mold Aspergillus luchuensis and awamori yeast Saccharomyces cerevisiae are involved. In contrast, that yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and N-and O-glycosylation system, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and acid-stable α-amylase regarding citric acid production and fermentation at a low pH as well as a unique glutamic peptidase were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds in improvements of awamori fermentation. PMID:27651094

  10. COGNITIVE FATIGUE FACILITATES PROCEDURAL SEQUENCE LEARNING

    Directory of Open Access Journals (Sweden)

    Guillermo eBorragán

    2016-03-01

    Full Text Available Enhanced procedural learning has been evidenced in conditions where cognitive control is diminished, including hypnosis, disruption of prefrontal activity and non-optimal time of the day. Another condition depleting the availability of controlled resources is cognitive fatigue. We tested the hypothesis that cognitive fatigue, eventually leading to diminished cognitive control, facilitates procedural sequence learning. In a two-day experiment, twenty-three young healthy adults were administered a serial reaction time task (SRTT following the induction of high or low levels of cognitive fatigue, in a counterbalanced order. Cognitive fatigue was induced using the Time load Dual-back (TloadDback paradigm, a dual working memory task that allows tailoring cognitive load levels to the individual's optimal performance capacity. In line with our hypothesis, reaction times in the SRTT were faster in the high- than in the low-level fatigue condition, and performance improvement showed more of a benefit from the sequential components than from motor. Altogether, our results suggest a paradoxical, facilitating impact of cognitive fatigue on procedural motor sequence learning. We propose that facilitated learning in the high-level fatigue condition stems from a reduction in the cognitive resources devoted to cognitive control processes that normally oppose automatic procedural acquisition mechanisms.

  11. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  12. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Full Text Available Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize ( L. reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq for expression determination and differentiation of paralagous genes (∼85% of maize genes. Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs to 48% (pollen vs. ovule of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  13. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation.We also propose a nonparametric test of the null hypothesis that the data have constantmean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.

  14. Multi-qubit compensation sequences

    International Nuclear Information System (INIS)

    Tomita, Y; Merrill, J T; Brown, K R

    2010-01-01

    The Hamiltonian control of n qubits requires precision control of both the strength and timing of interactions. Compensation pulses relax the precision requirements by reducing unknown but systematic errors. Using composite pulse techniques designed for single qubits, we show that systematic errors for n-qubit systems can be corrected to arbitrary accuracy given either two non-commuting control Hamiltonians with identical systematic errors or one error-free control Hamiltonian. We also examine composite pulses in the context of quantum computers controlled by two-qubit interactions. For quantum computers based on the XY interaction, single-qubit composite pulse sequences naturally correct systematic errors. For quantum computers based on the Heisenberg or exchange interaction, the composite pulse sequences reduce the logical single-qubit gate errors but increase the errors for logical two-qubit gates.

  15. Cassini Mission Sequence Subsystem (MSS)

    Science.gov (United States)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  16. A note on the almost sure central limit theorems for the maxima of strongly dependent nonstationary Gaussian vector sequences

    Directory of Open Access Journals (Sweden)

    Xiang Zeng

    2016-06-01

    Full Text Available Abstract We prove some almost sure central limit theorems for the maxima of strongly dependent nonstationary Gaussian vector sequences under some mild conditions. The results extend the ASCLT to nonstationary Gaussian vector sequences and give substantial improvements for the weight sequence obtained by Lin et al. (Comput. Math. Appl. 62(2:635-640, 2011.

  17. Sequence complexity and work extraction

    International Nuclear Information System (INIS)

    Merhav, Neri

    2015-01-01

    We consider a simplified version of a solvable model by Mandal and Jarzynski, which constructively demonstrates the interplay between work extraction and the increase of the Shannon entropy of an information reservoir which is in contact with a physical system. We extend Mandal and Jarzynski’s main findings in several directions: first, we allow sequences of correlated bits rather than just independent bits. Secondly, at least for the case of binary information, we show that, in fact, the Shannon entropy is only one measure of complexity of the information that must increase in order for work to be extracted. The extracted work can also be upper bounded in terms of the increase in other quantities that measure complexity, like the predictability of future bits from past ones. Third, we provide an extension to the case of non-binary information (i.e. a larger alphabet), and finally, we extend the scope to the case where the incoming bits (before the interaction) form an individual sequence, rather than a random one. In this case, the entropy before the interaction can be replaced by the Lempel–Ziv (LZ) complexity of the incoming sequence, a fact that gives rise to an entropic meaning of the LZ complexity, not only in information theory, but also in physics. (paper)

  18. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows to extract meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.

  19. MRI Sequences in Head & Neck Radiology - State of the Art.

    Science.gov (United States)

    Widmann, Gerlig; Henninger, Benjamin; Kremser, Christian; Jaschke, Werner

    2017-05-01

    Background  Magnetic resonance imaging (MRI) has become an essential imaging modality for the evaluation of head & neck pathologies. However, the diagnostic power of MRI is strongly related to the appropriate selection and interpretation of imaging protocols and sequences. The aim of this article is to review state-of-the-art sequences for the clinical routine in head & neck MRI and to describe the evidence for which medical question these sequences and techniques are useful. Method  Literature review of state-of-the-art sequences in head & neck MRI. Results and Conclusion  Basic sequences (T1w, T2w, T1wC+) and fat suppression techniques (TIRM/STIR, Dixon, Spectral Fat sat) are important tools in the diagnostic workup of inflammation, congenital lesions and tumors including staging. Additional sequences (SSFP (CISS, FIESTA), SPACE, VISTA, 3D-FLAIR) are used for pathologies of the cranial nerves, labyrinth and evaluation of endolymphatic hydrops in Menière's disease. Vessel and perfusion sequences (3D-TOF, TWIST/TRICKS angiography, DCE) are used in vascular contact syndromes, vascular malformations and analysis of microvascular parameters of tissue perfusion. Diffusion-weighted imaging (EPI-DWI, non-EPI-DWI, RESOLVE) is helpful in cholesteatoma imaging, estimation of malignancy, and evaluation of treatment response and posttreatment recurrence in head & neck cancer. Understanding of MRI sequences and close collaboration with referring physicians improves the diagnostic confidence of MRI in the daily routine and drives further research in this fascinating image modality. Key Points:   · Understanding of MRI sequences is essential for the correct and reliable interpretation of MRI findings.. · MRI protocols have to be carefully selected based on relevant clinical information.. · Close collaboration with referring physicians improves the output obtained from the diagnostic possibilities of MRI.. Citation Format · Widmann G, Henninger B, Kremser C et

  20. Accurate Local-Ancestry Inference in Exome-Sequenced Admixed Individuals via Off-Target Sequence Reads

    Science.gov (United States)

    Hu, Youna; Willer, Cristen; Zhan, Xiaowei; Kang, Hyun Min; Abecasis, Gonçalo R.

    2013-01-01

    Estimates of the ancestry of specific chromosomal regions in admixed individuals are useful for studies of human evolutionary history and for genetic association studies. Previously, this ancestry inference relied on high-quality genotypes from genome-wide association study (GWAS) arrays. These high-quality genotypes are not always available when samples are exome sequenced, and exome sequencing is the strategy of choice for many ongoing genetic studies. Here we show that off-target reads generated during exome-sequencing experiments can be combined with on-target reads to accurately estimate the ancestry of each chromosomal segment in an admixed individual. To reconstruct local ancestry, our method SEQMIX models aligned bases directly instead of relying on hard genotype calls. We evaluate the accuracy of our method through simulations and analysis of samples sequenced by the 1000 Genomes Project and the NHLBI Grand Opportunity Exome Sequencing Project. In African Americans, we show that local-ancestry estimates derived by our method are very similar to those derived with Illumina’s Omni 2.5M genotyping array and much improved in relation to estimates that use only exome genotypes and ignore off-target sequencing reads. Software implementing this method, SEQMIX, can be applied to analysis of human population history or used for genetic association studies in admixed individuals. PMID:24210252

  1. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  2. Memory and learning with rapid audiovisual sequences

    Science.gov (United States)

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  3. Memory and learning with rapid audiovisual sequences.

    Science.gov (United States)

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  4. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Directory of Open Access Journals (Sweden)

    Koki eMatsumoto

    2013-06-01

    Full Text Available Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  5. A Systematic Approach to Quality Oriented Product Sequencing for Multistage Manufacturing Systems

    OpenAIRE

    Zhang, Faping; Butt, Shahid Ikramullah

    2016-01-01

    Product sequencing is one way to reduce cost and improve product quality for multistage manufacturing systems (MMS). However, systematically evaluating the influence of product sequence on quality performance for MMS is still a challenge. By considering the rate of incoming conforming product, manufacturing system quality transition between batch to batch, and quality propagation along stages, this paper investigates the appropriate batch policies and product sequencing for MMS so that satisf...

  6. Automatic generation of randomized trial sequences for priming experiments.

    Science.gov (United States)

    Ihrke, Matthias; Behrendt, Jörg

    2011-01-01

    In most psychological experiments, a randomized presentation of successive displays is crucial for the validity of the results. For some paradigms, this is not a trivial issue because trials are interdependent, e.g., priming paradigms. We present a software that automatically generates optimized trial sequences for (negative-) priming experiments. Our implementation is based on an optimization heuristic known as genetic algorithms that allows for an intuitive interpretation due to its similarity to natural evolution. The program features a graphical user interface that allows the user to generate trial sequences and to interactively improve them. The software is based on freely available software and is released under the GNU General Public License.

  7. Static multiplicities in heterogeneous azeotropic distillation sequences

    DEFF Research Database (Denmark)

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay

    1998-01-01

    In this paper the results of a bifurcation analysis on heterogeneous azeotropic distillation sequences are given. Two sequences suitable for ethanol dehydration are compared: The 'direct' and the 'indirect' sequence. It is shown, that the two sequences, despite their similarities, exhibit very...... different static behavior. The method of Petlyuk and Avet'yan (1971), Bekiaris et al. (1993), which assumes infinite reflux and infinite number of stages, is extended to and applied on heterogeneous azeotropic distillation sequences. The predictions are substantiated through simulations. The static sequence...

  8. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    Science.gov (United States)

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  9. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Full Text Available Several existing direct-sequence spread spectrum (DSSS) detection and estimation algorithms assume prior knowledge of the symbol period or sequence length, although very few sequence-length estimation techniques are available in the literature...

  10. Infinite matrices and sequence spaces

    CERN Document Server

    Cooke, Richard G

    2014-01-01

    This clear and correct summation of basic results from a specialized field focuses on the behavior of infinite matrices in general, rather than on properties of special matrices. Three introductory chapters guide students to the manipulation of infinite matrices, covering definitions and preliminary ideas, reciprocals of infinite matrices, and linear equations involving infinite matrices.From the fourth chapter onward, the author treats the application of infinite matrices to the summability of divergent sequences and series from various points of view. Topics include consistency, mutual consi

  11. Complete Genome Sequence of the Probiotic Strain Lactobacillus salivarius LPM01.

    Science.gov (United States)

    Chenoll, Empar; Codoñer, Francisco M; Martinez-Blanch, Juan F; Acevedo-Piérart, Marcelo; Ormeño, M Loreto; Ramón, Daniel; Genovés, Salvador

    2016-11-23

    Lactobacillus salivarius LPM01 (DSM 22150) is a probiotic strain able to improve health status in immunocompromised people. Here, we report its complete genome sequence deciphered by PacBio single-molecule real-time (SMRT) technology. Analysis of the sequence may provide insights into its functional activity and safety assessment. Copyright © 2016 Chenoll et al.

  12. Draft Genome Sequence of Fish Pathogen Aeromonas bestiarum GA97-22.

    Science.gov (United States)

    Kumru, Salih; Tekedar, Hasan C; Griffin, Matt J; Waldbieser, Geoffrey C; Liles, Mark R; Sonstegard, Tad; Schroeder, Steven G; Lawrence, Mark L; Karsi, Attila

    2018-06-14

    Aeromonas bestiarum is a Gram-negative mesophilic motile bacterium causing acute hemorrhagic septicemia or chronic skin ulcers in fish. Here, we report the draft genome sequence of A. bestiarum strain GA97-22, which was isolated from rainbow trout in 1997. This genome sequence will improve our understanding of the complex taxonomy of motile aeromonads.

  13. Next-Generation Sequencing: From Understanding Biology to Personalized Medicine

    Directory of Open Access Journals (Sweden)

    Benjamin Meder

    2013-03-01

    Full Text Available Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we wish to highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care.

  14. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Science.gov (United States)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  15. Plann: A command-line application for annotating plastome sequences.

    Science.gov (United States)

    Huang, Daisie I; Cronk, Quentin C B

    2015-08-01

    Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann's output can be used in the National Center for Biotechnology Information's tbl2asn to create a Sequin file for GenBank submission. Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved.

  16. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Science.gov (United States)

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity which is derived based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Comparing with the previous models, the proposed model can reveal more biological sense.

  17. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  18. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  19. The recurrence sequences via Sylvester matrices

    Science.gov (United States)

    Karaduman, Erdal; Deveci, Ömür

    2017-07-01

    In this work, we define the Pell-Jacobsthal-Slyvester sequence and the Jacobsthal-Pell-Slyvester sequence by using the Slyvester matrices which are obtained from the characteristic polynomials of the Pell and Jacobsthal sequences and then, we study the sequences defined modulo m. Also, we obtain the cyclic groups and the semigroups from the generating matrices of these sequences when read modulo m and then, we derive the relationships among the orders of the cyclic groups and the periods of the sequences. Furthermore, we redefine Pell-Jacobsthal-Slyvester sequence and the Jacobsthal-Pell-Slyvester sequence by means of the elements of the groups and then, we examine them in the finite groups.

  20. ON SOME RECURRENCE TYPE SMARANDACHE SEQUENCES

    OpenAIRE

    MAJUMDAR, A.A.K.; GUNARTO, H.

    2000-01-01

    In this paper, we study some properties of ten recurrence type Smarandache sequences, namely, the Smarandache odd, even, prime product, square product, higher-power product, permutation, consecutive, reverse, symmetric, and pierced chain sequences.

  1. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  2. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  3. Augmented brain function by coordinated reset stimulation with slowly varying sequences

    Directory of Open Access Journals (Sweden)

    Magteld eZeitler

    2015-03-01

    Full Text Available Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e. an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS. In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e. CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.

  4. Augmented brain function by coordinated reset stimulation with slowly varying sequences.

    Science.gov (United States)

    Zeitler, Magteld; Tass, Peter A

    2015-01-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.

  5. Perfect sequences over the real quaternions

    OpenAIRE

    Kuznetsov, Oleg

    2017-01-01

    In this Thesis, perfect sequences over the real quaternions are first considered. Definitions for the right and left periodic autocorrelation functions are given, and right and left perfect sequences introduced. It is shown that the right (left) perfection of any sequence implies the left (right) perfection, so concepts of right and left perfect sequences over the real quaternions are equivalent. Unitary transformations of the quaternion space ℍ are then considered. Using the equivalence of t...

  6. Information decomposition method to analyze symbolical sequences

    International Nuclear Information System (INIS)

    Korotkov, E.V.; Korotkova, M.A.; Kudryashov, N.A.

    2003-01-01

    The information decomposition (ID) method to analyze symbolical sequences is presented. This method allows us to reveal a latent periodicity of any symbolical sequence. The ID method is shown to have advantages in comparison with application of the Fourier transformation, the wavelet transform and the dynamic programming method to look for latent periodicity. Examples of the latent periods for poetic texts, DNA sequences and amino acids are presented. Possible origin of a latent periodicity for different symbolical sequences is discussed

  7. PSA modeling of long-term accident sequences

    International Nuclear Information System (INIS)

    Georgescu, Gabriel; Corenwinder, Francois; Lanore, Jeanne-Marie

    2014-01-01

    In the context of the extension of PSA scope to include external hazards, in France, both operator (EDF) and IRSN work for the improvement of methods to better take into account in the PSA the accident sequences induced by initiators which affect a whole site containing several nuclear units (reactors, fuel pools,...). These methodological improvements represent an essential prerequisite for the development of external hazards PSA. However, it has to be noted that in French PSA, even before Fukushima, long term accident sequences were taken into account: many insight were therefore used, as complementary information, to enhance the safety level of the plants. IRSN proposed an external events PSA development program. One of the first steps of the program is the development of methods to model in the PSA the long term accident sequences, based on the experience gained. At short term IRSN intends to enhance the modeling of the 'long term' accident sequences induced by the loss of the heat sink or/and the loss of external power supply. The experience gained by IRSN and EDF from the development of several probabilistic studies treating long term accident sequences shows that the simple extension of the mission time of the mitigation systems from 24 hours to longer times is not sufficient to realistically quantify the risk and to obtain a correct ranking of the risk contributions and that treatment of recoveries is also necessary. IRSN intends to develop a generic study which can be used as a general methodology for the assessment of the long term accident sequences, mainly generated by external hazards and their combinations. This first attempt to develop this generic study allowed identifying some aspects, which may be hazard (or combinations of hazards) or related to initial boundary conditions, which should be taken into account for further developments. (authors)

  8. Biophysical and structural considerations for protein sequence evolution

    Directory of Open Access Journals (Sweden)

    Grahnen Johan A

    2011-12-01

    Full Text Available Abstract Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model.

  9. Parallel sequencing lives, or what makes large sequencing projects successful.

    Science.gov (United States)

    Quilez, Javier; Vidal, Enrique; Dily, François Le; Serra, François; Cuartero, Yasmina; Stadhouders, Ralph; Graf, Thomas; Marti-Renom, Marc A; Beato, Miguel; Filion, Guillaume

    2017-11-01

    T47D_rep2 and b1913e6c1_51720e9cf were 2 Hi-C samples. They were born and processed at the same time, yet their fates were very different. The life of b1913e6c1_51720e9cf was simple and fruitful, while that of T47D_rep2 was full of accidents and sorrow. At the heart of these differences lies the fact that b1913e6c1_51720e9cf was born under a lab culture of Documentation, Automation, Traceability, and Autonomy and compliance with the FAIR Principles. Their lives are a lesson for those who wish to embark on the journey of managing high-throughput sequencing data. © The Author 2017. Published by Oxford University Press.

  10. MatrixPlot: visualizing sequence constraints

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole

    1999-01-01

    MatrixPlot: visualizing sequence constraints. Sub-title Abstract Summary : MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information...

  11. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  12. DNA sequence modeling based on context trees

    NARCIS (Netherlands)

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  13. Compact flow diagrams for state sequences

    NARCIS (Netherlands)

    Buchin, K.A.; Buchin, M.E.; Gudmundsson, J.; Horton, M.J.; Sijben, S.

    2016-01-01

    We introduce the concept of compactly representing a large number of state sequences, e.g., sequences of activities, as a flow diagram. We argue that the flow diagram representation gives an intuitive summary that allows the user to detect patterns among large sets of state sequences. Simplified,

  14. Blazar Sequence in Fermi Era Liang Chen

    Indian Academy of Sciences (India)

    Abstract. In this paper, we review the latest research results on the topic of blazar sequence. It seems that the blazar sequence is phenomenally ruled out, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominated parameter accounting for high-power- high-synchrotron-peaked and ...

  15. Discovery of Escherichia coli CRISPR sequences in an undergraduate laboratory.

    Science.gov (United States)

    Militello, Kevin T; Lazatin, Justine C

    2017-05-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) represent a novel type of adaptive immune system found in eubacteria and archaebacteria. CRISPRs have recently generated a lot of attention due to their unique ability to catalog foreign nucleic acids, their ability to destroy foreign nucleic acids in a mechanism that shares some similarity to RNA interference, and the ability to utilize reconstituted CRISPR systems for genome editing in numerous organisms. In order to introduce CRISPR biology into an undergraduate upper-level laboratory, a five-week set of exercises was designed to allow students to examine the CRISPR status of uncharacterized Escherichia coli strains and to allow the discovery of new repeats and spacers. Students started the project by isolating genomic DNA from E. coli and amplifying the iap CRISPR locus using the polymerase chain reaction (PCR). The PCR products were analyzed by Sanger DNA sequencing, and the sequences were examined for the presence of CRISPR repeat sequences. The regions between the repeats, the spacers, were extracted and analyzed with BLASTN searches. Overall, CRISPR loci were sequenced from several previously uncharacterized E. coli strains and one E. coli K-12 strain. Sanger DNA sequencing resulted in the discovery of 36 spacer sequences and their corresponding surrounding repeat sequences. Five of the spacers were homologous to foreign (non-E. coli) DNA. Assessment of the laboratory indicates that improvements were made in the ability of students to answer questions relating to the structure and function of CRISPRs. Future directions of the laboratory are presented and discussed. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(3):262-269, 2017. © 2016 The International Union of Biochemistry and Molecular Biology.

  16. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata, as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest for the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode. Around 1.4 million of reads were produced and assembled into 70,530 contigs (average length of 490 bp. Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  17. CBESW: Sequence Alignment on the Playstation 3

    Directory of Open Access Journals (Sweden)

    Hieu Nim

    2008-09-01

    Full Text Available Abstract Background The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications.

  18. CBESW: sequence alignment on the Playstation 3.

    Science.gov (United States)

    Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil

    2008-09-17

    The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. The results from our experiments demonstrate that the PlayStation 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications.

  19. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of its sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  20. Differentially Private Event Histogram Publication on Sequences over Graphs

    Institute of Scientific and Technical Information of China (English)

    Ning Wang; Yu Gu; Jia Xu; Fang-Fang Li; Ge Yu

    2017-01-01

    The big data era is coming with strong and ever-growing demands on analyzing personal information and footprints in the cyber world. To enable such analysis without privacy leak risk, differential privacy (DP) has been quickly rising in recent years, as the first practical privacy protection model with rigorous theoretical guarantee. This paper discusses how to publish differentially private histograms on events in time series domain, with sequences of personal events over graphs with events as edges. Such individual-generated sequences commonly appear in formalized industrial workflows, online game logs, and spatial-temporal trajectories. Directly publishing the statistics of sequences may compromise personal privacy. While existing DP mechanisms mainly target at normalized domains with fixed and aligned dimensions, our problem raises new challenges when the sequences could follow arbitrary paths on the graph. To tackle the problem, we reformulate the problem with a three-step framework, which 1) carefully truncates the original sequences, trading off errors introduced by the truncation with those introduced by the noise added to guarantee privacy, 2) decomposes the event graph into path sub-domains based on a group of event pivots, and 3) employs a deeply optimized tree-based histogram construction approach for each sub-domain to benefit with less noise addition. We present a careful analysis on our framework to support thorough optimizations over each step of the framework, and verify the huge improvements of our proposals over state-of-the-art solutions.

  1. On generalized fixed sequence procedures for controlling the FWER.

    Science.gov (United States)

    Qiu, Zhiying; Guo, Wenge; Lynch, Gavin

    2015-12-30

    Testing a sequence of pre-ordered hypotheses to decide which of these can be rejected or accepted while controlling the familywise error rate (FWER) is of importance in many scientific studies such as clinical trials. In this paper, we first introduce a generalized fixed sequence procedure whose critical values are defined by using a function of the numbers of rejections and acceptances, and which allows follow-up hypotheses to be tested even if some earlier hypotheses are not rejected. We then construct the least favorable configuration for this generalized fixed sequence procedure and present a sufficient condition for the FWER control under arbitrary dependence. Based on the condition, we develop three new generalized fixed sequence procedures controlling the FWER under arbitrary dependence. We also prove that each generalized fixed sequence procedure can be described as a specific closed testing procedure. Through simulation studies and a clinical trial example, we compare the power performance of these proposed procedures with those of the existing FWER controlling procedures. Finally, when the pairwise joint distributions of the true null p-values are known, we further improve these procedures by incorporating pairwise correlation information while maintaining the control of the FWER. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.

  2. The consolidation of implicit sequence memory in obstructive sleep apnea.

    Directory of Open Access Journals (Sweden)

    Eszter Csabi

    Full Text Available Obstructive Sleep Apnea (OSA Syndrome is a relatively frequent sleep disorder characterized by disrupted sleep patterns. It is a well-established fact that sleep has beneficial effect on memory consolidation by enhancing neural plasticity. Implicit sequence learning is a prominent component of skill learning. However, the formation and consolidation of this fundamental learning mechanism remains poorly understood in OSA. In the present study we examined the consolidation of different aspects of implicit sequence learning in patients with OSA. We used the Alternating Serial Reaction Time task to measure general skill learning and sequence-specific learning. There were two sessions: a learning phase and a testing phase, separated by a 10-hour offline period with sleep. Our data showed differences in offline changes of general skill learning between the OSA and control group. The control group demonstrated offline improvement from evening to morning, while the OSA group did not. In contrast, we did not observe differences between the groups in offline changes in sequence-specific learning. Our findings suggest that disrupted sleep in OSA differently affects neural circuits involved in the consolidation of sequence learning.

  3. Can you sequence ecology? Metagenomics of adaptive diversification.

    Science.gov (United States)

    Marx, Christopher J

    2013-01-01

    Few areas of science have benefited more from the expansion in sequencing capability than the study of microbial communities. Can sequence data, besides providing hypotheses of the functions the members possess, detect the evolutionary and ecological processes that are occurring? For example, can we determine if a species is adapting to one niche, or if it is diversifying into multiple specialists that inhabit distinct niches? Fortunately, adaptation of populations in the laboratory can serve as a model to test our ability to make such inferences about evolution and ecology from sequencing. Even adaptation to a single niche can give rise to complex temporal dynamics due to the transient presence of multiple competing lineages. If there are multiple niches, this complexity is augmented by segmentation of the population into multiple specialists that can each continue to evolve within their own niche. For a known example of parallel diversification that occurred in the laboratory, sequencing data gave surprisingly few obvious, unambiguous signs of the ecological complexity present. Whereas experimental systems are open to direct experimentation to test hypotheses of selection or ecological interaction, the difficulty in "seeing ecology" from sequencing for even such a simple system suggests translation to communities like the human microbiome will be quite challenging. This will require both improved empirical methods to enhance the depth and time resolution for the relevant polymorphisms and novel statistical approaches to rigorously examine time-series data for signs of various evolutionary and ecological phenomena within and between species.

  4. Bacterial identification and subtyping using DNA microarray and DNA sequencing.

    Science.gov (United States)

    Al-Khaldi, Sufian F; Mossoba, Magdi M; Allard, Marc M; Lienau, E Kurt; Brown, Eric D

    2012-01-01

    The era of fast and accurate discovery of biological sequence motifs in prokaryotic and eukaryotic cells is here. The co-evolution of direct genome sequencing and DNA microarray strategies not only will identify, isotype, and serotype pathogenic bacteria, but also it will aid in the discovery of new gene functions by detecting gene expressions in different diseases and environmental conditions. Microarray bacterial identification has made great advances in working with pure and mixed bacterial samples. The technological advances have moved beyond bacterial gene expression to include bacterial identification and isotyping. Application of new tools such as mid-infrared chemical imaging improves detection of hybridization in DNA microarrays. The research in this field is promising and future work will reveal the potential of infrared technology in bacterial identification. On the other hand, DNA sequencing by using 454 pyrosequencing is so cost effective that the promise of $1,000 per bacterial genome sequence is becoming a reality. Pyrosequencing technology is a simple to use technique that can produce accurate and quantitative analysis of DNA sequences with a great speed. The deposition of massive amounts of bacterial genomic information in databanks is creating fingerprint phylogenetic analysis that will ultimately replace several technologies such as Pulsed Field Gel Electrophoresis. In this chapter, we will review (1) the use of DNA microarray using fluorescence and infrared imaging detection for identification of pathogenic bacteria, and (2) use of pyrosequencing in DNA cluster analysis to fingerprint bacterial phylogenetic trees.

  5. Protein Function Prediction Based on Sequence and Structure Information

    KAUST Repository

    Smaili, Fatima Z.

    2016-05-25

    The number of available protein sequences in public databases is increasing exponentially. However, a significant fraction of these sequences lack functional annotation which is essential to our understanding of how biological systems and processes operate. In this master thesis project, we worked on inferring protein functions based on the primary protein sequence. In the approach we follow, 3D models are first constructed using I-TASSER. Functions are then deduced by structurally matching these predicted models, using global and local similarities, through three independent enzyme commission (EC) and gene ontology (GO) function libraries. The method was tested on 250 “hard” proteins, which lack homologous templates in both structure and function libraries. The results show that this method outperforms the conventional prediction methods based on sequence similarity or threading. Additionally, our method could be improved even further by incorporating protein-protein interaction information. Overall, the method we use provides an efficient approach for automated functional annotation of non-homologous proteins, starting from their sequence.

  6. Semi-Supervised Learning for Classification of Protein Sequence Data

    Directory of Open Access Journals (Sweden)

    Brian R. King

    2008-01-01

    Full Text Available Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.

  7. Complete genome sequence of Campylobacter jejuni strain 12567 a livestock-associated clade representative

    Science.gov (United States)

    We report the complete genome sequence of the Campylobacter jejuni strain 12567, a member of a C. jejuni livestock-associated clade that expresses glycoconjugates linked to improved gastrointestinal tract persistence....

  8. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  9. Noisy: Identification of problematic columns in multiple sequence alignments

    Directory of Open Access Journals (Sweden)

    Grünewald Stefan

    2008-06-01

    Full Text Available Abstract Motivation Sequence-based methods for phylogenetic reconstruction from (nucleic acid sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i phylogenetically informative and (ii effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate resulting in sometimes misleading signals in phylogenetic reconstruction. Results We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. Software The computer program noisy implements this approach. It can be employed to improving phylogenetic reconstruction capability with quite a considerable success rate whenever (1 the average bootstrap support obtained from the original alignment is low, and (2 there are sufficiently many taxa in the data set – at least, say, 12 to 15 taxa. The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.

  10. Digital chaotic sequence generator based on coupled chaotic systems

    International Nuclear Information System (INIS)

    Shu-Bo, Liu; Jing, Sun; Jin-Shuo, Liu; Zheng-Quan, Xu

    2009-01-01

    Chaotic systems perform well as a new rich source of cryptography and pseudo-random coding. Unfortunately their digital dynamical properties would degrade due to the finite computing precision. Proposed in this paper is a modified digital chaotic sequence generator based on chaotic logistic systems with a coupling structure where one chaotic subsystem generates perturbation signals to disturb the control parameter of the other one. The numerical simulations show that the length of chaotic orbits, the output distribution of chaotic system, and the security of chaotic sequences have been greatly improved. Moreover the chaotic sequence period can be extended at least by one order of magnitude longer than that of the uncoupled logistic system and the difficulty in decrypting increases 2 128 *2 128 times indicating that the dynamical degradation of digital chaos is effectively improved. A field programmable gate array (FPGA) implementation of an algorithm is given and the corresponding experiment shows that the output speed of the generated chaotic sequences can reach 571.4 Mbps indicating that the designed generator can be applied to the real-time video image encryption. (general)

  11. Sequence robust association test for familial data.

    Science.gov (United States)

    Dai, Wei; Yang, Ming; Wang, Chaolong; Cai, Tianxi

    2017-09-01

    Genome-wide association studies (GWAS) and next generation sequencing studies (NGSS) are often performed in family studies to improve power in identifying genetic variants that are associated with clinical phenotypes. Efficient analysis of genome-wide studies with familial data is challenging due to the difficulty in modeling shared but unmeasured genetic and/or environmental factors that cause dependencies among family members. Existing genetic association testing procedures for family studies largely rely on generalized estimating equations (GEE) or linear mixed-effects (LME) models. These procedures may fail to properly control for type I errors when the imposed model assumptions fail. In this article, we propose the Sequence Robust Association Test (SRAT), a fully rank-based, flexible approach that tests for association between a set of genetic variants and an outcome, while accounting for within-family correlation and adjusting for covariates. Comparing to existing methods, SRAT has the advantages of allowing for unknown correlation structures and weaker assumptions about the outcome distribution. We provide theoretical justifications for SRAT and show that SRAT includes the well-known Wilcoxon rank sum test as a special case. Extensive simulation studies suggest that SRAT provides better protection against type I error rate inflation, and could be much more powerful for settings with skewed outcome distribution than existing methods. For illustration, we also apply SRAT to the familial data from the Framingham Heart Study and Offspring Study to examine the association between an inflammatory marker and a few sets of genetic variants. © 2017, The International Biometric Society.

  12. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions for alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Sequencing the exposome: A call to action

    Directory of Open Access Journals (Sweden)

    Dean P. Jones

    Full Text Available The exposome is a complement to the genome that includes non-genetic causes of disease. Multiple definitions are available, with salient points being global inclusion of exposures and behaviors, and cumulative integration of associated biologic responses. As such, the concept is both refreshingly simple and dauntingly complex. This article reviews high-resolution metabolomics (HRM as an affordable approach to routinely analyze samples for a broad spectrum of environmental chemicals and biologic responses. HRM has been successfully used in multiple exposome research paradigms and is suitable to implement in a prototype universal exposure surveillance system. Development of such a structure for systematic monitoring of environmental exposures is an important step toward sequencing the exposome because it builds upon successes of exposure science, naturally connects external exposure to body burden and partitions the exposome into workable components. Practical results would be repositories of quantitative data on chemicals according to geography and biology. This would support new opportunities for environmental health analysis and predictive modeling. Complementary approaches to hasten development of exposome theory and associated biologic response networks could include experimental studies with model systems, analysis of archival samples from longitudinal studies with outcome data and study of relatively short-lived animals, such as household pets (dogs and cats and non-human primates (common marmoset. International investment and cooperation to sequence the human exposome will advance scientific knowledge and also provide an important foundation to control adverse environmental exposures to sustain healthy living spaces and improve prediction and management of disease. Keywords: Mass spectrometry, Biomonitoring, Analytical chemistry, Metabolomics, Environmental surveillance

  14. Permutation Entropy for Random Binary Sequences

    Directory of Open Access Journals (Sweden)

    Lingfeng Liu

    2015-12-01

    Full Text Available In this paper, we generalize the permutation entropy (PE measure to binary sequences, which is based on Shannon’s entropy, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure with other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.

  15. Genome Improvement at JGI-HAGSC

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

    2012-03-03

    Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.

  16. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Full Text Available Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM. Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS, segmented PsePSSM, and segmented autocovariance transformation (ACT based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640 are adopted in this paper. Then a 700-dimensional (700D feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA. To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  17. The 2016 Kumamoto earthquake sequence.

    Science.gov (United States)

    Kato, Aitaro; Nakamura, Kouji; Hiyama, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An M j 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an M j 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest.

  18. Data selector group sequencer interface

    International Nuclear Information System (INIS)

    Zizka, G.; Turko, B.

    1984-01-01

    A CAMAC-based module for high rate data selection and transfer to Tracor Northern TN-1700 multichannel analysis system is described. The module can select any group of 4096 consecutive addresses of events, in the range of 24 bits. This module solves the problem of connecting a number of time digitizing systems to the memory of a multichannel analyzer. Continuous processing rate up to 200,000 events per second along with the live display make the testing of the above systems very efficient and relatively inexpensive. The module also can be programmed for storing the preset group of addresses into more than one section of the memory. The events are analyzed in each section of the memory during the preset time. Multiple spectra can thus be taken automatically in a sequence

  19. A main sequence for quasars

    Science.gov (United States)

    Marziani, Paola; Dultzin, Deborah; Sulentic, Jack W.; Del Olmo, Ascensión; Negrete, C. A.; Martínez-Aldama, Mary L.; D'Onofrio, Mauro; Bon, Edi; Bon, Natasa; Stirpe, Giovanna M.

    2018-03-01

    The last 25 years saw a major step forward in the analysis of optical and UV spectroscopic data of large quasar samples. Multivariate statistical approaches have led to the definition of systematic trends in observational properties that are the basis of physical and dynamical modeling of quasar structure. We discuss the empirical correlates of the so-called “main sequence” associated with the quasar Eigenvector 1, its governing physical parameters and several implications on our view of the quasar structure, as well as some luminosity effects associated with the virialized component of the line emitting regions. We also briefly discuss quasars in a segment of the main sequence that includes the strongest FeII emitters. These sources show a small dispersion around a well-defined Eddington ratio value, a property which makes them potential Eddington standard candles.

  20. The 2016 Kumamoto earthquake sequence

    Science.gov (United States)

    KATO, Aitaro; NAKAMURA, Kouji; HIYAMA, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An Mj 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an Mj 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest. PMID:27725474

  1. A Main Sequence for Quasars

    Directory of Open Access Journals (Sweden)

    Paola Marziani

    2018-03-01

    Full Text Available The last 25 years saw a major step forward in the analysis of optical and UV spectroscopic data of large quasar samples. Multivariate statistical approaches have led to the definition of systematic trends in observational properties that are the basis of physical and dynamical modeling of quasar structure. We discuss the empirical correlates of the so-called “main sequence” associated with the quasar Eigenvector 1, its governing physical parameters and several implications on our view of the quasar structure, as well as some luminosity effects associated with the virialized component of the line emitting regions. We also briefly discuss quasars in a segment of the main sequence that includes the strongest FeII emitters. These sources show a small dispersion around a well-defined Eddington ratio value, a property which makes them potential Eddington standard candles.

  2. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as the nucleotide correlations or the not uniform distribution of the motifs throughout genomic or mature mRNA sequences, exist and their significance can be checked by means of the Monte Carlo test. The test needs good quality random sequences in order to work, moreover they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is a threshold below which the resulting score is merely due to chance. We have developed RANDNA, a free software which allows to produce random DNA or RNA sequences setting both their length and the percentage of nucleotide composition. Sequences having the same nucleotide distribution of exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees a good randomness, a long cycle length and a high speed. We have checked the quality of sequences generated by the software, by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.

  3. Whole Exome Sequencing in Pediatric Neurology Patients: Clinical Implications and Estimated Cost Analysis.

    Science.gov (United States)

    Nolan, Danielle; Carlson, Martha

    2016-06-01

    Genetic heterogeneity in neurologic disorders has been an obstacle to phenotype-based diagnostic testing. The authors hypothesized that information compiled via whole exome sequencing will improve clinical diagnosis and management of pediatric neurology patients. The authors performed a retrospective chart review of patients evaluated in the University of Michigan Pediatric Neurology clinic between 6/2011 and 6/2015. The authors recorded previous diagnostic testing, indications for whole exome sequencing, and whole exome sequencing results. Whole exome sequencing was recommended for 135 patients and obtained in 53 patients. Insurance barriers often precluded whole exome sequencing. The most common indication for whole exome sequencing was neurodevelopmental disorders. Whole exome sequencing improved the presumptive diagnostic rate in the patient cohort from 25% to 48%. Clinical implications included family planning, medication selection, and systemic investigation. Compared to current second tier testing, whole exome sequencing can result in lower long-term charges and more timely diagnosis. Overcoming barriers related to whole exome sequencing insurance authorization could allow for more efficient and fruitful diagnostic neurological evaluations. © The Author(s) 2016.

  4. Sleep and memory consolidation: motor performance and proactive interference effects in sequence learning.

    Science.gov (United States)

    Borragán, Guillermo; Urbain, Charline; Schmitz, Rémy; Mary, Alison; Peigneux, Philippe

    2015-04-01

    That post-training sleep supports the consolidation of sequential motor skills remains debated. Performance improvement and sensitivity to proactive interference are both putative measures of long-term memory consolidation. We tested sleep-dependent memory consolidation for visuo-motor sequence learning using a proactive interference paradigm. Thirty-three young adults were trained on sequence A on Day 1, then had Regular Sleep (RS) or were Sleep Deprived (SD) on the night after learning. After two recovery nights, they were tested on the same sequence A, then had to learn a novel, potentially competing sequence B. We hypothesized that proactive interference effects on sequence B due to the prior learning of sequence A would be higher in the RS condition, considering that proactive interference is an indirect marker of the robustness of sequence A, which should be better consolidated over post-training sleep. Results highlighted sleep-dependent improvement for sequence A, with faster RTs overnight for RS participants only. Moreover, the beneficial impact of sleep was specific to the consolidation of motor but not sequential skills. Proactive interference effects on learning a new material at Day 4 were similar between RS and SD participants. These results suggest that post-training sleep contributes to optimizing motor but not sequential components of performance in visuo-motor sequence learning. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Science.gov (United States)

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We develop a software SUGAR (subtile-based GUI-assisted refiner) which can handle ultra-high-throughput data with user-friendly graphical user interface (GUI) and interactive analysis capability. The SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in the SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to a public whole human genome sequencing data and we proved the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls to the low population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.

  6. Locomotor sequence learning in visually guided walking

    DEFF Research Database (Denmark)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 years, N = 20) could learn a specific sequence...... of step lengths over 300 training steps. Younger children (age 6-10 years, N = 8) have lower baseline performance, but their magnitude and rate of sequence learning was the same compared to older children (11-16 years, N = 10) and healthy adults. In addition, learning capacity may be more limited...... to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence non-specific learning during...

  7. Quantum Logic with Composite Pulse Sequences on Sr^+

    Science.gov (United States)

    Shewmon, Ruth; Labaziewicz, Jaroslaw; Ge, Yufei; Wang, Shannon; Chuang, Isaac L.

    2008-03-01

    The optical 5S1/2->4D1/2 transition in Sr^+ is an attractive qubit because it can be addressed by diode lasers, which are relatively inexpensive and easy to operate. We characterize single-qubit rotations as well as a CNOT gate on a Sr^+ ion in a surface electrode Paul trap. To improve these operations, the frequency of the clock laser is stabilized to a high-finesse optical cavity. The resulting linewidth of the laser is approximately 300Hz. Composite pulse sequences, a technique adapted from NMR, have been shown to reduce the effects of systematic errors in a variety of quantum systems. We demonstrate several composite sequences that improve the fidelity of quantum logic operations on Sr^+.

  8. The RNA world, automatic sequences and oncogenetics

    Energy Technology Data Exchange (ETDEWEB)

    Tahir Shah, K

    1993-04-01

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences assuming only Crick-Watson base pairing and self-cleaving/splicing capability. These sequences have the following properties. (1) They are recognizable by an automation (or automata). That is, to each k-sequence, there exist a k-automation which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. (2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp.fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel re-write system, known as a DOL system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences have the capability that they can read or ``accept`` other sequences while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as Fibonacci sequence are simply Feed-back Shift Registers, a well know model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of error and relates this to oncogenetical behavior. On the other hand, a mutation of sequences that are not rewrite rules, leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs.

  9. The RNA world, automatic sequences and oncogenetics

    International Nuclear Information System (INIS)

    Tahir Shah, K.

    1993-04-01

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences assuming only Crick-Watson base pairing and self-cleaving/splicing capability. These sequences have the following properties. 1) They are recognizable by an automation (or automata). That is, to each k-sequence, there exist a k-automation which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. 2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp.fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel re-write system, known as a DOL system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences have the capability that they can read or 'accept' other sequences while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as Fibonacci sequence are simply Feed-back Shift Registers, a well know model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of error and relates this to oncogenetical behavior. On the other hand, a mutation of sequences that are not rewrite rules, leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs

  10. Influence of Layup Sequence on the Surface Accuracy of Carbon Fiber Composite Space Mirrors

    Science.gov (United States)

    Yang, Zhiyong; Liu, Qingnian; Zhang, Boming; Xu, Liang; Tang, Zhanwen; Xie, Yongjie

    2018-04-01

    Layup sequence is directly related to stiffness and deformation resistance of the composite space mirror, and error caused by layup sequence can affect the surface precision of composite mirrors evidently. Variation of layup sequence with the same total thickness of composite space mirror changes surface form of the composite mirror, which is the focus of our study. In our research, the influence of varied quasi-isotropic stacking sequences and random angular deviation on the surface accuracy of composite space mirrors was investigated through finite element analyses (FEA). We established a simulation model for the studied concave mirror with 500 mm diameter, essential factors of layup sequences and random angular deviations on different plies were discussed. Five guiding findings were described in this study. Increasing total plies, optimizing stacking sequence and keeping consistency of ply alignment in ply placement are effective to improve surface accuracy of composite mirror.

  11. The use of DNase I, buffer gradient gel, and 35S label for DNA sequencing

    International Nuclear Information System (INIS)

    Hong, G.F.

    1987-01-01

    The use of microcentrifuge tubes and mixing of sequencing reactions and brief centrifugation in racks rather than the original capillary tube method has made sequencing reactions relatively simple. Buffer gradient gels and 15 S label are simple means of increasing the rate of sequence analysis; they add little time to that required for determining the sequences of a given number of clones, need no elaborate equipment, and increase the amount of useful data per gel. The standard approach of running 2- and 4-hr gels generates about 300 bases of sequence. The above improvements allow the same number of bases to be read with more confidence from a single 50-cm gel for each clone sequenced due to the changed spacing between sharpened bands

  12. Elman RNN based classification of proteins sequences on account of their mutual information.

    Science.gov (United States)

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we have employed the method of estimating residue correlation within the protein sequences, by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, like this each protein sequence is associated with its corresponding MIVs. These MIVs are given to Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment free classification of protein sequences. We also conclude that sequence structural and solvent accessible property based MIVs are better predictor. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads

    Directory of Open Access Journals (Sweden)

    Chengxi Ye

    2016-06-01

    Full Text Available Motivation. The third generation sequencing (3GS technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS technologies (less than 1%. Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear complexity consensus algorithm Sparc to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path which approximates the most likely genome sequence is searched through a sparsity-induced reweighted graph as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve similar error rate when combined with NGS data. Compared with the existing approaches, Sparc calculates the consensus with higher accuracy, and uses approximately 80% less memory and time. Availability. The source code is available for download at https://github.com/yechengxi/Sparc.

  14. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

    Science.gov (United States)

    Kwok, Hin; Chiang, Alan Kwok Shing

    2016-02-24

    Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  15. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes

    Directory of Open Access Journals (Sweden)

    Hin Kwok

    2016-02-01

    Full Text Available Genomic sequences of Epstein–Barr virus (EBV have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  16. Effects of informed consent for individual genome sequencing on relevant knowledge.

    Science.gov (United States)

    Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G

    2012-11-01

    Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7, p benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.

  17. WildSpan: mining structured motifs from protein sequences

    Directory of Open Access Journals (Sweden)

    Chen Chien-Yu

    2011-03-01

    of WildSpan is developed for discovering functional regions of a single protein by referring to a set of related sequences (e.g. its homologues. The discovered W-patterns are used to characterize the protein sequence and the results are compared with the conserved positions identified by multiple sequence alignment (MSA. The family-based mining mode of WildSpan is developed for extracting sequence signatures for a group of related proteins (e.g. a protein family for protein function classification. In this situation, the discovered W-patterns are compared with PROSITE patterns as well as the patterns generated by three existing methods performing the similar task. Finally, analysis on execution time of running WildSpan reveals that the proposed pruning strategy is effective in improving the scalability of the proposed algorithm. Conclusions The mining results conducted in this study reveal that WildSpan is efficient and effective in discovering functional signatures of proteins directly from sequences. The proposed pruning strategy is effective in improving the scalability of WildSpan. It is demonstrated in this study that the W-patterns discovered by WildSpan provides useful information in characterizing protein sequences. The WildSpan executable and open source codes are available on the web (http://biominer.csie.cyu.edu.tw/wildspan.

  18. Generalized min-max bound-based MRI pulse sequence design framework for wide-range T1 relaxometry: A case study on the tissue specific imaging sequence.

    Directory of Open Access Journals (Sweden)

    Yang Liu

    Full Text Available This paper proposes a new design strategy for optimizing MRI pulse sequences for T1 relaxometry. The design strategy optimizes the pulse sequence parameters to minimize the maximum variance of unbiased T1 estimates over a range of T1 values using the Cramér-Rao bound. In contrast to prior sequences optimized for a single nominal T1 value, the optimized sequence using our bound-based strategy achieves improved precision and accuracy for a broad range of T1 estimates within a clinically feasible scan time. The optimization combines the downhill simplex method with a simulated annealing process. To show the effectiveness of the proposed strategy, we optimize the tissue specific imaging (TSI sequence. Preliminary Monte Carlo simulations demonstrate that the optimized TSI sequence yields improved precision and accuracy over the popular driven-equilibrium single-pulse observation of T1 (DESPOT1 approach for normal brain tissues (estimated T1 700-2000 ms at 3.0T. The relative mean estimation error (MSE for T1 estimation is less than 1.7% using the optimized TSI sequence, as opposed to less than 7.0% using DESPOT1 for normal brain tissues. The optimized TSI sequence achieves good stability by keeping the MSE under 7.0% over larger T1 values corresponding to different lesion tissues and the cerebrospinal fluid (up to 5000 ms. The T1 estimation accuracy using the new pulse sequence also shows improvement, which is more pronounced in low SNR scenarios.

  19. Targeted assembly of short sequence reads.

    Directory of Open Access Journals (Sweden)

    René L Warren

    Full Text Available As next-generation sequence (NGS production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphisms, fusions and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.

  20. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (i.e. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering that helps understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value

  1. Model morphing and sequence assignment after molecular replacement

    Energy Technology Data Exchange (ETDEWEB)

    Terwilliger, Thomas C., E-mail: terwilliger@lanl.gov [Los Alamos National Laboratory, Mail Stop M888, Los Alamos, NM 87545 (United States); Read, Randy J. [University of Cambridge, Cambridge Institute for Medical Research, Cambridge CB2 0XY (United Kingdom); Adams, Paul D. [Lawrence Berkeley National Laboratory, One Cyclotron Road, Bldg 64R0121, Berkeley, CA 94720 (United States); Brunger, Axel T. [Stanford University, 318 Campus Drive West, Stanford, CA 94305 (United States); Afonine, Pavel V. [Lawrence Berkeley National Laboratory, One Cyclotron Road, Bldg 64R0121, Berkeley, CA 94720 (United States); Hung, Li-Wei [Los Alamos National Laboratory, Mail Stop M888, Los Alamos, NM 87545 (United States)

    2013-11-01

    A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence. A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003 ▶) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package.

  2. Model morphing and sequence assignment after molecular replacement

    International Nuclear Information System (INIS)

    Terwilliger, Thomas C.; Read, Randy J.; Adams, Paul D.; Brunger, Axel T.; Afonine, Pavel V.; Hung, Li-Wei

    2013-01-01

    A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence. A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003 ▶) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package

  3. New MR sequences (diffusion, perfusion, spectroscopy) in brain tumours

    International Nuclear Information System (INIS)

    Rossi, Andrea; Gandolfo, Carlo; Morana, Giovanni; Severino, Mariasavina; Garre, Maria Luisa; Cama, Armando

    2010-01-01

    While MRI has been instrumental in significantly improving care in children harbouring brain tumours, conventional sequences lack information regarding functional parameters including cellularity, haemodynamics and metabolism. Advanced MR imaging modalities, such as diffusion (including diffusion tensor imaging and fibre tractography), perfusion and spectroscopy have significantly improved our understanding of the physiopathology of brain tumours and have provided invaluable additional information for treatment planning and monitoring of treatment results. The contribution of these methods to the characterization of brain neoplasms in children is the focus of the present manuscript. (orig.)

  4. Single molecule sequencing-guided scaffolding and correction of draft assemblies.

    Science.gov (United States)

    Zhu, Shenglong; Chen, Danny Z; Emrich, Scott J

    2017-12-06

    Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies. We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm. Our experimental results show that our approach can improve the structural correctness of target assemblies in the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.

  5. NeSSM: a Next-generation Sequencing Simulator for Metagenomics.

    Directory of Open Access Journals (Sweden)

    Ben Jia

    Full Text Available BACKGROUND: Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools. RESULTS: We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics. Combining complete genomes currently available, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. In order to improve the fidelity of simulation, tools are provided by NeSSM to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and pair-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing units version of NeSSM is also developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than one-order of magnitude faster than MetaSim. CONCLUSIONS: NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can be helpful to develop tools and evaluate strategies for metagenomics analysis and it's freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.

  6. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.

  7. Hierarchically nested river landform sequences

    Science.gov (United States)

    Pasternack, G. B.; Weber, M. D.; Brown, R. A.; Baig, D.

    2017-12-01

    River corridors exhibit landforms nested within landforms repeatedly down spatial scales. In this study we developed, tested, and implemented a new way to create river classifications by mapping domains of fluvial processes with respect to the hierarchical organization of topographic complexity that drives fluvial dynamism. We tested this approach on flow convergence routing, a morphodynamic mechanism with different states depending on the structure of nondimensional topographic variability. Five nondimensional landform types with unique functionality (nozzle, wide bar, normal channel, constricted pool, and oversized) represent this process at any flow. When this typology is nested at base flow, bankfull, and floodprone scales it creates a system with up to 125 functional types. This shows how a single mechanism produces complex dynamism via nesting. Given the classification, we answered nine specific scientific questions to investigate the abundance, sequencing, and hierarchical nesting of these new landform types using a 35-km gravel/cobble river segment of the Yuba River in California. The nested structure of flow convergence routing landforms found in this study revealed that bankfull landforms are nested within specific floodprone valley landform types, and these types control bankfull morphodynamics during moderate to large floods. As a result, this study calls into question the prevailing theory that the bankfull channel of a gravel/cobble river is controlled by in-channel, bankfull, and/or small flood flows. Such flows are too small to initiate widespread sediment transport in a gravel/cobble river with topographic complexity.

  8. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  9. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  10. Study of the fast inversion recovery pulse sequence. With reference to fast fluid attenuated inversion recovery and fast short TI inversion recovery pulse sequence

    International Nuclear Information System (INIS)

    Tsuchihashi, Toshio; Maki, Toshio; Suzuki, Takeshi

    1997-01-01

    The fast inversion recovery (fast IR) pulse sequence was evaluated. We compared the fast fluid attenuated inversion recovery (fast FLAIR) pulse sequence in which inversion time (TI) was established as equal to the water null point for the purpose of the water-suppressed T 2 -weighted image, with the fast short TI inversion recovery (fast STIR) pulse sequence in which TI was established as equal to the fat null point for purpose of fat suppression. In the fast FLAIR pulse sequence, the water null point was increased by making TR longer. In the FLAIR pulse sequence, the longitudinal magnetization contrast is determined by TI. If TI is increased, T 2 -weighted contrast improves in the same way as increasing TR for the SE pulse sequence. Therefore, images should be taken with long TR and long TI, which are longer than TR and longer than the water null point. On the other hand, the fat null point is not affected by TR in the fast STIR pulse sequence. However, effective TE was affected by variation of the null point. This increased in proportion to the increase in effective TE. Our evaluation indicated that the fast STIR pulse sequence can control the extensive signals from fat in a short time. (author)

  11. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan. (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  12. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Full Text Available Abstract Background Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.

  13. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  14. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, nois...... patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer....

  15. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Full Text Available Abstract Background The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows to generate sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows to generate greater sets of sequences than with previous software and equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.

  16. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  17. FRESCO: Referential compression of highly similar sequences.

    Science.gov (United States)

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitudes faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition,we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance,4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.

  18. Biodenitrification in Sequencing Batch Reactors. Final report

    International Nuclear Information System (INIS)

    Silverstein, J.

    1996-01-01

    One plan for stabilization of the Solar Pond waters and sludges at Rocky Flats Plant (RFP), is evaporation and cement solidification of the salts to stabilize heavy metals and radionuclides for land disposal as low-level mixed waste. It has been reported that nitrate (NO 3- ) salts may interfere with cement stabilization of heavy metals and radionuclides. Therefore, biological nitrate removal (denitrification) may be an important pretreatment for the Solar Pond wastewaters at RFP, improving the stability of the cement final waste form, reducing the requirement for cement (or pozzolan) additives and reducing the volume of cemented low-level mixed waste requiring ultimate disposal. A laboratory investigation of the performance of the Sequencing Batch Reactor (SBR) activated sludge process developed for nitrate removal from a synthetic brine typical of the high-nitrate and high-salinity wastewaters in the Solar Ponds at Rocky Flats Plant was carried out at the Environmental Engineering labs at the University of Colorado, Boulder, between May 1, 1994 and October 1, 1995

  19. NOMAD - more than a simple sequencer

    International Nuclear Information System (INIS)

    Mutti, P.; Cecillon, F.; Elaazzouzi, A.; Le Goc, Y.; Locatelli, J.; Ortiz, H.; Ratel, J.

    2012-01-01

    NOMAD is the new instrument control software of the Institut Laue-Langevin (ILL). A highly shareable code among all the instruments' suite, a user oriented design for tailored functionality and the improvement of the instrument team's autonomy thanks to a uniform and ergonomic user interface are the essential elements guiding the software development. NOMAD implements a client/server approach. The server is the core business containing all the instrument methods and the hardware drivers, while the GUI (Graphical User Interface) provides all the necessary functionalities for the interaction between user and hardware. All instruments share the same executable while a set of XML configuration files adapts hardware needs and instrument methods to the specific experimental setup. Thanks to a complete graphical representation of experimental sequences, NOMAD provides an overview of past, present and future operations. Users have the freedom to build their own specific work-flows using intuitive drag-and-drop technique. A complete drivers' database to connect and control all possible instrument components has been created, simplifying the inclusion of a new piece of equipment for an experiment. A web application makes available outside the ILL all the relevant information on the status of the experiment. A set of scientific methods facilitates the interaction between users and hardware giving access to instrument control and to complex operations within just one click on the interface. (authors)

  20. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: Multiple-chiller plant is commonly employed in the heating, ventilating and air-conditioning system to increase operational feasibility and energy-efficiency under part load condition. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency while not sacrifices the cooling sufficiency for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observation that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on the control performance and may cause the control fail to achieve the expected control and/or energy performance; and (ii) in current literature few studies have systematically addressed this issue, this paper therefore presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  1. Multiplexed microsatellite recovery using massively parallel sequencing

    Science.gov (United States)

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).

  2. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.

  3. Multipliers on Generalized Mixed Norm Sequence Spaces

    Directory of Open Access Journals (Sweden)

    Oscar Blasco

    2014-01-01

    Full Text Available Given 1≤p,q≤∞ and sequences of integers (nkk and (nk′k such that nk≤nk′≤nk+1, the generalized mixed norm space ℓℐ(p,q is defined as those sequences (ajj such that ((∑j∈Ik‍|aj|p1/pk∈ℓq where Ik={j∈ℕ0 s.t. nk≤jsequence λ=(λjj to belong to the space of multipliers (ℓℐ(r,s,ℓ(u,v, for different sequences ℐ and of intervals in ℕ0, are determined.

  4. Recursive sequences in first-year calculus

    Science.gov (United States)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  5. A measurement of disorder in binary sequences

    Science.gov (United States)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, AL, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we respectively apply it to typical random and deterministic sequences. One kind of random sequences is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of AL, denoted by |AL | , is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can be as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |AL | for the overall disorder characteristics. It needs extremely low computational costs. It can be easily experimentally realized. From all these mentioned, we believe that the proposed measure of disorder is a valuable complement to existing ones in symbolic sequences.

  6. Polynomial sequences generated by infinite Hessenberg matrices

    Directory of Open Access Journals (Sweden)

    Verde-Star Luis

    2017-01-01

    Full Text Available We show that an infinite lower Hessenberg matrix generates polynomial sequences that correspond to the rows of infinite lower triangular invertible matrices. Orthogonal polynomial sequences are obtained when the Hessenberg matrix is tridiagonal. We study properties of the polynomial sequences and their corresponding matrices which are related to recurrence relations, companion matrices, matrix similarity, construction algorithms, and generating functions. When the Hessenberg matrix is also Toeplitz the polynomial sequences turn out to be of interpolatory type and we obtain additional results. For example, we show that every nonderogative finite square matrix is similar to a unique Toeplitz-Hessenberg matrix.

  7. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, JamesR.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  8. Collaborative Improvement

    DEFF Research Database (Denmark)

    Kaltoft, Rasmus

    The thesis data have been collected in the EU-sponsored project: Collaborative Improvement Tool for the Extended Manufacturing Enterprise, CO-IMPROVE. In this project four universities (Denmark, Ireland, Italy, and The Netherlands), two software vendors (Greece and Sweden) and three companies...... (Denmark, Italy and The Netherlands) each with three to five suppliers were involved. The CO-IMPROVE project and the thesis is based on “action research” and “action learning”. The main aim of the whole project is through actual involvement and actions make the researchers, companies and selected suppliers...... learn how to improve operations in (hopefully) a win-win like manner through collaboration....

  9. Design of Long Period Pseudo-Random Sequences from the Addition of -Sequences over

    Directory of Open Access Journals (Sweden)

    Ren Jian

    2004-01-01

    Full Text Available Pseudo-random sequence with good correlation property and large linear span is widely used in code division multiple access (CDMA communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balance statistics, and low cross-correlation property are constructed from the addition of -sequences with pairwise-prime linear spans (AMPLS. Using -sequences as building blocks, the proposed method proved to be an efficient and flexible approach to construct long period pseudo-random sequences with desirable properties from short period sequences. Applying the proposed method to , a signal set is constructed.

  10. The prevalence and cognitive profile of sequence-space synaesthesia.

    Science.gov (United States)

    Ward, Jamie; Ipser, Alberta; Phanvanova, Eva; Brown, Paris; Bunte, Iris; Simner, Julia

    2018-05-01

    People with sequence-space synaesthesia visualize sequential concepts such as numbers and time as an ordered pattern extending through space. Unlike other types of synaesthesia, there is no generally agreed objective method for diagnosing this variant or separating it from potentially related aspects of cognition. We use a recently-developed spatial consistency test together with a novel questionnaire on naïve samples and estimate the prevalence of sequence-space synaesthesia to be around 8.1% (Study 1) to 12.8% (Study 2). We validate our test by showing that participants classified as having sequence-space synaesthesia perform differently on lab-based tasks. They show a spatial Stroop-like interference response, they show enhanced detection of low visibility Gabor stimuli, they report more use of visual imagery, and improved memory for certain types of public events. We suggest that sequence-space synaesthesia develops from a particular neurocognitive profile linked both to greater visual imagery and enhanced visual perception. Copyright © 2018 Elsevier Inc. All rights reserved.

  11. Conservation patterns in different functional sequence categoriesof divergent Drosophila species

    Energy Technology Data Exchange (ETDEWEB)

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conservedungapped blocks in genome-wide pairwise alignments of recently completedspecies of Drosophila: D.yakuba, D.ananassae, D.pseudoobscura, D.virilisand D.mojavensis. Based on these distributions we have found that nearlyevery functional sequence category possesses its own distinctiveconservation pattern, sometimes independent of the overall sequenceconservation level. In the coding and regulatory regions, the ungappedblocks were longer than in introns, UTRs and non-functional sequences. Atthe same time, the blocks in the coding regions carried 3N+2 signaturecharacteristic to synonymic substitutions in the 3rd codon positions.Larger block sizes in transcription regulatory regions can be explainedby the presence of conserved arrays of binding sites for transcriptionfactors. We also have shown that the longest ungapped blocks, or'ultraconserved' sequences, are associated with specific gene groups,including those encoding ion channels and components of the cytoskeleton.We discussed how restrained conservation patterns may help in mappingfunctional sequence categories and improving genomeannotation.

  12. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    Science.gov (United States)

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.

  13. Quality improvement tools and processes.

    Science.gov (United States)

    Lau, Catherine Y

    2015-04-01

    The Model for Improvement and the Plan-Do-Study-Act cycle is a popular quality improvement (QI) tool for health care providers to successfully lead QI projects and redesign care processes. This tool has several distinct components that must be addressed in sequence to organize and critically evaluate improvement activities. Unlike other health sciences clinical research, QI projects and research are based on dynamic hypotheses that develop into observable, serial tests of change with continuous collection and feedback of performance data to stakeholders. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Multiple Access Interference Reduction Using Received Response Code Sequence for DS-CDMA UWB System

    Science.gov (United States)

    Toh, Keat Beng; Tachikawa, Shin'ichi

    This paper proposes a combination of novel Received Response (RR) sequence at the transmitter and a Matched Filter-RAKE (MF-RAKE) combining scheme receiver system for the Direct Sequence-Code Division Multiple Access Ultra Wideband (DS-CDMA UWB) multipath channel model. This paper also demonstrates the effectiveness of the RR sequence in Multiple Access Interference (MAI) reduction for the DS-CDMA UWB system. It suggests that by using conventional binary code sequence such as the M sequence or the Gold sequence, there is a possibility of generating extra MAI in the UWB system. Therefore, it is quite difficult to collect the energy efficiently although the RAKE reception method is applied at the receiver. The main purpose of the proposed system is to overcome the performance degradation for UWB transmission due to the occurrence of MAI during multiple accessing in the DS-CDMA UWB system. The proposed system improves the system performance by improving the RAKE reception performance using the RR sequence which can reduce the MAI effect significantly. Simulation results verify that significant improvement can be obtained by the proposed system in the UWB multipath channel models.

  15. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina and Ion Torrent (Life Technology sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare. Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  16. Continuous- and Discrete-Time Stimulus Sequences for High Stimulus Rate Paradigm in Evoked Potential Studies

    Directory of Open Access Journals (Sweden)

    Tao Wang

    2013-01-01

    Full Text Available To obtain reliable transient auditory evoked potentials (AEPs from EEGs recorded using high stimulus rate (HSR paradigm, it is critical to design the stimulus sequences of appropriate frequency properties. Traditionally, the individual stimulus events in a stimulus sequence occur only at discrete time points dependent on the sampling frequency of the recording system and the duration of stimulus sequence. This dependency likely causes the implementation of suboptimal stimulus sequences, sacrificing the reliability of resulting AEPs. In this paper, we explicate the use of continuous-time stimulus sequence for HSR paradigm, which is independent of the discrete electroencephalogram (EEG recording system. We employ simulation studies to examine the applicability of the continuous-time stimulus sequences and the impacts of sampling frequency on AEPs in traditional studies using discrete-time design. Results from these studies show that the continuous-time sequences can offer better frequency properties and improve the reliability of recovered AEPs. Furthermore, we find that the errors in the recovered AEPs depend critically on the sampling frequencies of experimental systems, and their relationship can be fitted using a reciprocal function. As such, our study contributes to the literature by demonstrating the applicability and advantages of continuous-time stimulus sequences for HSR paradigm and by revealing the relationship between the reliability of AEPs and sampling frequencies of the experimental systems when discrete-time stimulus sequences are used in traditional manner for the HSR paradigm.

  17. Neural correlates of skill acquisition: decreased cortical activity during a serial interception sequence learning task.

    Science.gov (United States)

    Gobel, Eric W; Parrish, Todd B; Reber, Paul J

    2011-10-15

    Learning of complex motor skills requires learning of component movements as well as the sequential structure of their order and timing. Using a Serial Interception Sequence Learning (SISL) task, participants learned a sequence of precisely timed interception responses through training with a repeating sequence. Following initial implicit learning of the repeating sequence, functional MRI data were collected during performance of that known sequence and compared with activity evoked during novel sequences of actions, novel timing patterns, or both. Reduced activity was observed during the practiced sequence in a distributed bilateral network including extrastriate occipital, parietal, and premotor cortical regions. These reductions in evoked activity likely reflect improved efficiency in visuospatial processing, spatio-motor integration, motor planning, and motor execution for the trained sequence, which is likely supported by nondeclarative skill learning. In addition, the practiced sequence evoked increased activity in the left ventral striatum and medial prefrontal cortex, while the posterior cingulate was more active during periods of better performance. Many prior studies of perceptual-motor skill learning have found increased activity in motor areas of the frontal cortex (e.g., motor and premotor cortex, SMA) and striatal areas (e.g., the putamen). The change in activity observed here (i.e., decreased activity across a cortical network) may reflect skill learning that is predominantly expressed through more accurate performance rather than decreased reaction time. Copyright © 2011 Elsevier Inc. All rights reserved.

  18. Optical orthogonal code-division multiple-access system - Part 2: Multibits/sequence-period OOCDMA

    Science.gov (United States)

    Kwon, Hyuck M.

    1994-08-01

    In a recently proposed optical orthogonal code division multiple-access (OOCDMA) system, one bit of user's data is transmitted per sequence-period, and a threshold is employed for the final bit decision. In this paper, a system that can transmit multibits per sequence-period is introduced, and avalanche photodiode (APD) noise, thermal noise, and interference, are included. This system, derived by exploiting orthogonal properties of the OOCDMA code sequence and using a maximum search (instead of a threshold) in the final decision, is log(sub 2) F times higher in throughput, where F is sequence-period. For example, four orders of magnitude are better in bit error probability at - 56 dBW received laser power, with F = 1000 chips, 10 'marks' in a sequence, and 10 users of 30 Mb/s data rate for one-bit/sequence-period and 270 Mb/s data rate for multibits/sequence-period system. Furthermore, an exact analysis is performed for the log(sub 2)F bits/sequence-period system with a hard-limiter placed before the receiver, and its performance is compared to the performance without hard-limiter, for the chip-synchronous case. The improvement from using a hard-limiter is significant in the log(sub 2)F bits/sequence-period OCCDMA system.

  19. Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    Directory of Open Access Journals (Sweden)

    Jens H. Kuhn

    2014-09-01

    Full Text Available Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s RefSeq is a non-redundant, curated database for reference (or type nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [ (////-], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

  20. DNA Sequencing as a Tool to Monitor Marine Ecological Status

    Directory of Open Access Journals (Sweden)

    Kelly D. Goodwin

    2017-05-01

    Full Text Available Many ocean policies mandate integrated, ecosystem-based approaches to marine monitoring, driving a global need for efficient, low-cost bioindicators of marine ecological quality. Most traditional methods to assess biological quality rely on specialized expertise to provide visual identification of a limited set of specific taxonomic groups, a time-consuming process that can provide a narrow view of ecological status. In addition, microbial assemblages drive food webs but are not amenable to visual inspection and thus are largely excluded from detailed inventory. Molecular-based assessments of biodiversity and ecosystem function offer advantages over traditional methods and are increasingly being generated for a suite of taxa using a “microbes to mammals” or “barcodes to biomes” approach. Progress in these efforts coupled with continued improvements in high-throughput sequencing and bioinformatics pave the way for sequence data to be employed in formal integrated ecosystem evaluation, including food web assessments, as called for in the European Union Marine Strategy Framework Directive. DNA sequencing of bioindicators, both traditional (e.g., benthic macroinvertebrates, ichthyoplankton and emerging (e.g., microbial assemblages, fish via eDNA, promises to improve assessment of marine biological quality by increasing the breadth, depth, and throughput of information and by reducing costs and reliance on specialized taxonomic expertise.

  1. Promises, pitfalls and practicalities of prenatal whole exome sequencing.

    Science.gov (United States)

    Best, Sunayna; Wou, Karen; Vora, Neeta; Van der Veyver, Ignatia B; Wapner, Ronald; Chitty, Lyn S

    2018-01-01

    Prenatal genetic diagnosis provides information for pregnancy and perinatal decision-making and management. In several small series, prenatal whole exome sequencing (WES) approaches have identified genetic diagnoses when conventional tests (karyotype and microarray) were not diagnostic. Here, we review published prenatal WES studies and recent conference abstracts. Thirty-one studies were identified, with diagnostic rates in series of five or more fetuses varying between 6.2% and 80%. Differences in inclusion criteria and trio versus singleton approaches to sequencing largely account for the wide range of diagnostic rates. The data suggest that diagnostic yields will be greater in fetuses with multiple anomalies or in cases preselected following genetic review. Beyond its ability to improve diagnostic rates, we explore the potential of WES to improve understanding of prenatal presentations of genetic disorders and lethal fetal syndromes. We discuss prenatal phenotyping limitations, counselling challenges regarding variants of uncertain significance, incidental and secondary findings, and technical problems in WES. We review the practical, ethical, social and economic issues that must be considered before prenatal WES could become part of routine testing. Finally, we reflect upon the potential future of prenatal genetic diagnosis, including a move towards whole genome sequencing and non-invasive whole exome and whole genome testing. © 2017 John Wiley & Sons, Ltd. © 2017 John Wiley & Sons, Ltd.

  2. Step out - Step in Sequencing Games

    NARCIS (Netherlands)

    Musegaas, M.; Borm, P.E.M.; Quant, M.

    2014-01-01

    In this paper a new class of relaxed sequencing games is introduced: the class of Step out - Step in sequencing games. In this relaxation any player within a coalition is allowed to step out from his position in the processing order and to step in at any position later in the processing order.

  3. Step out-step in sequencing games

    NARCIS (Netherlands)

    Musegaas, Marieke; Borm, Peter; Quant, Marieke

    2015-01-01

    In this paper a new class of relaxed sequencing games is introduced: the class of Step out–Step in sequencing games. In this relaxation any player within a coalition is allowed to step out from his position in the processing order and to step in at any position later in the processing order. First,

  4. Thread extraction for polyadic instruction sequences

    NARCIS (Netherlands)

    Bergstra, J.; Middelburg, C.

    2011-01-01

    In this paper, we study the phenomenon that instruction sequences are split into fragments which somehow produce a joint behaviour. In order to bring this phenomenon better into the picture, we formalize a simple mechanism by which several instruction sequence fragments can produce a joint

  5. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    Science.gov (United States)

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  6. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    OpenAIRE

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.; Ziola, Barry

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  7. Question-answer sequences in survey interviews

    NARCIS (Netherlands)

    Dijkstra, W.; Ongena, Y.P.

    2006-01-01

    Interaction analysis was used to analyze a total of 14,265 question-answer sequences of (Q-A Sequences) 80 questions that originated from two face-to-face and three telephone surveys. The analysis was directed towards the causes and effects of particular interactional problems. Our results showed

  8. Trace maps for arbitrary substitution sequences

    International Nuclear Information System (INIS)

    Avishai, Y.

    1993-01-01

    The discovery of quasi-crystals and their 1-dimensional modeling have led to a deep mathematical study of Schroedinger operators with an arbitrary deterministic potential sequence. In this work we address this problem and find trace maps for an arbitrary substitution sequence. our trace maps have lower dimensionality than those of Kolar and Nori, which make them quite attractive for actual applications. (authors)

  9. Stochastic modelling of daily rainfall sequences

    NARCIS (Netherlands)

    Buishand, T.A.

    1977-01-01

    Rainfall series of different climatic regions were analysed with the aim of generating daily rainfall sequences. A survey of the data is given in I, 1. When analysing daily rainfall sequences one must be aware of the following points:
    a. Seasonality. Because of seasonal variation

  10. Learning of Sensory Sequences in Cerebellar Patients

    Science.gov (United States)

    Frings, Markus; Boenisch, Raoul; Gerwig, Marcus; Diener, Hans-Christoph; Timmann, Dagmar

    2004-01-01

    A possible role of the cerebellum in detecting and recognizing event sequences has been proposed. The present study sought to determine whether patients with cerebellar lesions are impaired in the acquisition and discrimination of sequences of sensory stimuli of different modalities. A group of 26 cerebellar patients and 26 controls matched for…

  11. On peculiar Šindel sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2010-01-01

    Roč. 17, č. 2 (2010), s. 129-140 ISSN 0972-5555 R&D Projects: GA AV ČR(CZ) IAA100190803 Institutional research plan: CEZ:AV0Z10190503 Keywords : quadratic residue * Chinese remainder theorem * primitive Šindel sequences * Prague clock sequence Subject RIV: BA - General Mathematics http://www.pphmj.com/abstract/5095.htm

  12. Occupational Sequences: Auto Engines 1. AT 121.

    Science.gov (United States)

    Korb, A. W.; And Others

    In an attempt to individualize an automotive course, the Vocational-Technical Division of Northern Montana College has developed Occupational Sequences for an engine rebuilding course. Occupational Sequences, a learning or teaching aid, is an analysis of numbered operations involved in engine rebuilding. Job sheets, included in the book, provide a…

  13. Sequencing Events: Exploring Art and Art Jobs.

    Science.gov (United States)

    Stephens, Pamela Geiger; Shaddix, Robin K.

    2000-01-01

    Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)

  14. Wijsman Orlicz Asymptotically Ideal -Statistical Equivalent Sequences

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2013-01-01

    in Wijsman sense and present some definitions which are the natural combination of the definition of asymptotic equivalence, statistical equivalent, -statistical equivalent sequences in Wijsman sense. Finally, we introduce the notion of Cesaro Orlicz asymptotically -equivalent sequences in Wijsman sense and establish their relationship with other classes.

  15. Nitrogen chronology of massive main sequence stars

    NARCIS (Netherlands)

    Köhler, K.; Borzyszkowski, M.; Brott, I.; Langer, N.; de Koter, A.

    2012-01-01

    Context. Rotational mixing in massive main sequence stars is predicted to monotonically increase their surface nitrogen abundance with time. Aims. We use this effect to design a method for constraining the age and the inclination angle of massive main sequence stars, given their observed luminosity,

  16. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that...

  17. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology”s paradigm is that this order of amino acids determines the protein”s architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  18. Cyprinus carpio Genome sequencing and assembly

    NARCIS (Netherlands)

    Kolder, I.C.R.M.; Plas-Duivesteijn, van der Suzanne J.; Tan, G.; Wiegertjes, G.; Forlenza, M.; Guler, A.T.; Travin, D.Y.; Nakao, M.; Moritomo, T.; Irnazarow, I.; Jansen, H.J.

    2013-01-01

    Sequencing of the common carp (Cyprinus carpio carpio Linnaeus, 1758) genome, with the objective of establishing carp as a model organism to supplement the closely related zebrafish (Danio rerio). The sequenced individual is a homozygous female (by gynogenesis) of R3 x R8 carp, the heterozygous

  19. Sequence Comparison: Close and Open problems

    NARCIS (Netherlands)

    Lenzini, Gabriele; Cerrai, P.; Freguglia, P.

    Comparing sequences is a very important activity both in computer science and in a many other areas as well. For example thank to text editors, everyone knows the particular instance of a sequence comparison problem knonw as ``string mathcing problem''. It consists in searching a given work

  20. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  1. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  2. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    textabstractThe largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  3. Exome-wide DNA capture and next generation sequencing in domestic and wild species

    Directory of Open Access Journals (Sweden)

    Ng Sarah B

    2011-07-01

    Full Text Available Abstract Background Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses. We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus to capture (enrich for, and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison. Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. Results We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. Conclusions This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.

  4. Exome-wide DNA capture and next generation sequencing in domestic and wild species.

    Science.gov (United States)

    Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon

    2011-07-05

    Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses.We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.

  5. Study design requirements for RNA sequencing-based breast cancer diagnostics.

    Science.gov (United States)

    Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias

    2016-02-01

    Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.

  6. Exploiting sequence and stability information for directing nanobody stability engineering.

    Science.gov (United States)

    Kunz, Patrick; Flock, Tilman; Soler, Nicolas; Zaiss, Moritz; Vincke, Cécile; Sterckx, Yann; Kastelic, Damjana; Muyldermans, Serge; Hoheisel, Jörg D

    2017-09-01

    Variable domains of camelid heavy-chain antibodies, commonly named nanobodies, have high biotechnological potential. In view of their broad range of applications in research, diagnostics and therapy, engineering their stability is of particular interest. One important aspect is the improvement of thermostability, because it can have immediate effects on conformational stability, protease resistance and aggregation propensity of the protein. We analyzed the sequences and thermostabilities of 78 purified nanobody binders. From this data, potentially stabilizing amino acid variations were identified and studied experimentally. Some mutations improved the stability of nanobodies by up to 6.1°C, with an average of 2.3°C across eight modified nanobodies. The stabilizing mechanism involves an improvement of both conformational stability and aggregation behavior, explaining the variable degree of stabilization in individual molecules. In some instances, variations predicted to be stabilizing actually led to thermal destabilization of the proteins. The reasons for this contradiction between prediction and experiment were investigated. The results reveal a mutational strategy to improve the biophysical behavior of nanobody binders and indicate a species-specificity of nanobody architecture. This study illustrates the potential and limitations of engineering nanobody thermostability by merging sequence information with stability data, an aspect that is becoming increasingly important with the recent development of high-throughput biophysical methods. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  7. Nucleotide sequence preservation of human mitochondrial DNA

    International Nuclear Information System (INIS)

    Monnat, R.J. Jr.; Loeb, L.A.

    1985-01-01

    Recombinant DNA techniques have been used to quantitate the amount of nucleotide sequence divergence in the mitochondrial DNA population of individual normal humans. Mitochondrial DNA was isolated from the peripheral blood lymphocytes of five normal humans and cloned in M13 mp11; 49 kilobases of nucleotide sequence information was obtained from 248 independently isolated clones from the five normal donors. Both between- and within-individual differences were identified. Between-individual differences were identified in approximately = to 1/200 nucleotides. In contrast, only one within-individual difference was identified in 49 kilobases of nucleotide sequence information. This high degree of mitochondrial nucleotide sequence homogeneity in human somatic cells is in marked contrast to the rapid evolutionary divergence of human mitochondrial DNA and suggests the existence of mechanisms for the concerted preservation of mammalian mitochondrial DNA sequences in single organisms

  8. Snake Genome Sequencing: Results and Future Prospects.

    Science.gov (United States)

    Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

    2016-12-01

    Snake genome sequencing is in its infancy-very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  9. Sequencing Cyclic Peptides by Multistage Mass Spectrometry

    Science.gov (United States)

    Mohimani, Hosein; Yang, Yu-Liang; Liu, Wei-Ting; Hsieh, Pei-Wen; Dorrestein, Pieter C.; Pevzner, Pavel A.

    2012-01-01

    Some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. While hundreds of biomedically important cyclic peptides have been sequenced, the computational techniques for sequencing cyclic peptides are still in their infancy. Previous methods for sequencing peptide antibiotics and other cyclic peptides are based on Nuclear Magnetic Resonance spectroscopy, and require large amount (miligrams) of purified materials that, for most compounds, are not possible to obtain. Recently, development of mass spectrometry based methods has provided some hope for accurate sequencing of cyclic peptides using picograms of materials. In this paper we develop a method for sequencing of cyclic peptides by multistage mass spectrometry, and show its advantages over single stage mass spectrometry. The method is tested on known and new cyclic peptides from Bacillus brevis, Dianthus superbus and Streptomyces griseus, as well as a new family of cyclic peptides produced by marine bacteria. PMID:21751357

  10. Snake Genome Sequencing: Results and Future Prospects

    Directory of Open Access Journals (Sweden)

    Harald M. I. Kerkkamp

    2016-12-01

    Full Text Available Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  11. Sequencing and comparing whole mitochondrial genomes ofanimals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  12. Divide and conquer: enriching environmental sequencing data.

    Directory of Open Access Journals (Sweden)

    Anne Bergeron

    2007-09-01

    Full Text Available In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one of the possible goals being to reconstruct partial or complete genomes of members of the community. In communities with high diversity of species, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem will arise not only in situations with a relatively even distribution of many species, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all, while in the latter case a few dominant species in an environment will always be sequenced at high coverage to the detriment of coverage of the greater number of sparse species.Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, if the expected percentage of singleton sequences is higher than 25%, then, under the uniform distribution hypothesis, splitting the community is always a wise choice. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, it is possible to estimate quite accurately the relative diversity of the two sub-communities.Given the fact that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing, and expecting high diversity, consider splitting their communities in order to maximize the information content of their sequencing effort.

  13. Collaborative Improvement

    DEFF Research Database (Denmark)

    Kaltoft, Rasmus

    -organisational continuous improvement of their performance, relative to that of other EMEs. Developing a collaborative improvement relationship between companies is a protracted and complex process and, according to some surveys, the failure rate is as low as one to three. This failure rate is affected by a whole range...... of factors. The research presented in this thesis was aimed at identifying these factors and investigating their interplay and influence on the progress and success of the development of the collaborative improvement. This thesis presents our findings regarding the factors found, their interplay...

  14. Making Strategic Decisions: Conducting and Using Research on the Impact of Sequenced Library Instruction

    Science.gov (United States)

    Lundstrom, Kacy; Martin, Pamela; Cochran, Dory

    2016-01-01

    This study explores the relationship between course grades and sequenced library instruction interventions throughout psychology students' curriculum. Researchers conducted this study to inform decisions about sustaining and improving program integrations for first- and second-year composition courses and to improve discipline-level integrations.…

  15. Sequence Matters but How Exactly? A Method for Evaluating Activity Sequences from Data

    Science.gov (United States)

    Doroudi, Shayan; Holstein, Kenneth; Aleven, Vincent; Brunskill, Emma

    2016-01-01

    How should a wide variety of educational activities be sequenced to maximize student learning? Although some experimental studies have addressed this question, educational data mining methods may be able to evaluate a wider range of possibilities and better handle many simultaneous sequencing constraints. We introduce Sequencing Constraint…

  16. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central ...

  17. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  18. High Performance Systolic Array Core Architecture Design for DNA Sequencer

    Directory of Open Access Journals (Sweden)

    Saiful Nurdin Dayana

    2018-01-01

    Full Text Available This paper presents a high performance systolic array (SA core architecture design for Deoxyribonucleic Acid (DNA sequencer. The core implements the affine gap penalty score Smith-Waterman (SW algorithm. This time-consuming local alignment algorithm guarantees optimal alignment between DNA sequences, but it requires quadratic computation time when performed on standard desktop computers. The use of linear SA decreases the time complexity from quadratic to linear. In addition, with the exponential growth of DNA databases, the SA architecture is used to overcome the timing issue. In this work, the SW algorithm has been captured using Verilog Hardware Description Language (HDL and simulated using Xilinx ISIM simulator. The proposed design has been implemented in Xilinx Virtex -6 Field Programmable Gate Array (FPGA and improved in the core area by 90% reduction.

  19. Transcriptome sequences resolve deep relationships of the grape family.

    Science.gov (United States)

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  20. Transcriptome sequences resolve deep relationships of the grape family.

    Directory of Open Access Journals (Sweden)

    Jun Wen

    Full Text Available Previous phylogenetic studies of the grape family (Vitaceae yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.