WorldWideScience

Sample records for high sequence similarity

  1. FRESCO: Referential compression of highly similar sequences.

    Science.gov (United States)

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios far beyond the state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
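
    The idea of referential compression described in this abstract (storing only the differences between an input and a known reference) can be illustrated with a minimal sketch. This is not FRESCO's algorithm: the greedy k-mer matching, the function names `ref_compress` and `ref_decompress`, and the `min_match` parameter are all assumptions made for illustration.

```python
def ref_compress(reference: str, target: str, min_match: int = 4):
    """Encode target as matches against reference plus literal characters.

    Hypothetical greedy scheme (not FRESCO's): each entry is either
    ("M", position, length) for a copy from the reference, or
    ("L", char) for a single literal character.
    """
    # Index the start positions of every k-mer in the reference (k = min_match).
    index = {}
    for i in range(len(reference) - min_match + 1):
        index.setdefault(reference[i:i + min_match], []).append(i)

    entries = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Extend every candidate match and keep the longest one.
        for pos in index.get(target[i:i + min_match], []):
            length = 0
            while (pos + length < len(reference)
                   and i + length < len(target)
                   and reference[pos + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= min_match:
            entries.append(("M", best_pos, best_len))
            i += best_len
        else:
            entries.append(("L", target[i]))
            i += 1
    return entries


def ref_decompress(reference: str, entries) -> str:
    """Rebuild the target from the reference and the entry list."""
    parts = []
    for entry in entries:
        if entry[0] == "M":
            _, pos, length = entry
            parts.append(reference[pos:pos + length])
        else:
            parts.append(entry[1])
    return "".join(parts)
```

    For two sequences that differ in a single base, the entry list collapses to two long matches and one literal, which is where the high compression ratios for highly similar sequences come from.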

  2. Characterization of CG6178 gene product with high sequence similarity to firefly luciferase in Drosophila melanogaster.

    Science.gov (United States)

    Oba, Yuichi; Ojika, Makoto; Inouye, Satoshi

    2004-03-31

    This is the first identification of a long-chain fatty acyl-CoA synthetase in Drosophila by enzymatic characterization. The gene product of CG6178 (CG6178) in the Drosophila melanogaster genome, which has a high sequence similarity to firefly luciferase, has been expressed and characterized. CG6178 showed long-chain fatty acyl-CoA synthetic activity in the presence of ATP, CoA and Mg(2+), suggesting a fatty acyl adenylate is an intermediate. Recently, it was revealed that firefly luciferase has two catalytic functions, monooxygenase (luciferase) and AMP-mediated CoA ligase (fatty acyl-CoA synthetase). However, unlike firefly luciferase, CG6178 did not show luminescence activity in the presence of firefly luciferin, ATP, CoA and Mg(2+). The enzymatic properties of CG6178 including substrate specificity, pH dependency and optimal temperature were close to those of firefly luciferase and rat fatty acyl-CoA synthetase. Further, phylogenetic analyses strongly suggest that the firefly luciferase gene may have evolved from a fatty acyl-CoA synthetase gene as a common ancestral gene.

  3. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus.

    Directory of Open Access Journals (Sweden)

    Kui Lin

    2014-01-01

    Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya.

  4. Analysis of HIV-1 intersubtype recombination breakpoints suggests region with high pairing probability may be a more fundamental factor than sequence similarity affecting HIV-1 recombination.

    Science.gov (United States)

    Jia, Lei; Li, Lin; Gui, Tao; Liu, Siyang; Li, Hanping; Han, Jingwan; Guo, Wei; Liu, Yongjian; Li, Jingyun

    2016-09-21

    With increasing data on HIV-1, a more relevant molecular model describing mechanism details of HIV-1 genetic recombination usually requires upgrades. Currently an incomplete structural understanding of the copy choice mechanism along with several other issues in the field that lack elucidation led us to perform an analysis of the correlation between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarity to further explore structural mechanisms. Near full length sequences of URFs from Asia, Europe, and Africa (one sequence/patient), and representative sequences of worldwide CRFs were retrieved from the Los Alamos HIV database. Their recombination patterns were analyzed by jpHMM in detail. Then the relationships between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarities were investigated. Pearson correlation test showed that all URF groups and the CRF group exhibit the same breakpoint distribution pattern. Additionally, the Wilcoxon two-sample test indicated a significant and inexplicable limitation of recombination in regions with high pairing probability. These regions have been found to be strongly conserved across distinct biological states (i.e., strong intersubtype similarity), and genetic similarity has been determined to be a very important factor promoting recombination. Thus, the results revealed an unexpected disagreement between intersubtype similarity and breakpoint distribution, which were further confirmed by genetic similarity analysis. Our analysis reveals a critical conflict between results from natural HIV-1 isolates and those from HIV-1-based assay vectors in which genetic similarity has been shown to be a very critical factor promoting recombination. These results indicate the region with high-pairing probabilities may be a more fundamental factor affecting HIV-1 recombination than sequence similarity in natural HIV-1 infections. Our

  5. Characterization of human MMTV-like (HML) elements similar to a sequence that was highly expressed in a human breast cancer: further definition of the HML-6 group.

    Science.gov (United States)

    Yin, H; Medstrand, P; Kristofferson, A; Dietrich, U; Aman, P; Blomberg, J

    1999-03-30

    Previously, we found a retroviral sequence, HML-6.2BC1, to be expressed at high levels in a multifocal ductal breast cancer from a 41-year-old woman who also developed ovarian carcinoma. The sequence of a human genomic clone (HML-6.28) selected by high-stringency hybridization with HML-6.2BC1 is reported here. It was 99% identical to HML-6.2BC1 and gave the same restriction fragments as total DNA. HML-6.28 is a 4.7-kb provirus with a 5'LTR, truncated in RT. Data from two similar genomic clones and sequences found in GenBank are also reported. Overlaps between them gave a rather complete picture of the HML-6.2BC1-like human endogenous retroviral elements. Work with somatic cell hybrids and FISH localized HML-6.28 to chromosome 6, band p21, close to the MHC region. The causal role of HML-6.28 in breast cancer remains unclear. Nevertheless, the ca. 20 Myr old HML-6 sequences enabled the definition of common and unique features of type A, B, and D (ABD) retroviruses. In Gag, HML-6 has no intervening sequences between matrix and capsid proteins, unlike extant exogenous ABD viruses, possibly an ancestral feature. Alignment of the dUTPase showed it to be present in all ABD viruses, but gave a phylogenetic tree different from trees made from other ABD genes, indicating a distinct phylogeny of dUTPase. A conserved 24-mer sequence in the amino terminus of some ABD envelope genes suggested a conserved function. Copyright 1999 Academic Press.

  6. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

    Science.gov (United States)

    Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

    2018-01-01

    Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings have significant implications for future work on protein comparison, and will help to better understand the semantic similarity between proteins.

  7. Exploring the relationship between sequence similarity and accurate phylogenetic trees.

    Science.gov (United States)

    Cantarel, Brandi L; Morrison, Hilary G; Pearson, William

    2006-11-01

    We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees. For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not

  8. Interference effects in learning similar sequences of discrete movements

    NARCIS (Netherlands)

    Koedijker, J.M.; Oudejans, R.R.D.; Beek, P.J.

    2010-01-01

    Three experiments were conducted to examine proactive and retroactive interference effects in learning two similar sequences of discrete movements. In each experiment, the participants in the experimental group practiced two movement sequences on consecutive days (1 on each day, order

  9. Sequence Similarity Presenter: a tool for the graphic display of similarities of long sequences for use in presentations.

    Science.gov (United States)

    Fröhlich, K U

    1994-04-01

    A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray. The display is compact and allows for a fast and intuitive recognition of the distribution of regions with a high similarity. It is well suited for the presentation of alignments of long sequences, e.g. of protein superfamilies, in plenary lectures. The method is implemented as a HyperCard stack for Apple Macintosh computers. Several options for the modification of the output are available (e.g. background reduction, size of the summation window, consideration of amino acid similarity, inclusion of graphic markers to indicate specific domains). The output is a PostScript file which can be printed, imported as EPS or processed further with Adobe Illustrator.
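
    The windowed display described in this abstract can be sketched in a few lines. This is an illustrative reconstruction, not the original HyperCard implementation: the function name `window_identity_gray`, the 0-255 gray scale, and the treatment of gap characters (`-`) as non-identical are assumptions.

```python
def window_identity_gray(seq_a: str, seq_b: str, window: int = 10):
    """Average identity per fixed-size window and map it to a gray level.

    Both inputs are rows of a pairwise alignment (equal length, '-' for gaps).
    Returns one gray value per window on an assumed 0-255 scale, where
    255 (white) means no identity and 0 (black) means perfect identity.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if window < 1:
        raise ValueError("window must be at least 1")

    shades = []
    for start in range(0, len(seq_a), window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # Count identical, non-gap columns in this window.
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        identity = matches / len(a)
        shades.append(round(255 * (1.0 - identity)))
    return shades
```

    Drawing each returned value as one filled rectangle in a row gives the compact gray bar the abstract describes; the last window may be shorter than the others and is averaged over its own length.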

  10. BLAST and FASTA similarity searching for multiple sequence alignment.

    Science.gov (United States)

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.

  11. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.

  12. The HMMER Web Server for Protein Sequence Similarity Search.

    Science.gov (United States)

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.

  13. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

    Science.gov (United States)

    King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

    2014-01-01

    Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

  14. Model-free aftershock forecasts constructed from similar sequences in the past

    Science.gov (United States)

    van der Elst, N.; Page, M. T.

    2017-12-01

    The basic premise behind aftershock forecasting is that sequences in the future will be similar to those in the past. Forecast models typically use empirically tuned parametric distributions to approximate past sequences, and project those distributions into the future to make a forecast. While parametric models do a good job of describing average outcomes, they are not explicitly designed to capture the full range of variability between sequences, and can suffer from over-tuning of the parameters. In particular, parametric forecasts may produce a high rate of "surprises" - sequences that land outside the forecast range. Here we present a non-parametric forecast method that cuts out the parametric "middleman" between training data and forecast. The method is based on finding past sequences that are similar to the target sequence, and evaluating their outcomes. We quantify similarity as the Poisson probability that the observed event count in a past sequence reflects the same underlying intensity as the observed event count in the target sequence. Event counts are defined in terms of differential magnitude relative to the mainshock. The forecast is then constructed from the distribution of past sequence outcomes, weighted by their similarity. We compare the similarity forecast with the Reasenberg and Jones (RJ95) method, for a set of 2807 global aftershock sequences of M≥6 mainshocks. We implement a sequence-specific RJ95 forecast using a global average prior and Bayesian updating, but do not propagate epistemic uncertainty. The RJ95 forecast is somewhat more precise than the similarity forecast: 90% of observed sequences fall within a factor of two of the median RJ95 forecast value, whereas the fraction is 85% for the similarity forecast. However, the surprise rate is much higher for the RJ95 forecast; 10% of observed sequences fall in the upper 2.5% of the (Poissonian) forecast range. The surprise rate is less than 3% for the similarity forecast. The similarity
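
    The similarity weighting outlined in this abstract can be sketched as follows. The abstract does not give the exact formula, so this sketch makes a loud assumption: the weight of a past sequence is taken to be the Poisson probability of its early event count when the target's early count is used as the rate. The function names and the `(early_count, outcome_count)` input layout are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing k events at rate lam."""
    if lam <= 0:
        return 1.0 if k == 0 else 0.0
    # Computed in log space so large counts do not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def similarity_weights(target_count: int, past_counts):
    """Normalized weights for past sequences (ASSUMED similarity measure:
    Poisson pmf of each past count with the target count as the rate)."""
    raw = [poisson_pmf(c, target_count) for c in past_counts]
    total = sum(raw)
    if total == 0:
        raise ValueError("no past sequence is similar to the target")
    return [w / total for w in raw]


def weighted_forecast(target_count: int, past):
    """Return the forecast as a sorted list of (outcome, weight) pairs.

    `past` is a list of (early_count, outcome_count) tuples, one per
    past sequence; the weights sum to 1.
    """
    weights = similarity_weights(target_count, [c for c, _ in past])
    return sorted(zip((o for _, o in past), weights))
```

    Because the forecast is just the weighted empirical distribution of past outcomes, quantiles and "surprise" bounds can be read off the cumulative weights without fitting any parametric model.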

  15. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.

    2005-01-01

    detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FILDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include...... the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy....... The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability...

  16. Similar Ratios of Introns to Intergenic Sequence across Animal Genomes.

    Science.gov (United States)

    Francis, Warren R; Wörheide, Gert

    2017-06-01

    One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes arguing against differences in genome usage between animal groups, and also suggesting that the transcribed portion is more dependent on genome size than previously thought. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Similar representations of sequence knowledge in young and older adults: A study of effector independent transfer

    Directory of Open Access Journals (Sweden)

    Jonathan Sebastiaan Barnhoorn

    2016-08-01

    Older adults show reduced motor performance and changes in motor skill development. To better understand these changes, we studied differences in sequence knowledge representations between young and older adults using a transfer task. Transfer, or the ability to apply motor skills flexibly, is highly relevant in day-to-day motor activity and facilitates generalization of learning to new contexts. By using movement types that are completely unrelated in terms of muscle activation and response location, we focused on transfer facilitated by the early, visuospatial system. We tested 32 right-handed older adults (65–74) and 32 young adults (18–30). During practice of a discrete sequence production task, participants learned two 6-element sequences using either unimanual key-presses (KPs) or by moving a lever with lower arm flexion-extension (FE) movements. Each sequence was performed 144 times. They then performed a test phase consisting of familiar and random sequences performed with the type of movements not used during practice. Both age groups displayed transfer from FE to KP movements as indicated by faster performance on the familiar sequences in the test phase. Only young adults transferred their sequence knowledge from KP to FE movements. In both directions, the young showed higher transfer than older adults. These results suggest that the older participants, like the young, represented their sequences in an abstract visuospatial manner. Transfer was asymmetric in both age groups: there was more transfer from FE to KP movements than vice versa. This similar asymmetry is a further indication that the types of representations that older adults develop are comparable to those that young adults develop. We furthermore found that older adults improved less during FE practice, gained less explicit knowledge, displayed a smaller visuospatial working memory capacity and had lower processing speed than young adults. Despite the many differences

  18. Bidirectional gene sequences with similar homology to functional proteins of alkane degrading bacterium Pseudomonas frederiksbergensis DNA

    International Nuclear Information System (INIS)

    Megeed, A.A.

    2011-01-01

    The potential for two overlapping fragments of DNA from a clone of a newly isolated alkane-degrading bacterium, Pseudomonas frederiksbergensis, encoding sequences with similar homology to two parts of functional proteins is described. One strand contains a sequence with high homology to alkane monooxygenase (alkB), a member of the alkane hydroxylase family, and the other strand contains a sequence with some homology to the alcohol dehydrogenase gene (alkJ). Overlapping of genes on opposite strands has been reported in eukaryotic species, and is now reported in a bacterial species. The sequence comparisons and ORF results revealed that the regulation and gene organization involved in alkane oxidation in Pseudomonas frederiksbergensis vary among the different known alkane-degrading bacteria. The alk gene cluster containing homologues to the known alkane monooxygenase (alkB) and rubredoxin (alkG) is oriented in one direction, whereas alcohol dehydrogenase (alkJ) is oriented in the opposite direction. Such genomes encode messages on both strands of the DNA, or in overlapping but different reading frames of the same strand of DNA. The creation of novel genes from pre-existing sequences, known as overprinting, is a widespread phenomenon in small viruses. Here, the origin and evolution of the gene overlap to bacteriophages belonging to the family Microviridae have been investigated. Such a phenomenon is most widely described in extremely small genomes such as those of viruses or small plasmids, yet here is a unique phenomenon. (author)

  19. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    Science.gov (United States)

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  20. Testing statistical significance scores of sequence comparison methods with structure similarity

    Directory of Open Access Journals (Sweden)

    Leunissen Jack AM

    2006-10-01

    Background: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
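
    A Monte-Carlo Z-score of the kind mentioned in this abstract can be sketched as below, which also shows why it is compute intensive: every query-subject pair needs many extra alignments. This is a toy illustration, not the implementation evaluated in the paper. The scoring parameters (match 2, mismatch -1, linear gap -2), the number of shuffles, and the function names are assumptions; real searches use substitution matrices and affine gap penalties.

```python
import random
import statistics


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def z_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Z = (real score - mean shuffled score) / std of shuffled scores.

    The second sequence is shuffled n_shuffles times, preserving its
    composition, to estimate the background score distribution.
    """
    rng = random.Random(seed)
    real = sw_score(a, b)
    chars = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(chars)
        shuffled_scores.append(sw_score(a, "".join(chars)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        # Degenerate background: report infinity only if the real score stands out.
        return float("inf") if real > mean else 0.0
    return (real - mean) / sd
```

    With 100 shuffles, one Z-score costs 101 alignments, whereas an e-value is derived from a single alignment score and fitted database statistics.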

  1. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    Science.gov (United States)

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structure of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as the crystal quality, diffraction techniques, and X-ray sources, etc. In this paper, the authors found that the protein sequence could also be one of the factors. We extracted information of the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to the sequence similarity, and statistically analyzed the relationship between the sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between the sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  2. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

    Directory of Open Access Journals (Sweden)

    Holly J Atkinson

    The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.

  3. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

    Science.gov (United States)

    Atkinson, Holly J; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C

    2009-01-01

    The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.

  4. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    Science.gov (United States)

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package that is freely available (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Supplementary data are available at Bioinformatics online.
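
    To give a feel for how a Kendall-type rank statistic can compare sequences without alignment, the sketch below computes Kendall's tau-a between the k-mer count vectors of two DNA sequences. This is an illustration only and is not the K2 statistic as defined by the authors; the choice of tau-a (which ignores ties), the DNA alphabet, and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations, product

def kmer_counts(seq, k, alphabet="ACGT"):
    """Counts of every possible k-mer, in a fixed lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

def kmer_rank_similarity(seq_a, seq_b, k=2):
    """Rank correlation of the two k-mer count profiles, in [-1, 1]."""
    return kendall_tau_a(kmer_counts(seq_a, k), kmer_counts(seq_b, k))
```

    The pairwise loop is quadratic in the number of k-mers, so a practical implementation would need a faster tau computation for larger k.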

  5. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  6. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    Directory of Open Access Journals (Sweden)

    Jaimie-Leigh Jonker

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).

  7. Detecting atypical examples of known domain types by sequence similarity searching: the SBASE domain library approach.

    Science.gov (United States)

    Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor

    2010-11-01

    SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences - the SBASE domain library - and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.

  8. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, a reasonable method of using complete genome sequences for construction of phylogenetic trees has not yet been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  9. Similarity and self-similarity in high energy density physics: application to laboratory astrophysics

    International Nuclear Information System (INIS)

    Falize, E.

    2008-10-01

    The spectacular recent development of powerful facilities allows the astrophysical community to explore, in laboratory, astrophysical phenomena where radiation and matter are strongly coupled. The titles of the nine chapters of the thesis are: from high energy density physics to laboratory astrophysics; Lie groups, invariance and self-similarity; scaling laws and similarity properties in High-Energy-Density physics; the Burgan-Feix-Munier transformation; dynamics of polytropic gases; stationary radiating shocks and the POLAR project; structure, dynamics and stability of optically thin fluids; from young star jets to laboratory jets; modelling and experiences for laboratory jets

  10. On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally...... important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...

  11. Density-based retrieval from high-similarity image databases

    DEFF Research Database (Denmark)

    Hansen, Michael Edberg; Carstensen, Jens Michael

    2004-01-01

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce a me...

  12. Efficient estimation for high similarities using odd sketches

    DEFF Research Database (Denmark)

    Mitzenmacher, Michael; Pagh, Rasmus; Pham, Ninh Dang

    2014-01-01

    . This means that Odd Sketches provide a highly space-efficient estimator for sets of high similarity, which is relevant in applications such as web duplicate detection, collaborative filtering, and association rule learning. The method extends to weighted Jaccard similarity, relevant e.g. for TF-IDF vector...... and web duplicate detection tasks....

  13. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    Energy Technology Data Exchange (ETDEWEB)

    Ovacik, Meric A. [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States)

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  14. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    International Nuclear Information System (INIS)

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-01-01

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy

  15. Ultra-fast sequence clustering from similarity networks with SiLiX

    Directory of Open Access Journals (Sweden)

    Duret Laurent

    2011-04-01

    Background: The number of gene sequences that are available for comparative genomics approaches is increasing extremely quickly. A current challenge is to be able to handle this huge number of sequences in order to build families of homologous sequences in a reasonable time. Results: We present the software package SiLiX, which implements a novel method that reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions: Compared with state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.
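
    Single linkage clustering viewed as a graph problem amounts to finding connected components: sequences are nodes, accepted similarity hits are edges, and each component is a family. The sketch below does this with a union-find structure. It is a generic illustration and not the SiLiX implementation; it assumes the hits have already been filtered by whatever similarity criteria apply, and the function name is hypothetical.

```python
def single_linkage_families(n_sequences, hits):
    """Group sequences 0..n-1 into families, given pairs (a, b) of accepted hits."""
    parent = list(range(n_sequences))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = {}
    for s in range(n_sequences):
        families.setdefault(find(s), []).append(s)
    return sorted(families.values())

if __name__ == "__main__":
    print(single_linkage_families(5, [(0, 1), (1, 2), (3, 4)]))
```

    Because each hit is processed once and never stored, this formulation can stream through very large hit lists, which is the property that matters at the scale of billions of BLAST hits.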

  16. Protein sequences and redox titrations indicate that the electron acceptors in reaction centers from heliobacteria are similar to Photosystem I

    Science.gov (United States)

    Trost, J. T.; Brune, D. C.; Blankenship, R. E.

    1992-01-01

    Photosynthetic reaction centers isolated from Heliobacillus mobilis exhibit a single major protein on SDS-PAGE of 47 000 Mr. Attempts to sequence the reaction center polypeptide indicated that the N-terminus is blocked. After enzymatic and chemical cleavage, four peptide fragments were sequenced from the Heliobacillus mobilis apoprotein. Only one of these sequences showed significant specific similarity to any of the protein and deduced protein sequences in the GenBank data base. This fragment is identical with 56% of the residues, including both cysteines, found in a highly conserved region that is proposed to bind iron-sulfur center Fx in the Photosystem I reaction center peptide that is the psaB gene product. The similarity to the psaA gene product in this region is 48%. Redox titrations of laser-flash-induced photobleaching with millisecond decay kinetics on isolated reaction centers from Heliobacterium gestii indicate a midpoint potential of -414 mV with n = 2 titration behavior. In membranes, the behavior is intermediate between n = 1 and n = 2, and the apparent midpoint potential is -444 mV. This is compared to the behavior in Photosystem I, where the intermediate electron acceptor A1, thought to be a phylloquinone molecule, has been proposed to undergo a double reduction at low redox potentials in the presence of viologen redox mediators. These results strongly suggest that the acceptor side electron transfer system in reaction centers from heliobacteria is indeed analogous to that found in Photosystem I. The sequence similarities indicate that the divergence of the heliobacteria from the Photosystem I line occurred before the gene duplication and subsequent divergence that led to the heterodimeric protein core of the Photosystem I reaction center.

  17. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    Science.gov (United States)

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  18. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    Directory of Open Access Journals (Sweden)

    Manzini Giovanni

    2007-07-01

    Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results: We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC

  19. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

    Science.gov (United States)

    Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

    2007-07-13

    Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC (Receiver Operating Curve) analysis, aims at
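
    As a minimal sketch of how a compression-based dissimilarity is computed in practice, the code below implements the commonly used NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. Using zlib as the compressor is an assumption made for convenience; the abstract does not list the 25 compressors that were evaluated, and results depend strongly on the compressor chosen.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression dissimilarity between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    a = b"ACGT" * 100
    b = bytes(range(256))
    print(ncd(a, a), ncd(a, b))
```

    Values near 0 indicate that one string adds little information given the other, and values near 1 indicate that the two strings share almost nothing the compressor can exploit.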

  20. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.

  1. High resolution sequence stratigraphy in China

    International Nuclear Information System (INIS)

    Zhang Shangfeng; Zhang Changmin; Yin Yanshi; Yin Taiju

    2008-01-01

    Since high resolution sequence stratigraphy was introduced into China by DENG Hong-wen in 1995, it has passed through two development stages: an initial stage of theoretical research followed by the development of theory and application, and a stage of theoretical maturity and wide application that it is now entering. Practice has shown that high resolution sequence stratigraphy plays an increasingly important role in the exploration and development of oil and gas in Chinese continental oil-bearing basins, and the research field has spread to the exploration of coal, uranium and other stratabound deposits. However, the theory of high resolution sequence stratigraphy still has some shortcomings and should be improved in many aspects. The authors point out that high resolution sequence stratigraphy should be characterized quantitatively and modelled using computer techniques. (authors)

  2. Scaling Relations of Local Magnitude versus Moment Magnitude for Sequences of Similar Earthquakes in Switzerland

    KAUST Repository

    Bethmann, F.

    2011-03-22

    Theoretical considerations and empirical regressions show that, in the magnitude range between 3 and 5, local magnitude, ML, and moment magnitude, Mw, scale 1:1. Previous studies suggest that for smaller magnitudes this 1:1 scaling breaks down. However, the scatter between ML and Mw at small magnitudes is usually large and the resulting scaling relations are therefore uncertain. In an attempt to reduce these uncertainties, we first analyze the ML versus Mw relation based on 195 events, induced by the stimulation of a geothermal reservoir below the city of Basel, Switzerland. Values of ML range from 0.7 to 3.4. From these data we derive a scaling of ML ~ 1.5 Mw over the given magnitude range. We then compare peak Wood-Anderson amplitudes to the low-frequency plateau of the displacement spectra for six sequences of similar earthquakes in Switzerland in the range of 0.5 ≤ ML ≤ 4.1. Because effects due to the radiation pattern and to the propagation path between source and receiver are nearly identical at a particular station for all events in a given sequence, the scatter in the data is substantially reduced. Again we obtain a scaling equivalent to ML ~ 1.5 Mw. Based on simulations using synthetic source time functions for different magnitudes and Q values estimated from spectral ratios between downhole and surface recordings, we conclude that the observed scaling can be explained by attenuation and scattering along the path. Other effects that could explain the observed magnitude scaling, such as a possible systematic increase of stress drop or rupture velocity with moment magnitude, are masked by attenuation along the path.

  3. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data.

    Directory of Open Access Journals (Sweden)

    Manal Helal

    BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance, of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra
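
The idea of a cluster 'centroid' as the sequence from which distances to all other members are minimized can be sketched as follows. This is a minimal illustration only, assuming a precomputed symmetric distance matrix and given cluster labels; the function name `cluster_medoids` is hypothetical, and the sketch does not reproduce the authors' linear mapping (LM) algorithm.

```python
def cluster_medoids(dist, labels):
    """Return {cluster label: index of its most representative member}.

    dist   -- square matrix (list of lists) of pairwise distances
    labels -- labels[i] is the cluster of sequence i
    The representative of a cluster is the member whose summed distance
    to all members of the same cluster is smallest.
    """
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)

    medoids = {}
    for label, idxs in members.items():
        medoids[label] = min(
            idxs, key=lambda i: sum(dist[i][j] for j in idxs)
        )
    return medoids


if __name__ == "__main__":
    # Illustrative distances only, not data from the study.
    dist = [
        [0, 1, 4, 9],
        [1, 0, 2, 9],
        [4, 2, 0, 9],
        [9, 9, 9, 0],
    ]
    print(cluster_medoids(dist, ["a", "a", "a", "b"]))
```

Choosing an actual member of the cluster, rather than an averaged point, means the representative is always a real sequence that can serve as a reference.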

  4. A Novel Phytase with Sequence Similarity to Purple Acid Phosphatases Is Expressed in Cotyledons of Germinating Soybean Seedlings

    Science.gov (United States)

    Hegeman, Carla E.; Grabau, Elizabeth A.

    2001-01-01

    Phytic acid (myo-inositol hexakisphosphate) is the major storage form of phosphorus in plant seeds. During germination, stored reserves are used as a source of nutrients by the plant seedling. Phytic acid is degraded by the activity of phytases to yield inositol and free phosphate. Due to the lack of phytases in the non-ruminant digestive tract, monogastric animals cannot utilize dietary phytic acid and it is excreted into manure. High phytic acid content in manure results in elevated phosphorus levels in soil and water and accompanying environmental concerns. The use of phytases to degrade seed phytic acid has potential for reducing the negative environmental impact of livestock production. A phytase was purified to electrophoretic homogeneity from cotyledons of germinated soybeans (Glycine max L. Merr.). Peptide sequence data generated from the purified enzyme facilitated the cloning of the phytase sequence (GmPhy) employing a polymerase chain reaction strategy. The introduction of GmPhy into soybean tissue culture resulted in increased phytase activity in transformed cells, which confirmed the identity of the phytase gene. It is surprising that the soybean phytase was unrelated to previously characterized microbial or maize (Zea mays) phytases, which were classified as histidine acid phosphatases. The soybean phytase sequence exhibited a high degree of similarity to purple acid phosphatases, a class of metallophosphoesterases. PMID:11500558

  5. The genomic sequence of cowpea aphid-borne mosaic virus and its similarities with other potyviruses

    NARCIS (Netherlands)

    Mlotshwa, S.; Verver, J.; Sithole-Niang, I.; Kampen, van T.; Kammen, van A.; Wellink, J.

    2002-01-01

    The genomic sequence of a Zimbabwe isolate of Cowpea aphid-borne mosaic virus (CABMV-Z) was determined by sequencing overlapping viral cDNA clones generated by RT-PCR using degenerate and/or specific primers. The sequence is 9465 nucleotides in length excluding the 3' terminal poly (A) tail and

  6. When high similarity copycats lose and moderate similarity copycats gain: The impact of comparative evaluation

    NARCIS (Netherlands)

    Van Horen, F.; Pieters, R.

    2012-01-01

    Copycats imitate features of leading brands to free ride on their equity. The prevailing belief is that the more similar copycats are to the leader brand, the more positive their evaluation is, and thus the more they free ride. Three studies demonstrate when the reverse holds true:

  7. When high similarity copycats lose and moderate similarity copycats gain : The impact of comparative evaluation

    NARCIS (Netherlands)

    van Horen, F.; Pieters, R.

    2012-01-01

    Copycats imitate features of leading brands to free ride on their equity. The prevailing belief is that the more similar copycats are to the leader brand, the more positive their evaluation is, and thus the more they free ride. Three studies demonstrate when the reverse holds true:

  8. CLONING AND SEQUENCING OF THE GENE FOR A LACTOCOCCAL ENDOPEPTIDASE, AN ENZYME WITH SEQUENCE SIMILARITY TO MAMMALIAN ENKEPHALINASE

    NARCIS (Netherlands)

    Mierau, Igor; Tan, Paris S.T.; Haandrikman, Alfred J.; Kok, Jan; Leenhouts, Kees J.; Konings, Wil N.; Venema, Gerard

    The gene specifying an endopeptidase of Lactococcus lactis, named pepO, was cloned from a genomic library of L. lactis subsp. cremoris P8-247 in lambdaEMBL3 and was subsequently sequenced. pepO is probably the last gene of an operon encoding the binding-protein-dependent oligopeptide transport

  9. Phylogenetic similarity of the canine parvovirus wild-type isolates on the basis of VP1/VP2 gene fragment sequence analysis.

    Science.gov (United States)

    Rypul, K; Chmielewski, R; Smielewska-Loś, E; Klimentowski, S

    2002-04-01

    Biological material was taken from dogs with diarrhoea. Faecal samples were taken from within live animals and intestinal tract fragments (i.e. small intestine, and stomach) were taken from dead animals. In total, 18 specimens were investigated from dogs housed alone or in large groups. To test for the presence of the virus, latex (On Site Biotech, Uppsala, Sweden) and direct immunofluorescence tests were performed. At the same time, polymerase chain reaction (PCR) with primers complementary to a conservative region of VP1/VP2 was carried out. The products of amplification were analysed on 2% agarose gel. The purified products were cloned with the Template Generation System (Finnzymes, Espoo, Finland) using a transposition reaction and positive clones were searched using the 'colony screening by PCR' method. The sequencing gave 12 sequences of VP1/VP2 gene fragments that were of high similarity. Among the 12 analysed sequences, six exhibited 88% similarity, four exhibited 100% similarity and two exhibited 71% similarity.

  10. Simultaneous identification of long similar substrings in large sets of sequences

    Directory of Open Access Journals (Sweden)

    Wittig Burghardt

    2007-05-01

    Background: Sequence comparison faces new challenges today, with many complete genomes and large libraries of transcripts known. Gene annotation pipelines match these sequences in order to identify genes and their alternative splice forms. However, the software currently available cannot simultaneously compare sets of sequences as large as necessary especially if errors must be considered. Results: We therefore present a new algorithm for the identification of almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length as seeds for a novel method of match extension with errors. It generates alignments of maximum length with a considered maximum number of errors within each overlapping window of a given size. Such alignments are not optimal in the usual sense but faster to calculate and often more appropriate than traditional alignments for genomic sequence comparisons, EST and full-length cDNA matching, and genomic sequence assembly. The method is used to check the overlaps and to reveal possible assembly errors for 1377 Medicago truncatula BAC-size sequences published at http://www.medicago.org/genome/assembly_table.php?chr=1. Conclusion: The program ClustDB proves that window alignment is an efficient way to find long sequence sections of homogenous alignment quality, as expected in case of random errors, and to detect systematic errors resulting from sequence contaminations. Such inserts are systematically overlooked in long alignments controlled by only tuning penalties for mismatches and gaps. ClustDB is freely available for academic use.
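
The window criterion described in this abstract (a bounded number of errors within every overlapping window of a given size) can be sketched for the simplest case. The code below is an assumption-laden illustration, not ClustDB's algorithm: it extends a seed to the right only, counts mismatches only (no insertions or deletions), and the name `extend_right` and its parameters are hypothetical.

```python
from collections import deque


def extend_right(a, b, start_a, start_b, window, max_errors):
    """Extend a match rightwards and return the number of aligned positions.

    Extension stops just before the position that would place more than
    max_errors mismatches inside any run of `window` consecutive positions.
    """
    recent = deque()  # offsets of mismatches still inside the current window
    length = 0
    while start_a + length < len(a) and start_b + length < len(b):
        if a[start_a + length] != b[start_b + length]:
            # Forget mismatches that lie outside the window ending here.
            while recent and recent[0] <= length - window:
                recent.popleft()
            if len(recent) + 1 > max_errors:
                break
            recent.append(length)
        length += 1
    return length


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTTCGTAA"  # differs from a at offsets 4 and 9
    print(extend_right(a, b, 0, 0, window=5, max_errors=1))
    print(extend_right(a, b, 0, 0, window=6, max_errors=1))
```

Only mismatching positions need to be checked, because a matching position can never increase the error count of any window. The two calls differ because the mismatches at offsets 4 and 9 fall into a common window only when the window spans six positions.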

  11. On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence

    Directory of Open Access Journals (Sweden)

    Theobald Douglas L

    2011-11-01

    Background: The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, recently both the status and nature of UCA has been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set. Results: For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason for why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial

  12. Similar Representations of Sequence Knowledge in Young and Older Adults: A Study of Effector Independent Transfer

    NARCIS (Netherlands)

    Barnhoorn, Jonathan Sebastiaan; Döhring, Falko R.; van Asseldonk, Edwin H.F.; Verwey, Willem B.

    2016-01-01

    Older adults show reduced motor performance and changes in motor skill development. To better understand these changes, we studied differences in sequence knowledge representations between young and older adults using a transfer task. Transfer, or the ability to apply motor skills flexibly, is

  13. Evidence for Deep Regulatory Similarities in Early Developmental Programs across Highly Diverged Insects

    Science.gov (United States)

    Zhang, Yinan; Samee, Md. Abul Hassan; Halfon, Marc S.; Sinha, Saurabh

    2014-01-01

    Many genes familiar from Drosophila development, such as the so-called gap, pair-rule, and segment polarity genes, play important roles in the development of other insects and in many cases appear to be deployed in a similar fashion, despite the fact that Drosophila-like “long germband” development is highly derived and confined to a subset of insect families. Whether or not these similarities extend to the regulatory level is unknown. Identification of regulatory regions beyond the well-studied Drosophila has been challenging as even within the Diptera (flies, including mosquitoes) regulatory sequences have diverged past the point of recognition by standard alignment methods. Here, we demonstrate that methods we previously developed for computational cis-regulatory module (CRM) discovery in Drosophila can be used effectively in highly diverged (250–350 Myr) insect species including Anopheles gambiae, Tribolium castaneum, Apis mellifera, and Nasonia vitripennis. In Drosophila, we have successfully used small sets of known CRMs as “training data” to guide the search for other CRMs with related function. We show here that although species-specific CRM training data do not exist, training sets from Drosophila can facilitate CRM discovery in diverged insects. We validate in vivo over a dozen new CRMs, roughly doubling the number of known CRMs in the four non-Drosophila species. Given the growing wealth of Drosophila CRM annotation, these results suggest that extensive regulatory sequence annotation will be possible in newly sequenced insects without recourse to costly and labor-intensive genome-scale experiments. We develop a new method, Regulus, which computes a probabilistic score of similarity based on binding site composition (despite the absence of nucleotide-level sequence alignment), and demonstrate similarity between functionally related CRMs from orthologous loci. Our work represents an important step toward being able to trace the evolutionary

  14. A behavioral similarity measure between labeled Petri nets based on principal transition sequences

    NARCIS (Netherlands)

    Wang, J.; He, T.; Wen, L.; Wu, N.; Hofstede, ter A.H.M.; Su, J.; Meersman, R.; Dillon, T.S.; Herrero, P.

    2010-01-01

    Being able to determine the degree of similarity between process models is important for management, reuse, and analysis of business process models. In this paper we propose a novel method to determine the degree of similarity between process models, which exploits their semantics. Our approach is

  15. Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

    Directory of Open Access Journals (Sweden)

    Apurva Barve

    2013-01-01

    Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla.

  16. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Directory of Open Access Journals (Sweden)

    Jason D Thompson

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  17. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Science.gov (United States)

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  18. Functional enrichment analyses and construction of functional similarity networks with high confidence function prediction by PFP

    Directory of Open Access Journals (Sweden)

    Kihara Daisuke

    2010-05-01

    Background: A new paradigm of biological investigation takes advantage of technologies that produce large high throughput datasets, including genome sequences, interactions of proteins, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins can lack the functional evidence necessary to infer their biological relevance. Results: Here we have applied high confidence function predictions from our automated prediction system, PFP, to three genome sequences, Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria). The number of annotated genes is increased by PFP to over 90% for all of the genomes. Using the large coverage of the function annotation, we introduced the functional similarity networks which represent the functional space of the proteomes. Four different functional similarity networks are constructed for each proteome, one each by considering similarity in a single Gene Ontology (GO) category, i.e. Biological Process, Cellular Component, and Molecular Function, and another one by considering overall similarity with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks by showing a higher clustering degree exponent value and thus has a higher tendency to be hierarchical. In addition, examining function assignments to the protein-protein interaction network and local regions of genomes has identified numerous cases where subnetworks or local regions have functionally coherent proteins. These results will help interpreting interactions of proteins and gene orders in a genome. Several examples of both analyses are highlighted. Conclusion: The analyses demonstrate that applying high confidence predictions from PFP

  19. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Background: In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results: We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions: Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus
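
The role of IUPAC ambiguity codes in a degenerate primer can be made concrete with a short sketch. It uses the standard IUPAC nucleotide code table; the helper names `degeneracy` and `expand` are hypothetical and are not part of the JCVI pipeline, whose interface the abstract does not describe.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    options = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    print(degeneracy("ACRYN"))
    print(expand("AR"))
```

The degeneracy grows multiplicatively with each ambiguous position, which is one reason primer selection over a consensus template is better done computationally than by hand.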

  20. Identification of similar regions of protein structures using integrated sequence and structure analysis tools

    Directory of Open Access Journals (Sweden)

    Heiland Randy

    2006-03-01

    Background: Understanding protein function from its structure is a challenging problem. Sequence based approaches for finding homology have broad use for annotation of both structure and function. 3D structural information of protein domains and their interactions provide a complementary view to structure function relationships to sequence information. We have developed a web site http://www.sblest.org/ and an API of web services that enables users to submit protein structures and identify statistically significant neighbors and the underlying structural environments that make that match using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer based superfamily predictions to give a unique integrated view to prediction of SCOP superfamilies, EC number, and GO term, as well as identification of the protein structural environments that are associated with that prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest. Results: Users are able to submit their own queries or use a structure already in the PDB. Currently the databases that a user can query include the popular structural datasets ASTRAL 40 v1.69, ASTRAL 95 v1.69, CLUSTER50, CLUSTER70 and CLUSTER90 and PDBSELECT25. The results can be downloaded directly from the site and include function prediction, analysis of the most conserved environments and automated annotation of query proteins. These results reflect both the hits found with PSI-BLAST, HMMer and with S-BLEST. We have evaluated how well annotation transfer can be performed on SCOP IDs, Gene Ontology (GO) IDs and EC Numbers. The method is very efficient and totally automated, generally taking around fifteen minutes for a 400 residue protein. Conclusion: With structural genomics initiatives determining structures with little, if any, functional characterization

  1. When Does Between-Sequence Phonological Similarity Promote Irrelevant Sound Disruption?

    Science.gov (United States)

    Marsh, John E.; Vachon, Francois; Jones, Dylan M.

    2008-01-01

    Typically, the phonological similarity between to-be-recalled items and to-be-ignored (TBI) auditory stimuli has no impact if recall in serial order is required. However, in the present study, the authors have shown that the free recall, but not serial recall, of lists of phonologically related to-be-remembered items was disrupted by an irrelevant sound stream…

  2. In vitro identification and in silico utilization of interspecies sequence similarities using GeneChip® technology

    Directory of Open Access Journals (Sweden)

    Ye Shui Q

    2005-05-01

    Background: Genomic approaches in large animal models (canine, ovine, etc.) are challenging due to insufficient genomic information for these species and the lack of availability of corresponding microarray platforms. To address this problem, we speculated that conserved interspecies genetic sequences can be experimentally detected by cross-species hybridization. The Affymetrix platform probe redundancy offers flexibility in selecting individual probes with high sequence similarities between related species for gene expression analysis. Results: Gene expression profiles of 40 canine samples were generated using the human HG-U133A GeneChip (U133A). Due to interspecies genetic differences, only 14 ± 2% of canine transcripts were detected by U133A probe sets whereas profiling of 40 human samples detected 49 ± 6% of human transcripts. However, when these probe sets were deconstructed into individual probes and the performance of each probe was examined, we found that 47% of human probes were able to find their targets in canine tissues and generate a detectable hybridization signal. Therefore, we restricted gene expression analysis to these probes and observed a 60% increase in the number of identified canine transcripts. These results were validated by comparison of transcripts identified by our restricted analysis of cross-species hybridization with transcripts identified by hybridization of total canine lung mRNA to the new Affymetrix Canine GeneChip®. Conclusion: The experimental identification and restriction of gene expression analysis to probes with detectable hybridization signal drastically increases transcript detection in canine-human hybridization, suggesting the possibility of broad utilization of cross-hybridizations of related species using GeneChip technology.

  3. Pulmonary parenchyma segmentation in thin CT image sequences with spectral clustering and geodesic active contour model based on similarity

    Science.gov (United States)

    He, Nana; Zhang, Xiaolong; Zhao, Juanjuan; Zhao, Huilan; Qiang, Yan

    2017-07-01

    While the popular thin layer scanning technology of spiral CT has helped to improve diagnoses of lung diseases, the large volumes of scanning images produced by the technology also dramatically increase the workload of physicians in lesion detection. Computer-aided diagnosis techniques such as lesion segmentation in thin CT sequences have been developed to address this issue, but it remains a challenge to achieve high segmentation efficiency and accuracy without much manual intervention. In this paper, we present our research on automated segmentation of lung parenchyma with an improved geodesic active contour model, the geodesic active contour model based on similarity (GACBS). Combining a spectral clustering algorithm based on Nystrom (SCN) with GACBS, this algorithm first extracts key image slices, then uses these slices to generate an initial contour of pulmonary parenchyma of un-segmented slices with an interpolation algorithm, and finally segments lung parenchyma of un-segmented slices. Experimental results show that the segmentation results generated by our method are close to what manual segmentation can produce, with an average volume overlap ratio of 91.48%.

  4. Sequence diversity, cytotoxicity and antigenic similarities of the leukotoxin of isolates of Mannheimia species from mastitis in domestic sheep.

    Science.gov (United States)

    Omaleki, Lida; Browning, Glenn F; Barber, Stuart R; Allen, Joanne L; Srikumaran, Subramaniam; Markham, Philip F

    2014-11-07

    Species within the genus Mannheimia are among the most important causes of ovine mastitis. Isolates of these species can express leukotoxin A (LktA), a primary virulence factor of these bacteria. To examine the significance of variation in the LktA, the sequences of the lktA genes in a panel of isolates from cases of ovine mastitis were compared. The cross-neutralising capacities of rat antisera raised against LktA of one Mannheimia glucosida, one haemolytic Mannheimia ruminalis, and two Mannheimia haemolytica isolates were also examined to assess the effect that variation in the lktA gene can have on protective immunity against leukotoxins with differing sequences. The lktA nucleotide distance between the M. haemolytica isolates was greater than between the M. glucosida isolates, with the M. haemolytica isolates divisible into two groups based on their lktA sequences. Comparison of the topology of phylogenetic trees of 16S rDNA and lktA sequences revealed differences in the relationships between some isolates, suggesting horizontal gene transfer. Cross neutralisation data obtained with monospecific anti-LktA rat sera were used to derive antigenic similarity coefficients for LktA from the four Mannheimia species isolates. Similarity coefficients indicated that LktA of the two M. haemolytica isolates were least similar, while LktA from M. glucosida was most similar to those for one of the M. haemolytica isolates and the haemolytic M. ruminalis isolate. The results suggested that vaccination with the M. glucosida leukotoxin would generate the greatest cross-protection against ovine mastitis caused by Mannheimia species with these alleles. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii.
Finally, we provide a general motif-HMM scanner which is easily accessible through

  6. Evidence for deep regulatory similarities in early developmental programs across highly diverged insects.

    Science.gov (United States)

    Kazemian, Majid; Suryamohan, Kushal; Chen, Jia-Yu; Zhang, Yinan; Samee, Md Abul Hassan; Halfon, Marc S; Sinha, Saurabh

    2014-09-01

    Many genes familiar from Drosophila development, such as the so-called gap, pair-rule, and segment polarity genes, play important roles in the development of other insects and in many cases appear to be deployed in a similar fashion, despite the fact that Drosophila-like "long germband" development is highly derived and confined to a subset of insect families. Whether or not these similarities extend to the regulatory level is unknown. Identification of regulatory regions beyond the well-studied Drosophila has been challenging as even within the Diptera (flies, including mosquitoes) regulatory sequences have diverged past the point of recognition by standard alignment methods. Here, we demonstrate that methods we previously developed for computational cis-regulatory module (CRM) discovery in Drosophila can be used effectively in highly diverged (250-350 Myr) insect species including Anopheles gambiae, Tribolium castaneum, Apis mellifera, and Nasonia vitripennis. In Drosophila, we have successfully used small sets of known CRMs as "training data" to guide the search for other CRMs with related function. We show here that although species-specific CRM training data do not exist, training sets from Drosophila can facilitate CRM discovery in diverged insects. We validate in vivo over a dozen new CRMs, roughly doubling the number of known CRMs in the four non-Drosophila species. Given the growing wealth of Drosophila CRM annotation, these results suggest that extensive regulatory sequence annotation will be possible in newly sequenced insects without recourse to costly and labor-intensive genome-scale experiments. We develop a new method, Regulus, which computes a probabilistic score of similarity based on binding site composition (despite the absence of nucleotide-level sequence alignment), and demonstrate similarity between functionally related CRMs from orthologous loci. 
Our work represents an important step toward being able to trace the evolutionary history of gene

  7. Musicians' and nonmusicians' short-term memory for verbal and musical sequences: comparing phonological similarity and pitch proximity.

    Science.gov (United States)

    Williamson, Victoria J; Baddeley, Alan D; Hitch, Graham J

    2010-03-01

    Language-music comparative studies have highlighted the potential for shared resources or neural overlap in auditory short-term memory. However, there is a lack of behavioral methodologies for comparing verbal and musical serial recall. We developed a visual grid response that allowed both musicians and nonmusicians to perform serial recall of letter and tone sequences. The new method was used to compare the phonological similarity effect with the impact of an operationalized musical equivalent, pitch proximity. Over the course of three experiments, we found that short-term memory for tones had several similarities to verbal memory, including limited capacity and a significant effect of pitch proximity in nonmusicians. Despite being vulnerable to phonological similarity when recalling letters, however, musicians showed no effect of pitch proximity, a result that we suggest might reflect strategy differences. Overall, the findings support a limited degree of correspondence in the way that verbal and musical sounds are processed in auditory short-term memory.

  8. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    Energy Technology Data Exchange (ETDEWEB)

    Zemla, A; Lang, D; Kostova, T; Andino, R; Zhou, C

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected

  9. Remarkable sequence similarity between the dinoflagellate-infecting marine girus and the terrestrial pathogen African swine fever virus

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2009-10-01

    Heterocapsa circularisquama DNA virus (HcDNAV; previously designated as HcV) is a giant virus (girus) with a ~356-kbp double-stranded DNA (dsDNA) genome. HcDNAV lytically infects the bivalve-killing marine dinoflagellate H. circularisquama, and currently represents the sole DNA virus isolated from dinoflagellates, one of the most abundant protists in marine ecosystems. Its morphological features, genome type, and host range previously suggested that HcDNAV might be a member of the family Phycodnaviridae of Nucleo-Cytoplasmic Large DNA Viruses (NCLDVs), though no supporting sequence data were available. NCLDVs currently include two families found in aquatic environments (Phycodnaviridae, Mimiviridae), one mostly infecting terrestrial animals (Poxviridae), another isolated from fish, amphibians and insects (Iridoviridae), and the last one (Asfarviridae) exclusively represented by the animal pathogen African swine fever virus (ASFV), the agent of a fatal hemorrhagic disease in domestic swine. In this study, we determined the complete sequence of the type B DNA polymerase (PolB) gene of HcDNAV. The viral PolB was transcribed at least from 6 h post inoculation (hpi), suggesting its crucial function for viral replication. Most unexpectedly, the HcDNAV PolB sequence was found to be closely related to the PolB sequence of ASFV. In addition, the amino acid sequence of HcDNAV PolB showed a rare amino acid substitution within a highly conserved motif: YSDTDS was found in HcDNAV PolB instead of YGDTDS in most dsDNA viruses. Together with the previous observation of ASFV-like sequences in the Sorcerer II Global Ocean Sampling metagenomic datasets, our results further reinforce the idea that the terrestrial ASFV has its evolutionary origin in marine environments.

  10. "Venom" of the slow loris: sequence similarity of prosimian skin gland protein and Fel d 1 cat allergen.

    Science.gov (United States)

    Krane, Sonja; Itagaki, Yasuhiro; Nakanishi, Koji; Weldon, Paul J

    2003-02-01

    Bites inflicted on humans by the slow loris (Nycticebus coucang), a prosimian from Indonesia, are painful and elicit anaphylaxis. Toxins from N. coucang are thought to originate in the brachial organ, a naked, gland-laden area of skin situated on the flexor surface of the arm that is licked during grooming. We isolated a major component of the brachial organ secretions from N. coucang, an approximately 18 kDa protein composed of two 70-90 amino-acid chains linked by one or more disulfide bonds. The N-termini of these peptide chains exhibit nearly 70% sequence similarity (37% identity, chain 1; 54% identity, chain 2) with the two chains of Fel d 1, the major allergen from the domestic cat (Felis catus). The extensive sequence similarity between the brachial organ component of N. coucang and the cat allergen suggests that they exhibit immunogenic cross-reactivity. This work clarifies the chemical nature of the brachial organ exudate and suggests a possible mode of action underlying the noxious effects of slow loris bites.

  11. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    Science.gov (United States)

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
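    The search criterion quoted in the abstract above (100% identity over 100 bp) can be illustrated with a short sketch. This is not the authors' pipeline: it assumes two sequences that have already been aligned to equal length with "-" as the gap character, and the function name `identical_runs` is hypothetical.

```python
def identical_runs(a, b, min_len=100):
    """Return half-open (start, end) column intervals where two aligned
    sequences match exactly, without gaps, for at least min_len columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None  # start column of the run currently being extended
    for i, (x, y) in enumerate(zip(a, b)):
        match = (x == y) and x != "-"
        if match and start is None:
            start = i
        elif not match and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a)))
    return runs
```

    With a toy threshold, `identical_runs("ACGTACGT", "ACGAACGT", min_len=3)` reports the two perfectly matching stretches on either side of the single mismatch.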

  12. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    Directory of Open Access Journals (Sweden)

    Chen Jun

    2012-11-01

    Background: A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species' genome and estimate levels of gene expression. Results: mRNA from actively growing needles of Norway spruce (Picea abies) was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST) data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts) longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions: Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10

  13. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...

  14. Location of the redox-active thiols of ribonucleotide reductase: sequence similarity between the Escherichia coli and Lactobacillus leichmannii enzymes

    International Nuclear Information System (INIS)

    Lin, A.N.I.; Ashley, G.W.; Stubbe, J.

    1987-01-01

    The redox-active thiols of Escherichia coli ribonucleoside diphosphate reductase and of Lactobacillus leichmannii ribonucleoside triphosphate reductase have been located by a procedure involving (1) prereduction of enzyme with dithiothreitol, (2) specific oxidation of the redox-active thiols by treatment with substrate in the absence of exogenous reductant, (3) alkylation of other thiols with iodoacetamide, and (4) reduction of the disulfides with dithiothreitol and alkylation with [1-14C]iodoacetamide. The dithiothreitol-reduced E. coli B1 subunit is able to convert 3 equiv of CDP to dCDP and is labeled with 5.4 equiv of 14C. Sequencing of tryptic peptides shows that 2.8 equiv of 14C is on cysteines-752 and -757 at the C-terminus of B1, while 1.0-1.5 equiv of 14C is on cysteines-222 and -227. It thus appears that two sets of redox-active dithiols are involved in substrate reduction. The L. leichmannii reductase is able to convert 1.1 equiv of CTP to dCTP and is labeled with 2.1 equiv of 14C. Sequencing of tryptic peptides shows that 1.4 equiv of 14C is located on the two cysteines of C-E-G-G-A-C-P-I-K. This peptide shows remarkable and unexpected similarity to the thiol-containing region of the C-terminal peptide of E. coli B1, C-E-S-G-A-C-K-I

  15. Exome Sequence Analysis of 14 Families With High Myopia

    DEFF Research Database (Denmark)

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sang...

  16. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  17. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri Michaeli

    2012-12-01

    High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  18. Management of High-Throughput DNA Sequencing Projects: Alpheus.

    Science.gov (United States)

    Miller, Neil A; Kingsmore, Stephen F; Farmer, Andrew; Langley, Raymond J; Mudge, Joann; Crow, John A; Gonzalez, Alvaro J; Schilkey, Faye D; Kim, Ryan J; van Velkinburgh, Jennifer; May, Gregory D; Black, C Forrest; Myers, M Kathy; Utsey, John P; Frost, Nicholas S; Sugarbaker, David J; Bueno, Raphael; Gullans, Stephen R; Baxter, Susan M; Day, Steve W; Retzel, Ernest F

    2008-12-26

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem's SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.

  19. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    Science.gov (United States)

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).

  20. A protein-tyrosine phosphatase with sequence similarity to the SH2 domain of the protein-tyrosine kinases.

    Science.gov (United States)

    Shen, S H; Bastien, L; Posner, B I; Chrétien, P

    1991-08-22

    The phosphorylation of proteins at tyrosine residues is critical in cellular signal transduction, neoplastic transformation and control of the mitotic cycle. These mechanisms are regulated by the activities of both protein-tyrosine kinases (PTKs) and protein-tyrosine phosphatases (PTPases). As in the PTKs, there are two classes of PTPases: membrane associated, receptor-like enzymes and soluble proteins. Here we report the isolation of a complementary DNA clone encoding a new form of soluble PTPase, PTP1C. The enzyme possesses a large noncatalytic region at the N terminus which unexpectedly contains two adjacent copies of the Src homology region 2 (the SH2 domain) found in various nonreceptor PTKs and other cytoplasmic signalling proteins. As with other SH2 sequences, the SH2 domains of PTP1C formed high-affinity complexes with the activated epidermal growth factor receptor and other phosphotyrosine-containing proteins. These results suggest that the SH2 regions in PTP1C may interact with other cellular components to modulate its own phosphatase activity against interacting substrates. PTPase activity may thus directly link growth factor receptors and other signalling proteins through protein-tyrosine phosphorylation.

  1. Subfamily logos: visualization of sequence deviations at alignment positions with high information content

    Directory of Open Access Journals (Sweden)

    Beitz Eric

    2006-06-01

    Background: Recognition of relevant sequence deviations can be valuable for elucidating functional differences between protein subfamilies. Interesting residues at highly conserved positions can then be mutated and experimentally analyzed. However, identification of such sites is tedious because automated approaches are scarce. Results: Subfamily logos visualize subfamily-specific sequence deviations. The display is similar to classical sequence logos but extends into the negative range. Positive, upright characters correspond to residues which are characteristic for the subfamily, negative, upside-down characters to residues typical for the remaining sequences. The symbol height is adjusted to the information content of the alignment position. Residues which are conserved throughout do not appear. Conclusion: Subfamily logos provide an intuitive display of relevant sequence deviations. The method has proven to be valid using a set of 135 aligned aquaporin sequences in which established subfamily-specific positions were readily identified by the algorithm.
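    The abstract above scales symbol height by the information content of each alignment position. A minimal sketch of that quantity follows, assuming the classical sequence-logo definition (maximum entropy minus observed Shannon entropy, in bits) with no small-sample correction; the function name `column_information` and the gap handling are illustrative assumptions, not the published method.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content of one alignment column in bits:
    log2(alphabet_size) minus the Shannon entropy of the observed residues.
    Gap characters ('-') are ignored; an all-gap column scores 0."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log2(n / total)
        for n in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy
```

    A fully conserved protein column reaches the maximum of log2(20) ≈ 4.32 bits, while a column split evenly between two residues loses exactly one bit.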

  2. Highly conserved non-coding sequences are associated with vertebrate development.

    Directory of Open Access Journals (Sweden)

    Adam Woolfe

    2005-01-01

    In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development

  3. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Background: Highly repetitive nucleotide sequences are commonly found in nature, e.g. in telomeres, microsatellite DNA, polyadenine (poly(A)) tails of eukaryotic messenger RNA, as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites, resulting in unspecific products. Results: For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example for their applicability in filter retardation assays. Conclusion: Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  4. MUSCLE: multiple sequence alignment with high accuracy and high throughput.

    Science.gov (United States)

    Edgar, Robert C

    2004-01-01

    We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using k-mer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
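
    The idea of estimating distances by k-mer counting, without first aligning the sequences, can be illustrated with a minimal Python sketch. The function names, the default k = 3, and the particular distance formula (one minus the fraction of shared k-mers) are assumptions chosen for illustration; they are not taken from the MUSCLE source code.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count the overlapping k-mers of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Illustrative alignment-free distance in [0, 1].

    Assumed formula: 1 - (shared k-mers / k-mers in the shorter sequence).
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection of k-mer counts
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("MKVLAAGIVG", "MKVLSAGIVG"))
```

    A distance of this kind needs only one pass over each sequence per comparison, which is why k-mer counting is attractive when thousands of sequences must be compared before any alignment is built.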

  5. High-Throughput Block Optical DNA Sequence Identification.

    Science.gov (United States)

    Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant

    2018-01-01

    Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm⁻¹. Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. High copy number of highly similar mariner-like transposons in planarian (Platyhelminthe): evidence for a trans-phyla horizontal transfer.

    Science.gov (United States)

    Garcia-Fernàndez, J; Bayascas-Ramírez, J R; Marfany, G; Muñoz-Mármol, A M; Casali, A; Baguñà, J; Saló, E

    1995-05-01

    Several DNA sequences similar to the mariner element were isolated and characterized in the platyhelminthe Dugesia (Girardia) tigrina. They were 1,288 bp long, flanked by two 32 bp-inverted repeats, and contained a single 339 amino acid open-reading frame (ORF) encoding the transposase. The number of copies of this element is approximately 8,000 per haploid genome, constituting a member of the middle-repetitive DNA of Dugesia tigrina. Sequence analysis of several elements showed a high percentage of conservation between the different copies. Most of them presented an intact ORF and the standard signals of actively expressed genes, which suggests that some of them are or have recently been functional transposons. The high degree of similarity shared with other mariner elements from some arthropods, together with the fact that this element is undetectable in other planarian species, strongly suggests a case of horizontal transfer between these two distant phyla.

  7. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids.

    Science.gov (United States)

    Li, Yushuang; Song, Tian; Yang, Jiasheng; Zhang, Yi; Yang, Jialiang

    2016-01-01

    In this paper, we propose a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440-dimensional feature vector consisting of a 400-dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20-dimensional content ratio vector, and a 20-dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method to the ND5 dataset consisting of the ND5 protein sequences of 9 species, and to the F10 and G11 datasets representing two of the xylanase-containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW on the ND5 dataset, much higher than those of 5 other popular alignment-free methods. In addition, we successfully separate the xylanase sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector.
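
    The 440-dimensional encoding (400 + 20 + 20) and the Euclidean comparison can be sketched in Python as follows. The abstract does not define the three component vectors, so the definitions used here are assumptions made only for illustration: transition probabilities are row-normalized counts of adjacent residue pairs, the content ratio is the residue frequency, and the position ratio is the sum of a residue's 1-based positions divided by the sum of all positions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def feature_vector(seq):
    """Encode a protein sequence as 400 + 20 + 20 = 440 numbers.

    Component definitions are illustrative assumptions, not the paper's.
    Residues outside the 20 standard amino acids raise KeyError.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = [[0] * 20 for _ in range(20)]
    content = [0] * 20
    position = [0] * 20
    for pos, aa in enumerate(seq, start=1):
        content[INDEX[aa]] += 1
        position[INDEX[aa]] += pos
    for a, b in zip(seq, seq[1:]):
        pairs[INDEX[a]][INDEX[b]] += 1

    vec = []
    for row in pairs:  # 400 assumed transition probabilities
        total = sum(row)
        vec.extend(c / total if total else 0.0 for c in row)
    vec.extend(c / n for c in content)  # 20 content ratios
    pos_total = n * (n + 1) / 2
    vec.extend(p / pos_total for p in position)  # 20 position ratios
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    d = euclidean(feature_vector("MKVLAAGIVG"), feature_vector("MKVLSAGIVG"))
    print(d)
```

    Because every sequence maps to a vector of the same length, sequences of different lengths can be compared directly, which is the practical appeal of an alignment-free encoding.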

  8. High-Throughput Next-Generation Sequencing of Polioviruses

    Science.gov (United States)

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929

  9. Application of high-throughput DNA sequencing in phytopathology.

    Science.gov (United States)

    Studholme, David J; Glover, Rachel H; Boonham, Neil

    2011-01-01

    The new sequencing technologies are already making a big impact in academic research on medically important microbes and may soon revolutionize diagnostics, epidemiology, and infection control. Plant pathology also stands to gain from exploiting these opportunities. This manuscript reviews some applications of these high-throughput sequencing methods that are relevant to phytopathology, with emphasis on the associated computational and bioinformatics challenges and their solutions. Second-generation sequencing technologies have recently been exploited in genomics of both prokaryotic and eukaryotic plant pathogens. They are also proving to be useful in diagnostics, especially with respect to viruses. Copyright © 2011 by Annual Reviews. All rights reserved.

  10. Quack: A quality assurance tool for high throughput sequence data.

    Science.gov (United States)

    Thrash, Adam; Arick, Mark; Peterson, Daniel G

    2018-05-01

    The quality of data generated by high-throughput DNA sequencing tools must be rapidly assessed in order to determine how useful the data may be in making biological discoveries; higher quality data leads to more confident results and conclusions. Due to the ever-increasing size of data sets and the importance of rapid quality assessment, tools that analyze sequencing data should quickly produce easily interpretable graphics. Quack addresses these issues by generating information-dense visualizations from FASTQ files at a speed far surpassing other publicly available quality assurance tools in a manner independent of sequencing technology. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  11. Sequence similarity between the cp gene and the transgene in transgenic papayas = Similaridade de seqüência entre o gene cp do vírus e do transgene presente em mamoeiros transgênicos

    NARCIS (Netherlands)

    Souza, M.T.; Teixeira, M.; Gonsalves, D.

    2005-01-01

    The Papaya ringspot virus (PRSV) coat protein transgene present in 'Rainbow' and 'SunUp' papayas disclose high sequence similarity (>89%) to the cp gene from PRSV BR and TH. Despite this, both isolates are able to break down the resistance in 'Rainbow', while only the latter is able to do so in

  12. Structural and sequence variants in patients with Silver-Russell syndrome or similar features-Curation of a disease database

    DEFF Research Database (Denmark)

    Tümer, Zeynep; López-Hernández, Julia Angélica; Netchine, Irène

    2018-01-01

    data of these patients. The clinical features are scored according to the Netchine-Harbison clinical scoring system (NH-CSS), which has recently been accepted as standard by consensus. The structural and sequence variations are reviewed and where necessary redescribed according to recent...

  13. Similarity of High-Resolution Tandem Mass Spectrometry Spectra of Structurally Related Micropollutants and Transformation Products

    Science.gov (United States)

    Schollée, Jennifer E.; Schymanski, Emma L.; Stravs, Michael A.; Gulde, Rebekka; Thomaidis, Nikolaos S.; Hollender, Juliane

    2017-12-01

    High-resolution tandem mass spectrometry (HRMS2) with electrospray ionization is frequently applied to study polar organic molecules such as micropollutants. Fragmentation provides structural information to confirm structures of known compounds or propose structures of unknown compounds. Similarity of HRMS2 spectra between structurally related compounds has been suggested to facilitate identification of unknown compounds. To test this hypothesis, the similarity of reference standard HRMS2 spectra was calculated for 243 pairs of micropollutants and their structurally related transformation products (TPs); for comparison, spectral similarity was also calculated for 219 pairs of unrelated compounds. Spectra were measured on Orbitrap and QTOF mass spectrometers and similarity was calculated with the dot product. The influence of different factors on spectral similarity [e.g., normalized collision energy (NCE), merging fragments from all NCEs, and shifting fragments by the mass difference of the pair] was considered. Spectral similarity increased at higher NCEs and highest similarity scores for related pairs were obtained with merged spectra including measured fragments and shifted fragments. Removal of the monoisotopic peak was critical to reduce false positives. Using a spectral similarity score threshold of 0.52, 40% of related pairs and 0% of unrelated pairs were above this value. Structural similarity was estimated with the Tanimoto coefficient and pairs with higher structural similarity generally had higher spectral similarity. Pairs where one or both compounds contained heteroatoms such as sulfur often resulted in dissimilar spectra. This work demonstrates that HRMS2 spectral similarity may indicate structural similarity and that spectral similarity can be used in the future to screen complex samples for related compounds such as micropollutants and TPs, assisting in the prioritization of non-target compounds.
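
    The two similarity measures named in this abstract, the dot product between spectra and the Tanimoto coefficient between structures, can be sketched in Python. The peak-list representation, the matching of fragments by rounding m/z to two decimals, and the normalization of the dot product to [0, 1] are assumptions for illustration; the study's actual preprocessing (merging NCEs, shifting fragments, removing the monoisotopic peak) is not reproduced here.

```python
import math


def bin_spectrum(peaks, decimals=2):
    """Sum intensities of (m/z, intensity) peaks that share a rounded m/z."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz, decimals)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def dot_product_similarity(peaks_a, peaks_b, decimals=2):
    """Normalized dot product of two binned spectra (0 = no shared peaks)."""
    a = bin_spectrum(peaks_a, decimals)
    b = bin_spectrum(peaks_b, decimals)
    num = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    parent = [(91.05, 40.0), (119.08, 100.0), (147.07, 15.0)]
    product = [(91.05, 35.0), (119.08, 80.0), (163.07, 20.0)]
    print(dot_product_similarity(parent, product))
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

    With a score of this kind, a threshold such as the 0.52 reported above would simply be applied to the returned value to decide whether a pair counts as similar.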

  14. An improved high throughput sequencing method for studying oomycete communities

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete communities using ITS1 as the barcode sequence with well-known primers for oomycete DNA amplification....... communities. The well-known primer sets ITS4, ITS6 and ITS7 were used in the study in a semi-nested PCR approach to target the internal transcribed spacer (ITS) 1 of ribosomal DNA in a next generation sequencing protocol. These primers have been used in similar studies before, but with limited success....... We were able to increase the proportion of retrieved oomycete sequences dramatically mainly by increasing the annealing temperature during PCR. The optimized protocol was validated using three mock communities and the method was further evaluated using total DNA from 26 soil samples collected from different...

  15. Highly multiplexed targeted DNA sequencing from single nuclei.

    Science.gov (United States)

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  16. Decoding the Divergent Subcellular Location of Two Highly Similar Paralogous LEA Proteins

    Directory of Open Access Journals (Sweden)

    Marie-Hélène Avelange-Macherel

    2018-05-01

    Full Text Available Many mitochondrial proteins are synthesized as precursors in the cytosol with an N-terminal mitochondrial targeting sequence (MTS) which is cleaved off upon import. Although much is known about import mechanisms and MTS structural features, the variability of MTS still hampers robust sub-cellular software predictions. Here, we took advantage of two paralogous late embryogenesis abundant proteins (LEA) from Arabidopsis with different subcellular locations to investigate structural determinants of mitochondrial import and gain insight into the evolution of the LEA genes. LEA38 and LEA2 are short proteins of the LEA_3 family, which are very similar along their whole sequence, but LEA38 is targeted to mitochondria while LEA2 is cytosolic. Differences in the N-terminal protein sequences were used to generate a series of mutated LEA2 which were expressed as GFP-fusion proteins in leaf protoplasts. By combining three types of mutation (substitution, charge inversion, and segment replacement), we were able to redirect the mutated LEA2 to mitochondria. Analysis of the effect of the mutations and determination of the LEA38 MTS cleavage site highlighted important structural features within and beyond the MTS. Overall, these results provide an explanation for the likely loss of mitochondrial location after duplication of the ancestral gene.

  17. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure...... that the data produced is optimal. Although much of the procedure can be followed directly from the manufacturer's protocols, the key differences lie in the library preparation steps. This chapter presents an optimized protocol for the sequencing of fossil remains and museum specimens, commonly referred...

  18. Sequencing of 50 human exomes reveals adaptation to high altitude

    DEFF Research Database (Denmark)

    Yi, Xin; Liang, Yu; Huerta-Sanchez, Emilia

    2010-01-01

    Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18x per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1), a transcription factor involved in response to hypoxia. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency...... in genetic adaptation to high altitude.

  19. High throughput 16S rRNA gene amplicon sequencing

    DEFF Research Database (Denmark)

    Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup

    S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r......RNA gene amplicon sequencing can be used to reveal factors of importance for the operation of full-scale nutrient removal plants related to settling problems and floc properties. Using optimized DNA extraction protocols, indexed primers and our in-house Illumina platform, we prepared multiple samples...... be correlated to the presence of the species that are regarded as “strong” and “weak” floc formers. In conclusion, 16S rRNA gene amplicon sequencing provides a high throughput approach for a rapid and cheap community profiling of activated sludge that in combination with multivariate statistics can be used...

  20. Similarity measurement method of high-dimensional data based on normalized net lattice subspace

    Institute of Scientific and Technical Information of China (English)

    Li Wenfa; Wang Gongming; Li Ke; Huang Su

    2017-01-01

    The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data. The reason is that data difference between sparse and noisy dimensionalities occupies a large proportion of the similarity, leading to the dissimilarities between any results. A similarity measurement method of high-dimensional data based on normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding interval. Only the component in the same or adjacent interval is used to calculate the similarity. To validate this method, three data types are used, and seven common similarity measurement methods are compared. The experimental result indicates that the relative difference of the method is increasing with the dimensionality and is approximately two or three orders of magnitude higher than the conventional method. In addition, the similarity range of this method in different dimensions is [0, 1], which is fit for similarity analysis after dimensionality reduction.
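
    The interval-based idea described here can be sketched in Python: each dimension's range is cut into intervals, and a dimension contributes to the similarity only when the two components fall into the same or adjacent intervals. The abstract does not give the scoring formula, so the per-dimension score (one minus the range-normalized absolute difference), the averaging over all dimensions, and the default of 10 intervals are assumptions made for illustration only. The sketch also assumes every value lies inside the supplied bounds.

```python
def interval_index(value, lo, hi, n_intervals):
    """Index of the interval of [lo, hi] that contains value."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_intervals)
    return min(max(idx, 0), n_intervals - 1)


def lattice_similarity(x, y, bounds, n_intervals=10):
    """Illustrative similarity in [0, 1] for values lying within bounds.

    bounds is a list of (lo, hi) pairs, one per dimension. Dimensions whose
    components are more than one interval apart contribute nothing.
    """
    if not x or not (len(x) == len(y) == len(bounds)):
        raise ValueError("x, y and bounds must be non-empty and equally long")
    total = 0.0
    for xi, yi, (lo, hi) in zip(x, y, bounds):
        ix = interval_index(xi, lo, hi, n_intervals)
        iy = interval_index(yi, lo, hi, n_intervals)
        if abs(ix - iy) <= 1:  # same or adjacent interval
            span = hi - lo
            total += 1.0 - (abs(xi - yi) / span if span else 0.0)
    return total / len(x)


if __name__ == "__main__":
    bounds = [(0.0, 1.0)] * 4
    print(lattice_similarity([0.1, 0.5, 0.9, 0.3], [0.15, 0.55, 0.1, 0.3], bounds))
```

    Ignoring dimensions that are far apart is what keeps a few noisy dimensions from dominating the score, which is the motivation stated in the abstract.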

  1. Scrutinizing virus genome termini by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Shasha Li

    Full Text Available Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS) have enabled it to be a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs) found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.

  2. Exome sequencing identifies ZNF644 mutations in high myopia.

    Directory of Open Access Journals (Sweden)

    Yi Shi

    2011-06-01

    Full Text Available Myopia is the most common ocular disorder worldwide, and high myopia in particular is one of the leading causes of blindness. Genetic factors play a critical role in the development of myopia, especially high myopia. Recently, the exome sequencing approach has been successfully used for the disease gene identification of Mendelian disorders. Here we show a successful application of exome sequencing to identify a gene for an autosomal dominant disorder, and we have identified a gene potentially responsible for high myopia in a monogenic form. We captured exomes of two affected individuals from a Han Chinese family with high myopia and performed sequencing analysis by a second-generation sequencer with a mean coverage of 30× and sufficient depth to call variants at ∼97% of each targeted exome. The shared genetic variants of these two affected individuals in the family being studied were filtered against the 1000 Genomes Project and the dbSNP131 database. A mutation A672G in zinc finger protein 644 isoform 1 (ZNF644) was identified as being related to the phenotype of this family. After we performed sequencing analysis of the exons in the ZNF644 gene in 300 sporadic cases of high myopia, we identified an additional five mutations (I587V, R680G, C699Y, 3'UTR+12 C>G, and 3'UTR+592 G>A) in 11 different patients. All these mutations were absent in 600 normal controls. The ZNF644 gene was expressed in human retinal and retinal pigment epithelium (RPE). Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, mutation may cause the axial elongation of eyeball found in high myopia patients. Our results suggest that ZNF644 might be a causal gene for high myopia in a monogenic form.

  3. Color-Based Image Retrieval from High-Similarity Image Databases

    DEFF Research Database (Denmark)

    Hansen, Michael Adsetts Edberg; Carstensen, Jens Michael

    2003-01-01

    Many image classification problems can fruitfully be thought of as image retrieval in a "high similarity image database" (HSID) characterized by being tuned towards a specific application and having a high degree of visual similarity between entries that should be distinguished. We introduce...... a method for HSID retrieval using a similarity measure based on a linear combination of Jeffreys-Matusita (JM) distances between distributions of color (and color derivatives) estimated from a set of automatically extracted image regions. The weight coefficients are estimated based on optimal retrieval...... performance. Experimental results on the difficult task of visually identifying clones of fungal colonies grown in a petri dish and categorization of pelts show a high retrieval accuracy of the method when combined with standardized sample preparation and image acquisition....
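
    A weighted combination of Jeffreys-Matusita (JM) distances can be sketched in Python for discrete distributions such as normalized color histograms. The form used here, JM = sqrt(2 (1 - BC)) with BC the Bhattacharyya coefficient, is one common definition and is assumed rather than taken from the paper; the function names and the example weights are likewise hypothetical, and the paper's estimation of weights from retrieval performance is not shown.

```python
import math


def jm_distance(p, q):
    """Jeffreys-Matusita distance between two discrete distributions.

    Assumes p and q are non-negative and each sums to 1.
    The result lies between 0 (identical) and sqrt(2) (no overlap).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 2.0 * (1.0 - bc)))


def combined_distance(features_a, features_b, weights):
    """Linear combination of per-feature JM distances.

    features_a and features_b are lists of distributions (for example one
    histogram per color channel or derivative); weights are hypothetical.
    """
    if not (len(features_a) == len(features_b) == len(weights)):
        raise ValueError("need one weight per feature distribution")
    return sum(
        w * jm_distance(p, q)
        for w, p, q in zip(weights, features_a, features_b)
    )


if __name__ == "__main__":
    image_a = [[0.7, 0.2, 0.1], [0.5, 0.5, 0.0]]
    image_b = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
    print(combined_distance(image_a, image_b, weights=[0.8, 0.2]))
```

    Retrieval would then rank database entries by this combined distance to the query, smallest first.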

  4. An alignment-free method to find similarity among protein sequences via the general form of Chou's pseudo amino acid composition.

    Science.gov (United States)

    Gupta, M K; Niyogi, R; Misra, M

    2013-01-01

    In this paper, we propose a method to create the 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the contents of amino acids, total distance of each amino acid from the first amino acid in the protein sequence and the distribution of 20 amino acids. The obtained cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour joining method. In order to show the applicability of our approach, we tested it on three proteins: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results are in agreement with known history and the output from the multiple sequence alignment program ClustalW, which is widely used. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, the time complexity is linear and space required is less as compared with other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.
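
    The step from feature vectors to a pairwise cosine distance matrix, which would then be passed to a tree-building method such as neighbour joining, can be sketched in Python. The construction of the 60-dimensional vectors is not reproduced; the short vectors and species labels in the usage example are placeholders, and the function names are assumptions.

```python
import math


def cosine_distance(u, v):
    """Cosine distance, 1 - cos(angle), between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Symmetric matrix of pairwise cosine distances with a zero diagonal."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(vectors[i], vectors[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    # Placeholder vectors standing in for per-species feature vectors.
    features = {
        "species_a": [0.2, 0.1, 0.7],
        "species_b": [0.25, 0.1, 0.65],
        "species_c": [0.6, 0.3, 0.1],
    }
    for name, row in zip(features, distance_matrix(list(features.values()))):
        print(name, [round(d, 4) for d in row])
```

    Each vector is built in a single pass over its sequence, but filling the matrix still takes one comparison per pair of sequences.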

  5. Characterization of a highly toxic strain of Bacillus thuringiensis serovar kurstaki very similar to the HD-73 strain.

    Science.gov (United States)

    Reinoso-Pozo, Yaritza; Del Rincón-Castro, Ma Cristina; Ibarra, Jorge E

    2016-09-01

    The LBIT-1200 strain of Bacillus thuringiensis was recently isolated from soil, and showed a 6.4 and 9.5 increase in toxicity, against Manduca sexta and Trichoplusia ni, respectively, compared to HD-73. However, LBIT-1200 was still highly similar to HD-73, including the production of bipyramidal crystals containing only one protein of ∼130 000 kDa, its flagellin gene sequence related to the kurstaki serotype, plasmid and RepPCR patterns similar to HD-73, no production of β-exotoxin and no presence of VIP genes. Sequencing of its cry gene showed the presence of a cry1Ac-type gene with four amino acid differences, including two amino acid replacements in domain III, compared to Cry1Ac1, which may explain its higher toxicity. In conclusion, the LBIT-1200 strain is a variant of the HD-73 strain but shows a much higher toxicity, which makes this new strain an important candidate to be developed as a bioinsecticide, once it passes other tests, throughout its biotechnological development. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  6. High-intensity discharge lamp and Duffing oscillator—Similarities and differences

    Science.gov (United States)

    Baumann, Bernd; Schwieger, Joerg; Stein, Ulrich; Hallerberg, Sarah; Wolff, Marcus

    2017-12-01

    The processes inside the arc tube of high-intensity discharge lamps are investigated using finite element simulations. The behavior of the gas mixture inside the arc tube is governed by differential equations describing mass, energy, and charge conservation, as well as the Helmholtz equation for the acoustic pressure and the Reynolds equations for the flow driven by buoyancy and Reynolds stresses. The model is highly nonlinear and requires a recursion procedure to account for the impact of acoustic streaming on the temperature and other fields. The investigations reveal the presence of a hysteresis and the corresponding jump phenomenon, quite similar to a Duffing oscillator. The similarities and, in particular, the differences of the nonlinear behavior of the high-intensity discharge lamp to that of a Duffing oscillator are discussed. For large amplitudes, the high-intensity discharge lamp exhibits a stiffening effect in contrast to the Duffing oscillator. It is speculated on how the stiffening might affect hysteresis suppression.
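
    For reference, the forced, damped Duffing oscillator is usually written in the textbook form below. The symbols are the conventional ones and are not taken from the paper, whose lamp model is a set of coupled field equations rather than a single ordinary differential equation.

```latex
\ddot{x} + \delta \dot{x} + \alpha x + \beta x^{3} = \gamma \cos(\omega t)
```

    Here \delta is the damping coefficient, \alpha the linear stiffness, \beta the coefficient of the cubic nonlinearity, and \gamma and \omega the amplitude and angular frequency of the drive. For nonzero \beta the frequency-response curve bends over, so that within a range of drive frequencies two stable amplitudes coexist; sweeping the frequency up and then down produces the jump phenomenon and hysteresis mentioned in the abstract.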

  7. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    High-throughput sequencing has the potential to answer many of the big questions in biology and medicine. It can be used to determine the ancestry of species, to chart complex ecosystems and to understand and diagnose disease. However, going from raw sequencing data to biological or medical insig....... By estimating the genotypes on a set of candidate variants obtained from both a standard mapping-based approach as well as de novo assemblies, we are able to find considerably more structural variation than previous studies...... for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants....... The method queries the reads using a graph representation of the variants and hereby mitigates the reference-bias that characterise standard genotyping methods. In the last chapter, we apply this method to call the genotypes of 50 deeply sequencing parent-offspring trios from the GenomeDenmark project...

  8. Fold-recognition and comparative modeling of human α2,3-sialyltransferases reveal their sequence and structural similarities to CstII from Campylobacter jejuni

    Directory of Open Access Journals (Sweden)

    Balaji Petety V

    2006-04-01

    Full Text Available Background: The 3-D structure of none of the eukaryotic sialyltransferases (SiaTs) has been determined so far. Sequence alignment algorithms such as BLAST and PSI-BLAST could not detect a homolog of these enzymes from the protein databank. SiaTs, thus, belong to the hard/medium target category in the CASP experiments. The objective of the current work is to model the 3-D structures of human SiaTs which transfer the sialic acid in α2,3-linkage viz., ST3Gal I, II, III, IV, V, and VI, using fold-recognition and comparative modeling methods. The pair-wise sequence similarity among these six enzymes ranges from 41 to 63%. Results: Unlike the sequence similarity servers, fold-recognition servers identified CstII, an α2,3/8 dual-activity SiaT from Campylobacter jejuni, as the homolog of all the six ST3Gals; the level of sequence similarity between CstII and ST3Gals is only 15–20% and the similarity is restricted to well-characterized motif regions of ST3Gals. Deriving template-target sequence alignments for the entire ST3Gal sequence was not straightforward: the fold-recognition servers could not find a template for the region preceding the L-motif and that between the L- and S-motifs. Multiple structural templates were identified to model these regions and template identification-modeling-evaluation had to be performed iteratively to choose the most appropriate templates. The modeled structures have acceptable stereochemical properties and are also able to provide qualitative rationalizations for some of the site-directed mutagenesis results reported in literature. Apart from the predicted models, an unexpected but valuable finding from this study is the sequential and structural relatedness of family GT42 and family GT29 SiaTs. Conclusion: The modeled 3-D structures can be used for docking and other modeling studies and for the rational identification of residues to be mutated to impart desired properties such as altered stability, substrate

  9. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    BACKGROUND: The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine...... primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution...

  10. Deja vu: a database of highly similar citations in the scientific literature.

    Science.gov (United States)

    Errami, Mounir; Sun, Zhaohui; Long, Tara C; George, Angela C; Garner, Harold R

    2009-01-01

    In the scientific research community, plagiarism and covert multiple publications of the same data are considered unacceptable because they undermine the public confidence in the scientific integrity. Yet, little has been done to help authors and editors to identify highly similar citations, which sometimes may represent cases of unethical duplication. For this reason, we have made available Déjà vu, a publicly available database of highly similar Medline citations identified by the text similarity search engine eTBLAST. Following manual verification, highly similar citation pairs are classified into various categories ranging from duplicates with different authors to sanctioned duplicates. Déjà vu records also contain user-provided commentary and supporting information to substantiate each document's categorization. Déjà vu and eTBLAST are available to authors, editors, reviewers, ethicists and sociologists to study, intercept, annotate and deter questionable publication practices. These tools are part of a sustained effort to enhance the quality of Medline as 'the' biomedical corpus. The Déjà vu database is freely accessible at http://spore.swmed.edu/dejavu. The tool eTBLAST is also freely available at http://etblast.org.

  11. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jonas Binladen

    2007-02-01

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (mis-assignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial
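    The 5' tag analysis described in this record can be illustrated with a minimal sketch. It assumes dinucleotide tags placed directly in front of a known primer; the tag table, the primer sequence and the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical dinucleotide tags and primer, for illustration only.
TAG_TO_SPECIMEN = {"AC": "specimen_1", "CA": "specimen_2", "TG": "specimen_3"}
PRIMER = "GGTCAACAAATCATAAAG"


def demultiplex(reads, tag_to_specimen=TAG_TO_SPECIMEN, primer=PRIMER, tag_len=2):
    """Bin reads by their 5' tag.

    A read is assigned only if its tag is known AND the tag is followed
    by the primer; everything else goes to the 'unassigned' bin.
    The tag and primer are trimmed from assigned reads.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        specimen = tag_to_specimen.get(tag)
        if specimen is None or not rest.startswith(primer):
            bins["unassigned"].append(read)
        else:
            bins[specimen].append(rest[len(primer):])
    return dict(bins)


def first_base_counts(reads):
    """Count reads by the first nucleotide of the tag, to inspect tag bias."""
    return Counter(read[0] for read in reads if read)


reads = [
    "AC" + PRIMER + "AAAA",
    "CA" + PRIMER + "CCCC",
    "GG" + PRIMER + "TTTT",  # unknown tag
    "ACNNNN",                # known tag, but no primer after it
]
result = demultiplex(reads)
counts = first_base_counts(reads)
```

    Requiring the primer after the tag is one simple way to reject reads whose first bases only resemble a tag by chance; a production pipeline would also need to tolerate sequencing errors in the tag itself.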

  12. Investigation on the effect of nonlinear processes on similarity law in high-pressure argon discharges

    Science.gov (United States)

    Fu, Yangyang; Parsey, Guy M.; Verboncoeur, John P.; Christlieb, Andrew J.

    2017-11-01

    In this paper, the effect of nonlinear processes (such as three-body collisions and stepwise ionizations) on the similarity law in high-pressure argon discharges has been studied by the use of the Kinetic Global Model framework. In the discharge model, the ground state argon atoms (Ar), electrons (e), atom ions (Ar+), molecular ions (Ar2+), and fourteen argon excited levels Ar*(4s and 4p) are considered. The steady-state electron and ion densities are obtained with nonlinear processes included and excluded in the designed models, respectively. It is found that in similar gas gaps, keeping the product of gas pressure and linear dimension unchanged, with the nonlinear processes included, the normalized density relations deviate from the similarity relations gradually as the scale-up factor decreases. Without the nonlinear processes, the parameter relations are in good agreement with the similarity law predictions. Furthermore, the pressure and the dimension effects are also investigated separately with and without the nonlinear processes. It is shown that the gas pressure effect on the results is less obvious than the dimension effect. Without the nonlinear processes, the pressure and the dimension effects could be estimated from one to the other based on the similarity relations.

  13. Towards novel organic high-Tc superconductors: Data mining using density of states similarity search

    Science.gov (United States)

    Geilhufe, R. Matthias; Borysov, Stanislav S.; Kalpakchi, Dmytro; Balatsky, Alexander V.

    2018-02-01

    Identifying novel functional materials with desired key properties is an important part of bridging the gap between fundamental research and technological advancement. In this context, high-throughput calculations combined with data-mining techniques highly accelerated this process in different areas of research during the past years. The strength of a data-driven approach for materials prediction lies in narrowing down the search space of thousands of materials to a subset of prospective candidates. Recently, the open-access organic materials database OMDB was released providing electronic structure data for thousands of previously synthesized three-dimensional organic crystals. Based on the OMDB, we report about the implementation of a novel density of states similarity search tool which is capable of retrieving materials with similar density of states to a reference material. The tool is based on the approximate nearest neighbor algorithm as implemented in the ANNOY library and can be applied via the OMDB web interface. The approach presented here is wide ranging and can be applied to various problems where the density of states is responsible for certain key properties of a material. As the first application, we report about materials exhibiting electronic structure similarities to the aromatic hydrocarbon p-terphenyl which was recently discussed as a potential organic high-temperature superconductor exhibiting a transition temperature in the order of 120 K under strong potassium doping. Although the mechanism driving the remarkable transition temperature remains under debate, we argue that the density of states, reflecting the electronic structure of a material, might serve as a crucial ingredient for the observed high Tc. To provide candidates which might exhibit comparable properties, we present 15 purely organic materials with similar features to p-terphenyl within the electronic structure, which also tend to have structural similarities with p
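    The idea of a density-of-states similarity search can be sketched as a nearest-neighbor query over DOS curves sampled on a common energy grid. The record states that the actual tool uses approximate nearest neighbors via the ANNOY library; the sketch below swaps in an exact brute-force search with cosine similarity so that it needs only the standard library. Material names and DOS values are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equally long DOS vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(reference, database, k=3):
    """Return the k (name, score) pairs with the highest similarity to reference."""
    scored = [(cosine_similarity(reference, dos), name) for name, dos in database.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Hypothetical DOS vectors sampled on the same three-point energy grid.
database = {
    "material_A": [1.0, 0.0, 0.0],
    "material_B": [0.9, 0.1, 0.0],
    "material_C": [0.0, 1.0, 0.0],
}
hits = most_similar([1.0, 0.0, 0.0], database, k=2)
```

    Brute force scales linearly with the number of stored materials per query, which is why an approximate index becomes attractive for databases with many thousands of entries.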

  14. Comparative Genomics in Switchgrass Using 61,585 High-Quality Expressed Sequence Tags

    Directory of Open Access Journals (Sweden)

    Christian M. Tobias

    2008-11-01

    The development of genomic resources for switchgrass (Panicum virgatum L.), a perennial NAD-malic enzyme type C4 grass, is required to enable molecular breeding and biotechnological approaches for improving its value as a forage and bioenergy crop. Expressed sequence tag (EST) sequencing is one method that can quickly sample gene inventories and produce data suitable for marker development or analysis of tissue-specific patterns of expression. Toward this goal, three cDNA libraries from callus, crown, and seedling tissues of ‘Kanlow’ switchgrass were end-sequenced to generate a total of 61,585 high-quality ESTs from 36,565 separate clones. Seventy-three percent of the assembled consensus sequences could be aligned with the sorghum [Sorghum bicolor (L.) Moench] genome at a -value of <1 × 10, indicating a high degree of similarity. Sixty-five percent of the ESTs matched with gene ontology molecular terms, and 3.3% of the sequences were matched with genes that play potential roles in cell-wall biogenesis. The representation in the three libraries of gene families known to be associated with C4 photosynthesis, cellulose and β-glucan synthesis, phenylpropanoid biosynthesis, and peroxidase activity indicated likely roles for individual family members. Pairwise comparisons of synonymous codon substitutions were used to assess genome sequence diversity and indicated an overall similarity between the two genome copies present in the tetraploid. Identification of EST–simple sequence repeat markers and amplification on two individual parents of a mapping population yielded an average of 2.18 amplicons per individual, and 35% of the markers produced fragment length polymorphisms.

  15. Using high-throughput barcode sequencing to efficiently map connectomes.

    Science.gov (United States)

    Peikon, Ian D; Kebschull, Justus M; Vagin, Vasily V; Ravens, Diana I; Sun, Yu-Chi; Brouzes, Eric; Corrêa, Ivan R; Bressan, Dario; Zador, Anthony M

    2017-07-07

    The function of a neural circuit is determined by the details of its synaptic connections. At present, the only available method for determining a neural wiring diagram with single synapse precision (a 'connectome') is based on imaging methods that are slow, labor-intensive and expensive. Here, we present SYNseq, a method for converting the connectome into a form that can exploit the speed and low cost of modern high-throughput DNA sequencing. In SYNseq, each neuron is labeled with a unique random nucleotide sequence (an RNA 'barcode'), which is targeted to the synapse using engineered proteins. Barcodes in pre- and postsynaptic neurons are then associated through protein-protein crosslinking across the synapse, extracted from the tissue, and joined into a form suitable for sequencing. Although our failure to develop an efficient barcode joining scheme precludes the widespread application of this approach, we expect that with further development SYNseq will enable tracing of complex circuits at high speed and low cost. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. High Performance Systolic Array Core Architecture Design for DNA Sequencer

    Directory of Open Access Journals (Sweden)

    Saiful Nurdin Dayana

    2018-01-01

    This paper presents a high performance systolic array (SA) core architecture design for a Deoxyribonucleic Acid (DNA) sequencer. The core implements the Smith-Waterman (SW) algorithm with affine gap penalty scoring. This time-consuming local alignment algorithm guarantees optimal alignment between DNA sequences, but it requires quadratic computation time when performed on standard desktop computers. The use of a linear SA decreases the time complexity from quadratic to linear. In addition, with the exponential growth of DNA databases, the SA architecture is used to overcome the timing issue. In this work, the SW algorithm has been captured using Verilog Hardware Description Language (HDL) and simulated using the Xilinx ISIM simulator. The proposed design has been implemented on a Xilinx Virtex-6 Field Programmable Gate Array (FPGA) and achieves a 90% reduction in core area.
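    For orientation, the recurrence that such a core evaluates can be written as a short software reference. The sketch below is a plain sequential Smith-Waterman score computation with affine gaps (Gotoh's formulation), not the paper's hardware design; the scoring parameters are assumptions chosen for illustration. A gap of length L costs gap_open + (L - 1) * gap_extend.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Return the best local alignment score between a and b with affine gaps.

    H holds the best score of an alignment ending at cell (i, j),
    E the best score ending in a horizontal gap, F in a vertical gap.
    Only the previous row is kept, so memory is linear in len(b).
    """
    neg_inf = float("-inf")
    cols = len(b)
    h_prev = [0] * (cols + 1)
    f_prev = [neg_inf] * (cols + 1)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (cols + 1)
        f_cur = [neg_inf] * (cols + 1)
        e = neg_inf  # horizontal gap state, restarted on every row
        for j in range(1, cols + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            score = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + score, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


identical = smith_waterman_affine("ACGT", "ACGT")
unrelated = smith_waterman_affine("AAAA", "TTTT")
one_gap = smith_waterman_affine("ACGTACGT", "ACGTTACGT")
```

    Each cell depends only on its left, upper and upper-left neighbors, so all cells on one anti-diagonal are independent of each other. A linear systolic array exploits this by assigning one processing element per column and sweeping the anti-diagonals, which is what brings the run time down from quadratic to linear in the sequence length.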

  17. Similarity-dissimilarity plot for visualization of high dimensional data in biomedical pattern classification.

    Science.gov (United States)

    Arif, Muhammad

    2012-06-01

    In pattern classification problems, feature extraction is an important step. The quality of features in discriminating different classes plays an important role in pattern classification problems. In real life, pattern classification may require a high dimensional feature space, and it is impossible to visualize the feature space if its dimension is greater than four. In this paper, we have proposed a Similarity-Dissimilarity plot which can project a high dimensional space to a two dimensional space while retaining important characteristics required to assess the discrimination quality of the features. The Similarity-Dissimilarity plot can reveal information about the amount of overlap of features of different classes. Separable data points of different classes will also be visible on the plot and can be classified correctly using an appropriate classifier. Hence, approximate classification accuracy can be predicted. Moreover, it is possible to identify the class with which misclassified data points will be confused by the classifier. Outlier data points can also be located on the Similarity-Dissimilarity plot. Various examples of synthetic data are used to highlight important characteristics of the proposed plot. Some real life examples from biomedical data are also used for the analysis. The proposed plot is independent of the number of dimensions of the feature space.

  18. Method of synthesis of abstract images with high self-similarity

    Science.gov (United States)

    Matveev, Nikolay V.; Shcheglov, Sergey A.; Romanova, Galina E.; Koneva, Tatiana A.

    2017-06-01

    Abstract images with high self-similarity could be used for drug-free stress therapy. This is based on the fact that a complex visual environment has a high affective appraisal. To create such an image we can use a setup based on three low-power laser sources of different colors (red, green, blue); the image is the pattern produced by reflection and refraction at an object of complicated form placed into the laser ray paths. Images obtained experimentally in this way showed a good therapeutic effect. However, finding and choosing an object that gives the needed image structure is very difficult and requires many trials. The goal of the work is to develop a method and a procedure for finding the object form which, if placed into the ray paths, can provide the necessary structure of the image. In fact, the task means obtaining the necessary irradiance distribution on the given surface. Traditionally such problems are solved using non-imaging optics methods. In the given case this task is very complicated because of the complicated structure of the illuminance distribution and its high non-linearity. An alternative way is to use the projected image of a mask with a given structure. We consider both ways and discuss how they can help to speed up the synthesis procedure for a given abstract image of high self-similarity for drug-free therapy setups.

  19. Benfotiamine is similar to thiamine in correcting endothelial cell defects induced by high glucose.

    Science.gov (United States)

    Pomero, F; Molinar Min, A; La Selva, M; Allione, A; Molinatti, G M; Porta, M

    2001-01-01

    We investigated the hypothesis that benfotiamine, a lipophilic derivative of thiamine, affects replication delay and generation of advanced glycosylation end-products (AGE) in human umbilical vein endothelial cells cultured in the presence of high glucose. Cells were grown in physiological (5.6 mM) and high (28.0 mM) concentrations of D-glucose, with and without 150 microM thiamine or benfotiamine. Cell proliferation was measured by mitochondrial dehydrogenase activity. AGE generation after 20 days was assessed fluorimetrically. Cell replication was impaired by high glucose (72.3%+/-5.1% of that in physiological glucose, p=0.001). This was corrected by the addition of either thiamine (80.6%+/-2.4%, p=0.005) or benfotiamine (87.5%+/-8.9%, p=0.006), although it was not completely normalized (p=0.001 and p=0.008, respectively) to that in physiological glucose. Increased AGE production in high glucose (159.7%+/-38.9% of fluorescence in physiological glucose, p=0.003) was reduced by thiamine (113.2%+/-16.3%, p=0.008 vs. high glucose alone) or benfotiamine (135.6%+/-49.8%, p=0.03 vs. high glucose alone) to levels similar to those observed in physiological glucose. Benfotiamine, a derivative of thiamine with better bioavailability, corrects defective replication and increased AGE generation in endothelial cells cultured in high glucose, to a similar extent as thiamine. These effects may result from normalization of accelerated glycolysis and the consequent decrease in metabolites that are extremely active in generating nonenzymatic protein glycation. The potential role of thiamine administration in the prevention or treatment of vascular complications of diabetes deserves further investigation.

  20. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is

  1. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    Science.gov (United States)

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented similarity-based and other methods in predicting diverse classes of proteins, including distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.

  2. High Throughput Sequencing for Detection of Foodborne Pathogens

    Directory of Open Access Journals (Sweden)

    Camilla Sekse

    2017-10-01

    High-throughput sequencing (HTS) is becoming the state-of-the-art technology for typing of microbial isolates, especially in clinical samples. Yet, its application is still in its infancy for monitoring and outbreak investigations of foods. Here we review the published literature, covering not only bacterial but also viral and eukaryotic food pathogens, to assess the status and potential of HTS implementation to inform stakeholders, improve food safety and reduce outbreak impacts. The developments in sequencing technology and bioinformatics have outpaced the capacity to analyze and interpret the sequence data. The influence of sample processing, nucleic acid extraction and purification, harmonized protocols for generation and interpretation of data, and properly annotated and curated reference databases including non-pathogenic “natural” strains are other major obstacles to the realization of the full potential of HTS in analytical food surveillance, epidemiological and outbreak investigations, and in complementing preventive approaches for the control and management of foodborne pathogens. Despite significant obstacles, the achieved progress in capacity and broadening of the application range over the last decade is impressive and unprecedented, as illustrated with the chosen examples from the literature. Large consortia, often with broad international participation, are making coordinated efforts to cope with many of the mentioned obstacles. Further rapid progress can therefore be expected for the next decade.

  3. Scaling and interaction of self-similar modes in models of high Reynolds number wall turbulence.

    Science.gov (United States)

    Sharma, A S; Moarref, R; McKeon, B J

    2017-03-13

    Previous work has established the usefulness of the resolvent operator that maps the terms nonlinear in the turbulent fluctuations to the fluctuations themselves. Further work has described the self-similarity of the resolvent arising from that of the mean velocity profile. The orthogonal modes provided by the resolvent analysis describe the wall-normal coherence of the motions and inherit that self-similarity. In this contribution, we present the implications of this similarity for the nonlinear interaction between modes with different scales and wall-normal locations. By considering the nonlinear interactions between modes, it is shown that much of the turbulence scaling behaviour in the logarithmic region can be determined from a single arbitrarily chosen reference plane. Thus, the geometric scaling of the modes is impressed upon the nonlinear interaction between modes. Implications of these observations on the self-sustaining mechanisms of wall turbulence, modelling and simulation are outlined. This article is part of the themed issue 'Toward the development of high-fidelity models of wall turbulence at large Reynolds number'. © 2017 The Author(s).

  4. Management of high blood pressure in children: similarities and differences between US and European guidelines.

    Science.gov (United States)

    Brady, Tammy M; Stefani-Glücksberg, Amalia; Simonetti, Giacomo D

    2018-03-28

    Over the last several decades, many seminal longitudinal cohort studies have clearly shown that the antecedents to adult disease have their origins in childhood. Hypertension (HTN), which has become increasingly prevalent in childhood, represents one of the most important risk factors for cardiovascular diseases (CVD) such as heart disease and stroke. With the risk of adult HTN much greater when HTN is manifest in childhood, the future burden of CVD worldwide is therefore concerning. In an effort to slow the current trajectory, professional societies have called for more rigorous, evidence-based guideline development to aid primary care providers and subspecialists in improving recognition, diagnosis, evaluation, and treatment of pediatric HTN. In 2016 the European Society of Hypertension and in 2017 the American Academy of Pediatrics published updated guidelines for prevention and management of high blood pressure (BP) in children. While there are many similarities between the two guidelines, important differences exist. These differences, along with the identified knowledge gaps in each, will hopefully spur clinical researchers to action. This review highlights some of these similarities and differences, focusing on several of the more important facets regarding prevalence, prevention, diagnosis, management, and treatment of childhood HTN.

  5. Masturbation Experiences of Swedish Senior High School Students: Gender Differences and Similarities.

    Science.gov (United States)

    Driemeyer, Wiebke; Janssen, Erick; Wiltfang, Jens; Elmerstig, Eva

    Research about masturbation tends to be limited to the assessment of masturbation incidence and frequency. Consequently, little is known about what people experience connected to masturbation. This might be one reason why theoretical approaches that specifically address the persistent gender gap in masturbation frequency are lacking. The aim of the current study was to explore several aspects of masturbation in young men and women, and to examine possible associations with their social backgrounds and sexual histories. Data from 1,566 women and 1,452 men (ages 18 to 22) from 52 Swedish senior high schools were analyzed. Comparisons between men and women were made regarding incidence of and age at first masturbation, the use of objects (e.g., sex toys), fantasies, and sexual functioning during masturbation, as well as about their attitudes toward masturbation and sexual fantasies. Cluster analysis was carried out to identify similarities between and differences within the gender groups. While overall more men than women reported experience with several of the investigated aspects, cluster analyses revealed that a large proportion of men and women reported similar experiences and that fewer experiences are not necessarily associated with negative attitudes toward masturbation. Implications of these findings are discussed in consideration of particular social backgrounds.

  6. Get your high-quality low-cost genome sequence

    NARCIS (Netherlands)

    Faino, L.; Thomma, B.P.H.J.

    2014-01-01

    The study of whole-genome sequences has become essential for almost all branches of biological research. Next-generation sequencing (NGS) has revolutionized the scalability, speed, and resolution of sequencing and brought genomic science within reach of academic laboratories that study non-model

  7. High dimensional and high resolution pulse sequences for backbone resonance assignment of intrinsically disordered proteins

    Energy Technology Data Exchange (ETDEWEB)

    Zawadzka-Kazimierczuk, Anna; Kozminski, Wiktor, E-mail: kozmin@chem.uw.edu.pl [University of Warsaw, Faculty of Chemistry (Poland); Sanderova, Hana; Krasny, Libor [Institute of Microbiology, Academy of Sciences of the Czech Republic, Laboratory of Molecular Genetics of Bacteria, Department of Bacteriology (Czech Republic)

    2012-04-15

    Four novel 5D (HACA(N)CONH, HNCOCACB, (HACA)CON(CA)CONH, (H)NCO(NCA)CONH), and one 6D ((H)NCO(N)CACONH) NMR pulse sequences are proposed. The new experiments employ non-uniform sampling that enables achieving high resolution in indirectly detected dimensions. The experiments facilitate resonance assignment of intrinsically disordered proteins. The novel pulse sequences were successfully tested using the δ subunit (20 kDa) of Bacillus subtilis RNA polymerase that has an 81-amino acid disordered part containing various repetitive sequences.

  8. Uncovering highly obfuscated plagiarism cases using fuzzy semantic-based similarity model

    Directory of Open Access Journals (Sweden)

    Salha M. Alzahrani

    2015-07-01

    Highly obfuscated plagiarism cases contain unseen and obfuscated texts, which pose difficulties when using existing plagiarism detection methods. A fuzzy semantic-based similarity model for uncovering obfuscated plagiarism is presented and compared with five state-of-the-art baselines. Semantic relatedness between words is studied based on the part-of-speech (POS) tags and WordNet-based similarity measures. Fuzzy-based rules are introduced to assess the semantic distance between source and suspicious texts of short lengths, which implement the semantic relatedness between words as a membership function to a fuzzy set. In order to minimize the number of false positives and false negatives, a learning method that combines a permission threshold and a variation threshold is used to decide true plagiarism cases. The proposed model and the baselines are evaluated on 99,033 ground-truth annotated cases extracted from different datasets, including 11,621 (11.7%) handmade paraphrases, 54,815 (55.4%) artificial plagiarism cases, and 32,578 (32.9%) plagiarism-free cases. We conduct extensive experimental verifications, including the study of the effects of different segmentation schemes and parameter settings. Results are assessed using precision, recall, F-measure and granularity on stratified 10-fold cross-validation data. The statistical analysis using paired t-tests shows that the improvement of the proposed approach over the baselines is statistically significant, which demonstrates the competence of the fuzzy semantic-based model to detect plagiarism cases beyond literal plagiarism. Additionally, the analysis of variance (ANOVA) statistical test shows the effectiveness of different segmentation schemes used with the proposed approach.

  9. Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts

    Directory of Open Access Journals (Sweden)

    Ouyang Shu

    2005-09-01

    Background: The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. Results: All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. Conclusion: Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.

  10. Molecular phylogeny and species separation of five morphologically similar Holosticha-complex ciliates (Protozoa, Ciliophora) using ARDRA riboprinting and multigene sequence data

    Science.gov (United States)

    Gao, Feng; Yi, Zhenzhen; Gong, Jun; Al-Rasheid Khaled, A. S.; Song, Weibo

    2010-05-01

    To separate and redefine the ambiguous Holosticha-complex, a confusing group of hypotrichous ciliates, six strains belonging to five morphospecies of three genera, Holosticha heterofoissneri, Anteholosticha sp. pop1, Anteholosticha sp. pop2, A. manca, A. gracilis and Nothoholosticha fasciola, were analyzed using 12 restriction enzymes on the basis of amplified ribosomal DNA restriction analysis. Nine of the 12 enzymes could digest the DNA products, four (Hinf I, Hind III, Msp I, Taq I) yielded species-specific restriction patterns, and Hind III and Taq I produced different patterns for the two Anteholosticha sp. populations. Distinctly different restriction digestion haplotypes and similarity indices can be used to separate the species. The secondary structures of the five species were predicted based on the ITS2 transcripts and there were several minor differences among species, while the two Anteholosticha sp. populations were identical. In addition, phylogenies based on the SSrRNA gene sequences were reconstructed using multiple algorithms, which grouped them generally into four clades and indicated that the genus Anteholosticha should be a convergent assemblage. The fact that Holosticha species clustered with the oligotrichs and choreotrichs, though with very low support values, indicated that the topology may be very divergent and unreliable when the amount of sequence data used in the analyses is too low.

  11. High frequency RNA recombination in porcine reproductive and respiratory syndrome virus occurs preferentially between parental sequences with high similarity

    DEFF Research Database (Denmark)

    van Vugt, Joke J.F.A.; Storgaard, Torben; Oleksiewicz, Martin B.

    2001-01-01

    Two types of porcine reproductive and respiratory syndrome virus (PRRSV) exist, a North American type and a European type. The co-existence of both types in some countries, such as Denmark, Slovakia and Canada, creates a risk of inter-type recombination. To evaluate this risk, cell cultures were co......, but no recombination was detected between the European and North American types. Calculation of the maximum theoretical risk of European-American recombination, based on the sensitivity of the RT-PCR system, revealed that RNA recombination between the European and North American types of PRRSV is at least 10000 times...

  12. Similarity analysis for the high-pressure inductively coupled plasma source

    International Nuclear Information System (INIS)

    Vanden-Abeele, D; Degrez, G

    2004-01-01

    It is well known that the optimal operating parameters of an inductively coupled plasma (ICP) torch strongly depend upon its dimensions. To understand this relationship better, we derive a dimensionless form of the equations governing the behaviour of high-pressure ICPs. The requirement of similarity then naturally leads to expressions for the operating parameters as a function of the plasma radius. In addition to the well-known scaling law for frequency, surprising results appear for the dependence of the mass flow rate, dissipated power and operating pressure upon the plasma radius. While the obtained laws do not appear to be in good agreement with empirical results in the literature, their correctness is supported by detailed numerical calculations of ICP sources of varying diameters. The approximations of local thermodynamic equilibrium and negligible radiative losses restrict the validity of our results and can be responsible for the disagreement with empirical data. The derived scaling laws are useful for the design of new plasma torches and may provide explanations for the unsteadiness observed in certain existing ICP sources.

  13. High doses of dextromethorphan, an NMDA antagonist, produce effects similar to classic hallucinogens

    Science.gov (United States)

    Carter, Lawrence P.; Johnson, Matthew W.; Mintzer, Miriam Z.; Klinedinst, Margaret A.; Griffiths, Roland R.

    2013-01-01

    Rationale Although reports of dextromethorphan (DXM) abuse have increased recently, few studies have examined the effects of high doses of DXM. Objective This study in humans evaluated the effects of supratherapeutic doses of DXM and triazolam. Methods Single, acute, oral doses of DXM (100, 200, 300, 400, 500, 600, 700, 800 mg/70 kg), triazolam (0.25, 0.5 mg/70kg), and placebo were administered to twelve healthy volunteers with histories of hallucinogen use, under double-blind conditions, using an ascending dose run-up design. Subjective, behavioral, and physiological effects were assessed repeatedly after drug administration for 6 hours. Results Triazolam produced dose-related increases in subject-rated sedation, observer-rated sedation, and behavioral impairment. DXM produced a profile of dose-related physiological and subjective effects differing from triazolam. DXM effects included increases in blood pressure, heart rate, and emesis, increases in observer-rated effects typical of classic hallucinogens (e.g. distance from reality, visual effects with eyes open and closed, joy, anxiety), and participant ratings of stimulation (e.g. jittery, nervous), somatic effects (e.g. tingling, headache), perceptual changes, end-of-session drug liking, and mystical-type experience. After 400 mg/70kg DXM, 11 of 12 participants indicated on a pharmacological class questionnaire that they thought they had received a classic hallucinogen (e.g. psilocybin). Drug effects resolved without significant adverse effects by the end of the session. In a 1-month follow up volunteers attributed increased spirituality and positive changes in attitudes, moods, and behavior to the session experiences. Conclusions High doses of DXM produced effects distinct from triazolam and had characteristics that were similar to the classic hallucinogen psilocybin. PMID:22526529

  14. Total and high molecular weight adiponectin have similar utility for the identification of insulin resistance

    Directory of Open Access Journals (Sweden)

    Aguilar-Salinas Carlos A

    2010-06-01

    Full Text Available Abstract Background Insulin resistance (IR) and related metabolic disturbances are characterized by low levels of adiponectin. High molecular weight adiponectin (HMWA) is considered the active form of adiponectin and a better marker of IR than total adiponectin. The objective of this study is to compare the utility of total adiponectin, HMWA and the HMWA/total adiponectin index (SA index) for the identification of IR and related metabolic conditions. Methods A cross-sectional analysis was performed in a group of ambulatory subjects, aged 20 to 70 years, in Mexico City. Areas under the receiver operator characteristic (ROC) curve for total adiponectin, HMWA and the SA index were plotted for the identification of metabolic disturbances. Sensitivity and specificity, positive and negative predictive values, and accuracy for the identification of IR were calculated. Results The study included 101 men and 168 women. The areas under the ROC curve for total adiponectin and HMWA for the identification of IR (0.664 vs. 0.669, P = 0.74), obesity (0.592 vs. 0.610, P = 0.32), hypertriglyceridemia (0.661 vs. 0.671, P = 0.50) and hypoalphalipoproteinemia (0.624 vs. 0.633, P = 0.58) were similar. A total adiponectin level of 8.03 μg/ml was associated with a sensitivity of 57.6%, a specificity of 65.9%, a positive predictive value of 50.0%, a negative predictive value of 72.4%, and an accuracy of 62.7% for the diagnosis of IR. The corresponding figures for a HMWA value of 4.25 μg/dl were 59.6%, 67.1%, 51.8%, 73.7% and 64.2%. The area under the ROC curve of the SA index for the identification of IR was 0.622 [95% CI 0.554-0.691], obesity 0.613 [95% CI 0.536-0.689], hypertriglyceridemia 0.616 [95% CI 0.549-0.683], and hypoalphalipoproteinemia 0.606 [95% CI 0.535-0.677]. Conclusions Total adiponectin, HMWA and the SA index had similar utility for the identification of IR and metabolic disturbances.
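
    The five diagnostic figures quoted in this abstract (sensitivity, specificity, positive and negative predictive values, accuracy) all follow from one 2x2 table of test result against true condition. The abstract does not report that table, so the sketch below uses purely hypothetical counts; the function name `diagnostic_metrics` and its arguments are illustrative assumptions, not anything taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic-test metrics from confusion-matrix counts.

    tp/fn: subjects WITH the condition who test positive/negative.
    fp/tn: subjects WITHOUT the condition who test positive/negative.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts for illustration only (NOT the study's data).
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

    Note that predictive values depend on how common the condition is in the sample, whereas sensitivity and specificity do not, which is one reason all five figures are usually reported together.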

  15. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.

  16. Gender Similarities in Math Performance from Middle School through High School

    Science.gov (United States)

    Scafidi, Tony; Bui, Khanh

    2010-01-01

    Using data from 10 states, Hyde, Lindberg, Linn, Ellis, and Williams (2008) found gender similarities in performance on standardized math tests. The present study attempted to replicate this finding with national data and to extend it by examining whether gender similarities in math performance are moderated by race, socioeconomic status, or math…

  17. BOOGIE: Predicting Blood Groups from High Throughput Sequencing Data.

    Science.gov (United States)

    Giollo, Manuel; Minervini, Giovanni; Scalzotto, Marta; Leonardi, Emanuela; Ferrari, Carlo; Tosatto, Silvio C E

    2015-01-01

    Over the last decade, we have witnessed an incredible growth in the amount of available genotype data due to high throughput sequencing (HTS) techniques. This information may be used to predict phenotypes of medical relevance, and pave the way towards personalized medicine. Blood phenotypes (e.g. ABO and Rh) are a purely genetic trait that has been extensively studied for decades, with currently over thirty known blood groups. Given the public availability of blood group data, it is of interest to predict these phenotypes from HTS data which may translate into more accurate blood typing in clinical practice. Here we propose BOOGIE, a fast predictor for the inference of blood groups from single nucleotide variant (SNV) databases. We focus on the prediction of thirty blood groups ranging from the well known ABO and Rh, to the less studied Junior or Diego. BOOGIE correctly predicted the blood group with 94% accuracy for the Personal Genome Project whole genome profiles where good quality SNV annotation was available. Additionally, our tool produces a high quality haplotype phase, which is of interest in the context of ethnicity-specific polymorphisms or traits. The versatility and simplicity of the analysis make it easily interpretable and allow easy extension of the protocol towards other phenotypes. BOOGIE can be downloaded from URL http://protein.bio.unipd.it/download/.

  18. Evolution of sequence-defined highly functionalized nucleic acid polymers

    Science.gov (United States)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  19. Similar health benefits of endurance and high-intensity interval training in obese children.

    Directory of Open Access Journals (Sweden)

    Ana Carolina Corte de Araujo

    Full Text Available PURPOSE: To compare two modalities of exercise training (i.e., Endurance Training [ET] and High-Intensity Interval Training [HIT]) on health-related parameters in obese children aged between 8 and 12 years. METHODS: Thirty obese children were randomly allocated into either the ET or HIT group. The ET group performed a 30 to 60-minute continuous exercise at 80% of the peak heart rate (HR). The HIT group performed 3 to 6 sets of 60-s sprint at 100% of the peak velocity interspersed by a 3-min active recovery period at 50% of the exercise velocity. HIT sessions lasted ~70% less than ET sessions. At baseline and after 12 weeks of intervention, aerobic fitness, body composition and metabolic parameters were assessed. RESULTS: Both the absolute (ET: 26.0%; HIT: 19.0%) and the relative VO2 peak (ET: 13.1%; HIT: 14.6%) were significantly increased in both groups after the intervention. Additionally, the total time of exercise (ET: 19.5%; HIT: 16.4%) and the peak velocity during the maximal graded cardiorespiratory test (ET: 16.9%; HIT: 13.4%) were significantly improved across interventions. Insulinemia (ET: 29.4%; HIT: 30.5%) and HOMA-index (ET: 42.8%; HIT: 37.0%) were significantly lower for both groups at POST when compared to PRE. Body mass was significantly reduced in the HIT (2.6%), but not in the ET group (1.2%). A significant reduction in BMI was observed for both groups after the intervention (ET: 3.0%; HIT: 5.0%). The responsiveness analysis revealed a very similar pattern of the most responsive variables among groups. CONCLUSION: HIT and ET were equally effective in improving important health related parameters in obese youth.

  20. Protein profiling reveals inter-individual protein homogeneity of arachnoid cyst fluid and high qualitative similarity to cerebrospinal fluid

    Directory of Open Access Journals (Sweden)

    Berle Magnus

    2011-05-01

    the majority of abundant proteins in AC fluid also can be found in CSF. Compared to plasma, as many as 104 proteins in AC were not found in the list of 3017 plasma proteins. Conclusions Based on the protein content of AC fluid, our data indicate that temporal AC is a homogenous condition, pointing towards a similar AC filling mechanism for the 14 patients examined. Most of the proteins identified in AC fluid have been identified in CSF, indicating high similarity in the qualitative protein content of AC to CSF, whereas this was not the case between AC and plasma. This indicates that AC is filled with a liquid similar to CSF. As far as we know, this is the first proteomics study that explores the AC fluid proteome.

  1. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Science.gov (United States)

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  2. High-Throughput DNA sequencing of ancient wood.

    Science.gov (United States)

    Wagner, Stefanie; Lagane, Frédéric; Seguin-Orlando, Andaine; Schubert, Mikkel; Leroy, Thibault; Guichoux, Erwan; Chancerel, Emilie; Bech-Hebelstrup, Inger; Bernard, Vincent; Billard, Cyrille; Billaud, Yves; Bolliger, Matthias; Croutsch, Christophe; Čufar, Katarina; Eynaud, Frédérique; Heussner, Karl Uwe; Köninger, Joachim; Langenegger, Fabien; Leroy, Frédéric; Lima, Christine; Martinelli, Nicoletta; Momber, Garry; Billamboz, André; Nelle, Oliver; Palomo, Antoni; Piqué, Raquel; Ramstein, Marianne; Schweichel, Roswitha; Stäuble, Harald; Tegel, Willy; Terradas, Xavier; Verdin, Florence; Plomion, Christophe; Kremer, Antoine; Orlando, Ludovic

    2018-03-01

    Reconstructing the colonization and demographic dynamics that gave rise to extant forests is essential to forecasts of forest responses to environmental changes. Classical approaches to map how population of trees changed through space and time largely rely on pollen distribution patterns, with only a limited number of studies exploiting DNA molecules preserved in wooden tree archaeological and subfossil remains. Here, we advance such analyses by applying high-throughput (HTS) DNA sequencing to wood archaeological and subfossil material for the first time, using a comprehensive sample of 167 European white oak waterlogged remains spanning a large temporal (from 550 to 9,800 years) and geographical range across Europe. The successful characterization of the endogenous DNA and exogenous microbial DNA of 140 (~83%) samples helped the identification of environmental conditions favouring long-term DNA preservation in wood remains, and started to unveil the first trends in the DNA decay process in wood material. Additionally, the maternally inherited chloroplast haplotypes of 21 samples from three periods of forest human-induced use (Neolithic, Bronze Age and Middle Ages) were found to be consistent with those of modern populations growing in the same geographic areas. Our work paves the way for further studies aiming at using ancient DNA preserved in wood to reconstruct the micro-evolutionary response of trees to climate change and human forest management. © 2018 John Wiley & Sons Ltd.

  3. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    2014-01-01

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses, therefore only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit…

  4. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

    Directory of Open Access Journals (Sweden)

    Trout-Yakel Keri M

    2010-02-01

    Full Text Available Abstract Background A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE) revealed two similar but distinct AscI PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. Results The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs) and three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. Conclusions High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

  5. Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

    Science.gov (United States)

    Chen, Zitian; Zhou, Wenxiong; Qiao, Shuo; Kang, Li; Duan, Haifeng; Xie, X Sunney; Huang, Yanyi

    2017-12-01

    Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory-based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.

  6. Capillary gel electrophoresis for rapid, high resolution DNA sequencing.

    OpenAIRE

    Swerdlow, H; Gesteland, R

    1990-01-01

    Capillary gel electrophoresis has been demonstrated for the separation and detection of DNA sequencing samples. Enzymatic dideoxy nucleotide chain termination was employed, using fluorescently tagged oligonucleotide primers and laser based on-column detection (limit of detection is 6,000 molecules per peak). Capillary gel separations were shown to be three times faster, with better resolution (2.4 x), and higher separation efficiency (5.4 x) than a conventional automated slab gel DNA sequenci...

  7. Supervised detection of exoplanets in high-contrast imaging sequences

    Science.gov (United States)

    Gomez Gonzalez, C. A.; Absil, O.; Van Droogenbroeck, M.

    2018-06-01

    Context. Post-processing algorithms play a key role in pushing the detection limits of high-contrast imaging (HCI) instruments. State-of-the-art image processing approaches for HCI enable the production of science-ready images relying on unsupervised learning techniques, such as low-rank approximations, for generating a model point spread function (PSF) and subtracting the residual starlight and speckle noise. Aims: In order to maximize the detection rate of HCI instruments and survey campaigns, advanced algorithms with higher sensitivities to faint companions are needed, especially for the speckle-dominated innermost region of the images. Methods: We propose a reformulation of the exoplanet detection task (for ADI sequences) that builds on well-established machine learning techniques to take HCI post-processing from an unsupervised to a supervised learning context. In this new framework, we present algorithmic solutions using two different discriminative models: SODIRF (random forests) and SODINN (neural networks). We test these algorithms on real ADI datasets from VLT/NACO and VLT/SPHERE HCI instruments. We then assess their performances by injecting fake companions and using receiver operating characteristic analysis. This is done in comparison with state-of-the-art ADI algorithms, such as ADI principal component analysis (ADI-PCA). Results: This study shows the improved sensitivity versus specificity trade-off of the proposed supervised detection approach. At the diffraction limit, SODINN improves the true positive rate by a factor ranging from 2 to 10 (depending on the dataset and angular separation) with respect to ADI-PCA when working at the same false-positive level. Conclusions: The proposed supervised detection framework outperforms state-of-the-art techniques in the task of discriminating planet signal from speckles. In addition, it offers the possibility of re-processing existing HCI databases to maximize their scientific return and potentially improve

  8. Ulysses transposable element of Drosophila shows high structural similarities to functional domains of retroviruses.

    Science.gov (United States)

    Evgen'ev, M B; Corces, V G; Lankenau, D H

    1992-06-05

    We have determined the DNA structure of the Ulysses transposable element of Drosophila virilis and found that this transposon is 10,653 bp and is flanked by two unusually large direct repeats 2136 bp long. Ulysses shows the characteristic organization of LTR-containing retrotransposons, with matrix and capsid protein domains encoded in the first open reading frame. In addition, Ulysses contains protease, reverse transcriptase, RNase H and integrase domains encoded in the second open reading frame. Ulysses lacks a third open reading frame present in some retrotransposons that could encode an env-like protein. A dendrogram analysis based on multiple alignments of the protease, reverse transcriptase, RNase H, integrase and tRNA primer binding site of all known Drosophila LTR-containing retrotransposon sequences establishes a phylogenetic relationship of Ulysses to other retrotransposons and suggests that Ulysses belongs to a new family of this type of elements.

  9. Methodology to unmix spectrally similar minerals using high order derivative spectra

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2009-07-01

    Full Text Available pure vanilla extract; milk. Table: Chocolate cake ingredients. Debba (CSIR), Unmixing spectrally similar minerals, Rhodes University 2009, 8 / 40. Introduction to Unmixing. Ingredients and quantities: unsweetened chocolate, 120 grams; unsweetened cocoa powder, 28... grams; boiling water, 240 ml; flour, 315 grams; baking powder, 2 teaspoons; baking soda, 1 teaspoon; salt, 1/4 teaspoon; unsalted butter, 226 grams; white sugar, 400 grams; eggs, 3 large; pure vanilla extract, 2 teaspoons; milk, 240 ml. Table: Chocolate cake...

  10. Self-similarity in high Atwood number Rayleigh-Taylor experiments

    Science.gov (United States)

    Mikhaeil, Mark; Suchandra, Prasoon; Pathikonda, Gokul; Ranjan, Devesh

    2017-11-01

    Self-similarity is a critical concept in turbulent and mixing flows. In the Rayleigh-Taylor instability, theory and simulations have shown that the flow exhibits properties of self-similarity as the mixing Reynolds number exceeds 20000 and the flow enters the turbulent regime. Here, we present results from the first large Atwood number (0.7) Rayleigh-Taylor experimental campaign for mixing Reynolds number beyond 20000 in an effort to characterize the self-similar nature of the instability. Experiments are performed in a statistically steady gas tunnel facility, allowing for the evaluation of turbulence statistics. A visualization diagnostic is used to study the evolution of the mixing width as the instability grows. This allows for computation of the instability growth rate. For the first time in such a facility, stereoscopic particle image velocimetry is used to resolve three-component velocity information in a plane. Velocity means, fluctuations, and correlations are considered as well as their appropriate scaling. Probability density functions of velocity fields, energy spectra, and higher-order statistics are also presented. The energy budget of the flow is described, including the ratio of the kinetic energy to the released potential energy. This work was supported by the DOE-NNSA SSAA Grant DE-NA0002922.

  11. The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction.

    Science.gov (United States)

    Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J

    2018-03-14

    It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.

  12. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...

  13. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    of data generation, new bioinformatics approaches have been developed to cope with the large amount of sequencing reads obtained in these experiments. In this chapter, we first introduce HTS technologies and their usage in molecular biology and discuss the problem of mapping sequencing reads...... to their genomic origin. We then in detail describe two approaches that offer very fast heuristics to solve the mapping problem in a feasible runtime. In particular, we describe the BLAT algorithm, and we give an introduction to the Burrows-Wheeler Transform and the mapping algorithms based on this transformation....
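
    The Burrows-Wheeler Transform mentioned in this record can be illustrated with a minimal sketch. It is not code from the chapter: it uses the naive sorted-rotations construction, which is only practical for short strings, whereas real read mappers build the transform from a suffix array and add an index on top. The function names and the "$" sentinel are assumptions made for this illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (naive, O(n^2 log n))."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every cyclic rotation, sort them, and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("GATTACA")
    print(transformed, inverse_bwt(transformed))
```

    The transform tends to group identical characters, and because it is invertible it can serve as the basis of a compact index over a reference genome.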

  14. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    Science.gov (United States)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) at 97% sequence similarity, and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on the fungal diversity of deep-sea sediments based on culture-dependent approaches and clone library analysis, the present result suggests that Illumina sequencing dramatically accelerates the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor a plethora of fungal communities that differ from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  15. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    Science.gov (United States)

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence composition of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant.
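
    The coverage and assembly figures in this abstract can be cross-checked with a short calculation. The sketch below assumes that "coverage" means mean sequencing depth (total sequenced bases divided by genome size). The genome size it derives is therefore an implied value, not one stated in the abstract, and the function names are illustrative.

```python
def implied_genome_size_bp(total_bases: float, mean_coverage: float) -> float:
    """Genome size implied by depth = total sequenced bases / genome size."""
    return total_bases / mean_coverage


def assembled_fraction(assembly_bp: float, genome_bp: float) -> float:
    """Share of the genome represented by the assembly."""
    return assembly_bp / genome_bp


# Figures quoted in the abstract: 17 Gb of sequence at 12x, 400 Mbp assembled.
genome_bp = implied_genome_size_bp(17e9, 12)
fraction = assembled_fraction(400e6, genome_bp)

if __name__ == "__main__":
    print(f"implied genome size: {genome_bp / 1e9:.2f} Gb")
    print(f"assembled fraction:  {fraction:.0%}")
```

    Under that assumption the implied genome size is roughly 1.4 Gb and the 400 Mbp assembly covers a little under 30% of it, which agrees with the "about 30%" reported.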

  16. Human Treponema pallidum 11q/j isolate belongs to subsp. endemicum but contains two loci with a sequence in TP0548 and TP0488 similar to subsp. pertenue and subsp. pallidum, respectively.

    Directory of Open Access Journals (Sweden)

    Lenka Mikalová

    2017-03-01

    Treponema pallidum subsp. endemicum (TEN) is the causative agent of endemic syphilis (bejel). An unusual human TEN 11q/j isolate was obtained from a syphilis-like primary genital lesion from a patient who returned to France from Pakistan. The TEN 11q/j isolate was characterized using nested PCR followed by Sanger sequencing and/or direct Illumina sequencing. Altogether, 44 chromosomal regions were analyzed. Overall, the 11q/j isolate clustered with TEN strains Bosnia A and Iraq B, as expected from previous TEN classification of the 11q/j isolate. However, the 11q/j sequence in a 505 bp-long region at the TP0488 locus was similar to Treponema pallidum subsp. pallidum (TPA) strains, but not to TEN Bosnia A and Iraq B sequences, suggesting a recombination event at this locus. Similarly, the 11q/j sequence in a 613 bp-long region at the TP0548 locus was similar to Treponema pallidum subsp. pertenue (TPE) strains, but not to TEN sequences. A detailed analysis of two recombinant loci found in the 11q/j clinical isolate revealed that the recombination event occurred just once, at the TP0488 locus, with the donor sequence originating from a TPA strain. Since TEN Bosnia A and Iraq B were found to contain TPA-like sequences at the TP0548 locus, the recombination at TP0548 took place in a treponeme that was an ancestor to both TEN Bosnia A and Iraq B. The sequence of the 11q/j isolate in TP0548 represents an ancestral TEN sequence that is similar to yaws-causing treponemes. In addition to the importance of the 11q/j isolate for reconstruction of the TEN phylogeny, this case emphasizes the possible role of TEN strains in the development of syphilis-like lesions.

  17. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to sequencing by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a valuable tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  18. Tracking TCRβ sequence clonotype expansions during antiviral therapy using high-throughput sequencing of the hypervariable region

    Directory of Open Access Journals (Sweden)

    Mark W Robinson

    2016-04-01

    To maintain a persistent infection, viruses such as hepatitis C virus (HCV) employ a range of mechanisms that subvert protective T cell responses. The suppression of antigen-specific T cell responses by HCV hinders efforts to profile T cell responses during chronic infection and antiviral therapy. Conventional methods of detecting antigen-specific T cells utilise either antigen stimulation (e.g. ELISpot, proliferation assays, cytokine production) or antigen-loaded tetramer staining. This limits the ability to profile T cell responses during chronic infection due to suppressed effector function and the requirement for prior knowledge of antigenic viral peptide sequences. Recently, high-throughput sequencing (HTS) technologies have been developed for the analysis of T cell repertoires. In the present study we have assessed the feasibility of HTS of the TCRβ complementarity determining region (CDR3) to track T cell expansions in an antigen-independent manner. Using sequential blood samples from HCV-infected individuals undergoing anti-viral therapy we were able to measure the population frequencies of >35,000 TCRβ sequence clonotypes in each individual over the course of 12 weeks. TRBV/TRBJ gene segment usage varied markedly between individuals but remained relatively constant within individuals across the course of therapy. Despite this stable TRBV/TRBJ gene segment usage, a number of TCRβ sequence clonotypes showed dramatic changes in read frequency. These changes could not be linked to therapy outcomes in the present study; however, the TCRβ CDR3 sequences with the largest fold changes did include sequences with identical TRBV/TRBJ gene segment usage and high joining region homology to previously published CDR3 sequences from HCV-specific T cells targeting the HLA-B*0801-restricted 1395HSKKKCDEL1403 and HLA-A*0101-restricted 1435ATDALMTGY1443 epitopes. The pipeline developed in this proof of concept study provides a platform for the design of

  19. Presence of Stenotrophomonas maltophilia exhibiting high genetic similarity to clinical isolates in final effluents of pig farm wastewater treatment plants.

    Science.gov (United States)

    Kim, Young-Ji; Park, Jin-Hyeong; Seo, Kun-Ho

    2018-03-01

    Although the prevalence of community-acquired Stenotrophomonas maltophilia infections is sharply increasing, the sources and likely transmission routes of this bacterium are poorly understood. We studied the significance of the presence of S. maltophilia in final effluents and receiving rivers of pig farm wastewater treatment plants (WWTPs). The loads and antibiotic resistance profiles of S. maltophilia in final effluents were assessed. Antibiotic resistance determinants and biofilm formation genes were detected by PCR, and genetic similarity to clinical isolates was investigated using multilocus sequence typing (MLST). S. maltophilia was recovered from final effluents at two of three farms and one corresponding receiving river. Tests of resistance to antibiotics recommended for S. maltophilia infection revealed that for each agent, at least one isolate was classified as resistant or intermediate, with the exception of minocycline. Furthermore, multidrug resistant S. maltophilia susceptible to antibiotics of only two categories was isolated and found to carry the sul2 gene, conferring trimethoprim/sulfamethoxazole resistance. All isolates carried spgM, encoding a major factor in biofilm formation. MLST revealed that isolates of the same sequence type (ST; ST189) were present in both effluent and receiving river samples, and phylogenetic analysis showed that all of the STs identified in this study clustered with clinical isolates. Moreover, one isolate (ST192) recovered in this investigation demonstrated 99.61% sequence identity with a clinical isolate (ST98) associated with a fatal infection in South Korea. Thus, the pathogenicity of the isolates reported here is likely similar to that of those from clinical environments, and WWTPs may play a role as a source of S. maltophilia from which this bacterium spreads to human communities. To the best of our knowledge, this represents the first report of S. maltophilia in pig farm WWTPs. Our results indicate that

  20. High similarity in physicochemical properties of chitin and chitosan from nymphs and adults of a grasshopper.

    Science.gov (United States)

    Erdogan, Sevil; Kaya, Murat

    2016-08-01

    This is the first study to explain the differences in the physicochemical properties of chitin and chitosan obtained from the nymphs and adults of Dociostaurus maroccanus using the same method. Fourier transform infrared spectroscopy, thermogravimetric analysis and x-ray diffraction analysis results demonstrated that the chitins from both the adults and nymphs were in the α-form. The chitin contents of the adults (14%) and nymphs (12%) were of the same order of magnitude. The crystalline index values of chitins from the adult and nymph grasshoppers were 71% and 74%, respectively. Thermal stabilities of the chitins and chitosans from adult and nymph grasshoppers were close to each other. Both the adult (7.2 kDa) and nymph (5.6 kDa) chitosans had low molar masses. Environmental scanning electron microscopy revealed that the surface morphologies of both chitins consisted of nanofibers and nanopores together, and they were very similar to each other. Consequently, it was determined that the physicochemical properties of the chitins and chitosans from adults and nymphs of D. maroccanus were not very different, so it can be hypothesized that the development of the chitin structure in the nymph is almost complete and that the nymph chitin has the same characteristics as the adult chitin. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Earliest Memories and Recent Memories of Highly Salient Events--Are They Similar?

    Science.gov (United States)

    Peterson, Carole; Fowler, Tania; Brandeau, Katherine M.

    2015-01-01

    Four- to 11-year-old children were interviewed about 2 different sorts of memories in the same home visit: recent memories of highly salient and stressful events--namely, injuries serious enough to require hospital emergency room treatment--and their earliest memories. Injury memories were scored for amount of unique information, completeness…

  2. Preparation of highly multiplexed small RNA sequencing libraries.

    Science.gov (United States)

    Persson, Helena; Søkilde, Rolf; Pirona, Anna Chiara; Rovira, Carlos

    2017-08-01

    MicroRNAs (miRNAs) are ~22-nucleotide-long small non-coding RNAs that regulate the expression of protein-coding genes by base pairing to partially complementary target sites, preferentially located in the 3´ untranslated region (UTR) of target mRNAs. The expression and function of miRNAs have been extensively studied in human disease, as well as the possibility of using these molecules as biomarkers for prognostication and treatment guidance. To identify and validate miRNAs as biomarkers, their expression must be screened in large collections of patient samples. Here, we develop a scalable protocol for the rapid and economical preparation of a large number of small RNA sequencing libraries using dual indexing for multiplexing. Combined with the use of off-the-shelf reagents, more samples can be sequenced simultaneously on large-scale sequencing platforms at a considerably lower cost per sample. Sample preparation is simplified by pooling libraries prior to gel purification, which allows for the selection of a narrow size range while minimizing sample variation. A comparison with publicly available data from benchmarking of miRNA analysis platforms showed that this method captures absolute and differential expression as effectively as commercially available alternatives.

  3. High-Throughput Gene Expression Profiles to Define Drug Similarity and Predict Compound Activity.

    Science.gov (United States)

    De Wolf, Hans; Cougnaud, Laure; Van Hoorde, Kirsten; De Bondt, An; Wegner, Joerg K; Ceulemans, Hugo; Göhlmann, Hinrich

    2018-04-01

    By adding biological information, beyond the chemical properties and desired effect of a compound, uncharted compound areas and connections can be explored. In this study, we add transcriptional information for 31K compounds of Janssen's primary screening deck, using the HT L1000 platform, and assess (a) the transcriptional connection score for generating compound similarities, (b) machine learning algorithms for generating target activity predictions, and (c) the scaffold hopping potential of the resulting hits. We demonstrate that the transcriptional connection score is best computed from the significant genes only and should be interpreted within its confidence interval, for which we provide the statistics. These guidelines help to reduce noise, increase reproducibility, and enable the separation of specific and promiscuous compounds. The added value of machine learning is demonstrated for the NR3C1 and HSP90 targets. Support Vector Machine models yielded balanced accuracy values ≥80% when the expression values from DDIT4 & SERPINE1 and TMEM97 & SPR were used to predict the NR3C1 and HSP90 activity, respectively. Combining both models resulted in 22 new and confirmed HSP90-independent NR3C1 inhibitors, providing two scaffolds (i.e., pyrimidine and pyrazolo-pyrimidine), which could potentially be of interest in the treatment of depression (i.e., inhibiting the glucocorticoid receptor, NR3C1, while leaving its chaperone, HSP90, unaffected). As such, the initial hit rate increased by a factor of 300, as less, but more specific, chemistry could be screened, based on the upfront computed activity predictions.
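
    The balanced accuracy reported for the Support Vector Machine models can be made concrete with a small sketch. For a binary task it is the mean of sensitivity and specificity, which keeps a classifier from scoring well merely by predicting the majority class. The labels below are invented for illustration and are not data from the study.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical screening outcome: 2 actives among 10 compounds.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"plain accuracy:    {plain:.4f}")
    print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.4f}")
```

    With actives this rare, plain accuracy looks high even when half of the actives are missed, which is why a class-balanced metric suits screening data.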

  4. Effects of circulating member B of the family with sequence similarity 3 on the risk of developing metabolic syndrome and its components: A 5-year prospective study.

    Science.gov (United States)

    Wang, Haoyu; Yu, Fadong; Zhang, Zhuo; Hou, Yuanyuan; Teng, Weiping; Shan, Zhongyan; Lai, Yaxin

    2017-11-27

    Member B of the family with sequence similarity 3 (FAM3B), also known as pancreatic-derived factor, is mainly synthesized and secreted by islet β-cells, and plays a role in abnormal metabolism of glucose and lipids. However, the prospective association of FAM3B with metabolic disorders remains unclear. The present study aimed to reveal the predictive relationship between this pancreas-specific cytokine and metabolic syndrome (MetS). A total of 210 adults (88 men and 122 women) without MetS, aged between 40 and 65 years, were recruited and received a comprehensive health examination. Baseline serum FAM3B levels were determined by sandwich enzyme-linked immunosorbent assay. Subsequently, all participants underwent a follow-up examination after 5 years. MetS was identified in accordance with the International Diabetes Federation criteria. During follow-up, 35.7% of participants developed MetS. In comparison with the non-MetS group, participants with MetS had an increased serum FAM3B at baseline (21.85 ng/mL [19.38, 24.17 ng/mL] vs 28.56 ng/mL [25.32, 38.10 ng/mL], P < 0.001). Moreover, serum FAM3B was significantly associated with variations in fasting plasma insulin (r = -0.306, P < 0.001), homeostasis model assessment of β-cell function (r = -0.328, P < 0.001) and homeostasis model assessment of insulin resistance (r = -0.191, P = 0.006). Furthermore, a positive correlation between baseline FAM3B and the incidence of MetS was observed, even after multivariable adjustment (relative risk 1.23 [1.15, 1.31], P < 0.001). The optimal cut-off value of FAM3B for predicting MetS was 23.98 ng/mL based on the Youden Index. Elevated circulating FAM3B might be considered a predictor of new-onset MetS and its progression. © 2017 The Authors. Journal of Diabetes Investigation published by Asian Association for the Study of Diabetes (AASD) and John Wiley & Sons Australia, Ltd.
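
    The Youden Index used to choose the cut-off can be sketched as follows. The index is J = sensitivity + specificity - 1, and the chosen cut-off is the threshold that maximises J. The measurements and outcomes below are invented for illustration and do not reproduce the study's data or its 23.98 ng/mL result. Treating values at or above the threshold as positive is also an assumption of this sketch.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A value is called positive when it is >= threshold; labels use 1 for
    cases and 0 for non-cases.
    """
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(values)):
        sensitivity = sum(v >= threshold for v in positives) / len(positives)
        specificity = sum(v < threshold for v in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:  # the lowest threshold wins when J is tied
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical baseline biomarker levels (ng/mL) and 5-year outcomes.
levels = [19, 21, 22, 23, 24, 25, 26, 28, 30]
outcome = [0, 0, 0, 1, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(youden_cutoff(levels, outcome))
```

    In practice the candidate thresholds usually come from a receiver operating characteristic curve, and a cut-off chosen this way should be validated in an independent cohort.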

  5. Parallel Sequencing of Expressed Sequence Tags from Two Complementary DNA Libraries for High and Low Phosphorus Adaptation in Common Beans

    Directory of Open Access Journals (Sweden)

    Matthew W. Blair

    2011-11-01

    Expressed sequence tags (ESTs) have proven useful for gene discovery in many crops. In this work, our objective was to construct complementary DNA (cDNA) libraries from root tissues of common beans (Phaseolus vulgaris L.) grown under low and high P hydroponic conditions and to conduct EST sequencing and comparative analyses of the libraries. Expressed sequence tag analysis of 3648 clones identified 2372 unigenes, of which 1591 were annotated as known genes while a total of 465 unigenes were not associated with any known gene. Unigenes with hits were categorized according to biological processes, molecular function, and cellular compartmentalization. Given the young tissue used to make the root libraries, genes for catalytic activity and binding were highly expressed. Comparisons with previous root EST sequencing and between the two libraries made here resulted in a set of genes to study further for differential gene expression and adaptation to low P, such as a 14 kDa proline-rich protein, a metallopeptidase, tonoplast intrinsic protein, adenosine triphosphate (ATP) citrate synthase, and cell proliferation genes expressed in the low P treated plants. Given that common beans are often grown on acid soils of the tropics and subtropics that are usually low in P, these genes and the two parallel libraries will be useful for selection for better uptake of this essential macronutrient. The importance of EST generation for common bean root tissues under low P and other abiotic soil stresses is also discussed.

  6. Apparent mineral retention is similar in control and hyperinsulinemic men after consumption of high amylose cornstarch.

    Science.gov (United States)

    Behall, Kay M; Howe, Juliette C; Anderson, Richard A

    2002-07-01

    The effects on apparent mineral retention after long-term consumption of a high amylose diet containing 30 g resistant starch (RS) were investigated in 10 control and 14 hyperinsulinemic men. Subjects consumed products (bread, muffins, cookies, corn flakes and cheese puffs) made with standard (70% amylopectin, 30% amylose; AP) or high amylose (70% amylose, 30% amylopectin; AM) cornstarch for two 14-wk periods in a crossover pattern. Starch products replaced usual starches in the habitual diet for 10 wk followed by 4 wk of consuming the controlled diets. During wk 12, all urine, feces and duplicate foods were collected for 7 d. Urinary chromium losses after a glucose tolerance test or 24-h collections of the hyperinsulinemic and control subjects did not differ and were not altered by diet. Except for zinc, the two subject types did not differ significantly in apparent mineral balance. Apparent retentions of calcium and magnesium were not significantly affected by diet (AM vs. AP) or type-by-diet interaction. Apparent iron retention tended to be greater after AM than AP consumption (P ...). Apparent copper retention was greater after consuming AP than after AM (P < 0.02), whereas apparent zinc retention was greater after consuming AM than after AP (P < 0.018). Zinc also showed a significant type-by-diet interaction (P < 0.034) with control subjects retaining less zinc after consuming AP than after AM. In summary, a high amylose cornstarch diet containing 30 g RS could be consumed long term without markedly affecting, and possibly enhancing, retention of some minerals.

  7. Very high resolution single pass HLA genotyping using amplicon sequencing on the 454 next generation DNA sequencers: Comparison with Sanger sequencing.

    Science.gov (United States)

    Yamamoto, F; Höglund, B; Fernandez-Vina, M; Tyan, D; Rastrou, M; Williams, T; Moonsamy, P; Goodridge, D; Anderson, M; Erlich, H A; Holcomb, C L

    2015-12-01

    Compared to Sanger sequencing, next-generation sequencing offers advantages for high resolution HLA genotyping including increased throughput, lower cost, and reduced genotype ambiguity. Here we describe an enhancement of the Roche 454 GS GType HLA genotyping assay to provide very high resolution (VHR) typing, by the addition of 8 primer pairs to the original 14, to genotype 11 HLA loci. These additional amplicons help resolve common and well-documented alleles and exclude commonly found null alleles in genotype ambiguity strings. Simplification of workflow to reduce the initial preparation effort using early pooling of amplicons or the Fluidigm Access Array™ is also described. Performance of the VHR assay was evaluated on 28 well characterized cell lines using Conexio Assign MPS software which uses genomic, rather than cDNA, reference sequence. Concordance was 98.4%; 1.6% had no genotype assignment. Of concordant calls, 53% were unambiguous. To further assess the assay, 59 clinical samples were genotyped and results compared to unambiguous allele assignments obtained by prior sequence-based typing supplemented with SSO and/or SSP. Concordance was 98.7% with 58.2% as unambiguous calls; 1.3% could not be assigned. Our results show that the amplicon-based VHR assay is robust and can replace current Sanger methodology. Together with software enhancements, it has the potential to provide even higher resolution HLA typing. Copyright © 2015. Published by Elsevier Inc.

  8. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  9. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental...... and computational protocol for detecting the reverse transcription termination sites (RTTS-Seq). This protocol was subsequently applied to hydroxyl radical footprinting of three dimensional RNA structures to give a probing signal that correlates well with the RNA backbone solvent accessibility. Moreover, we applied...

  10. Vocal neighbour-mate discrimination in female great tits despite high song similarity

    DEFF Research Database (Denmark)

    Blumenrath, Sandra H.; Dabelsteen, Torben; Pedersen, Simon Boel

    2007-01-01

    Discrimination between conspecifics is important in mediating social interactions between several individuals in a network environment. In great tits, Parus major, females readily distinguish between the songs of their mate and those of a stranger. The high degree of song sharing among neighbouring...... males, however, raises the question of whether females are also able to perceive differences between songs shared by their mate and a neighbour. The great tit is a socially monogamous, hole-nesting species with biparental care. Pair bond maintenance and coordination of the pair's reproductive efforts...... are important, and the female's ability to recognize her mate's song should therefore be adaptive. In a neighbour-mate discrimination playback experiment, we presented 13 incubating great tit females situated inside nestboxes with a song of their mate and the same song type from a neighbour. Each female...

  11. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    Science.gov (United States)

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  12. High levels of diversity characterize mandrill (Mandrillus sphinx) Mhc-DRB sequences.

    Science.gov (United States)

    Abbott, Kristin M; Wickings, E Jean; Knapp, Leslie A

    2006-08-01

    The major histocompatibility complex (MHC) is highly polymorphic in most primate species studied thus far. The rhesus macaque (Macaca mulatta) has been studied extensively and the Mhc-DRB region demonstrates variability similar to humans. The extent of MHC diversity is relatively unknown for other Old World monkeys (OWM), especially among genera other than Macaca. A molecular survey of the Mhc-DRB region in mandrills (Mandrillus sphinx) revealed extensive variability, suggesting that other OWMs may also possess high levels of Mhc-DRB polymorphism. In the present study, 33 Mhc-DRB loci were identified from only 13 animals. Eleven were wild-born and presumed to be unrelated and two were captive-born twins. Two to seven different sequences were identified for each individual, suggesting that some mandrills may have as many as four Mhc-DRB loci on a single haplotype. From these sequences, representatives of at least six Mhc-DRB loci or lineages were identified. As observed in other primates, some new lineages may have arisen through the process of gene conversion. These findings indicate that mandrills have Mhc-DRB diversity not unlike rhesus macaques and humans.

  13. Cortical cytasters: a highly conserved developmental trait of Bilateria with similarities to Ctenophora

    Directory of Open Access Journals (Sweden)

    Salinas-Saavedra Miguel

    2011-12-01

    Background: Cytasters (cytoplasmic asters) are centriole-based nucleation centers of microtubule polymerization that are observable in large numbers in the cortical cytoplasm of the egg and zygote of bilaterian organisms. In both protostome and deuterostome taxa, cytasters have been described to develop during oogenesis from vesicles of nuclear membrane that move to the cortical cytoplasm. They become associated with several cytoplasmic components, and participate in the reorganization of cortical cytoplasm after fertilization, patterning the antero-posterior and dorso-ventral body axes. Presentation of the hypothesis: The specific resemblances in the development of cytasters in both protostome and deuterostome taxa suggest that an independent evolutionary origin is unlikely. An assessment of published data confirms that cytasters are present in several protostome and deuterostome phyla, but are absent in the non-bilaterian phyla Cnidaria and Ctenophora. We hypothesize that cytasters evolved in the lineage leading to Bilateria and were already present in the most recent common ancestor shared by protostomes and deuterostomes. Thus, cytasters would be an ancient and highly conserved trait that is homologous across the different bilaterian phyla. The alternative possibility is homoplasy, that is, cytasters have evolved independently in different lineages of Bilateria. Testing the hypothesis: So far, available published information shows that appropriate observations have been made in eight different bilaterian phyla. All of them present cytasters. This is consistent with the hypothesis of homology and conservation. However, there are several important groups for which there are no currently available data. The hypothesis of homology predicts that cytasters should be present in these groups. Increasing the taxonomic sample using modern techniques uniformly will test for evolutionary patterns supporting homology, homoplasy, or secondary loss of

  14. High-throughput sequencing of black pepper root transcriptome

    Science.gov (United States)

    2012-01-01

    Background: Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results: The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions: This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782

  15. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Background: Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results: The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions: This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  16. Similar sediment provenance of low and high arsenic aquifers in Bangladesh

    Science.gov (United States)

    Zheng, Y.; Yang, Q.; Li, S.; Hemming, S. R.; Zhang, Y.; Rasbury, T.; Hemming, G.

    2017-12-01

    Geogenic arsenic (As) in drinking water, especially in groundwater, is estimated to have affected the health of over 100 million people worldwide, with nearly half of the total at risk population in Bangladesh. Sluggish flow and reducing biogeochemical environment in sedimentary aquifers have been shown as the primary controls for the release of As from sediment to the shallower groundwater in the Holocene aquifer. In contrast, deeper groundwater in the Pleistocene aquifer is depleted in groundwater As and sediment-extractable As. This study assesses the origin of the sediment in two aquifers of Bangladesh that contain distinctly different As levels to ascertain whether the source of the sediment is a factor in this difference through measurements of detrital mica Ar-Ar age, detrital zircon U-Pb age, as well as sediment silicate Sr and Nd isotopes. Whole rock geochemical data were also used to illuminate the extent of chemical weathering. Detrital mica 40Ar/39Ar cooling ages and detrital zircon U-Pb ages show no statistical difference between high-As Holocene sediment and low-As Pleistocene sediment, but suggest an aquifer sediment source of both the Brahmaputra and the Ganges rivers. Silicate 87Sr/86Sr and 143Nd/144Nd further depict a major sediment source from the Brahmaputra river, which is supported by a two end member mixing model using 87Sr/86Sr and Sr concentrations. Pleistocene and Holocene sediments show little difference in weathering of mobile elements including As, while coarser sediments and a longer history of the Pleistocene aquifer suggest that sorting and flushing play more important roles in regulating the contrast of As occurrence between these two aquifers.

  17. The Microsoft Biology Foundation Applications for High-Throughput Sequencing

    Science.gov (United States)

    Mercer, S.

    2010-01-01

    The need for reusable libraries of bioinformatics functions has been recognized for many years and a number of language-specific toolkits have been constructed. Such toolkits have served as valuable nucleation points for the community, promoting the sharing of code and establishing standards. The majority of DNA sequencing machines and many other standard pieces of lab equipment are controlled by PCs using Windows, and a Microsoft genomics toolkit would enable initial processing and quality control to happen closer to the instrumentation and provide opportunities for added-value services within core facilities. The Microsoft Biology Foundation (MBF) is an open source software library, freely available for both commercial and academic use, that can be obtained as an early-stage beta from mbf.codeplex.com. This presentation will describe the structure and goals of MBF and demonstrate some of its uses.

  18. Sequence similarity between the erythrocyte binding domain 1 of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals binding residues for the Duffy Antigen Receptor for Chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Background: The surface glycoprotein (SU, gp120) of the human immunodeficiency virus (HIV) must bind to a chemokine receptor, CCR5 or CXCR4, to invade CD4+ cells. Plasmodium vivax uses the Duffy Binding Protein (DBP) to bind the Duffy Antigen Receptor for Chemokines (DARC) and invade reticulocytes. Results: Variable loop 3 (V3) of HIV-1 SU and domain 1 of the Plasmodium vivax DBP share a sequence similarity. The site of amino acid sequence similarity was necessary, but not sufficient, ...

  19. A tale of two pectins: Diverse fine structures can result from identical processive PME treatments on similar high DM substrates

    Science.gov (United States)

    The effects of a processive pectin-methylesterase treatment on two different pectins, both possessing a high degree of methylesterification, were investigated. While the starting samples were purportedly very similar in fine structure, and even though the sample-averaged degree of methylesterificati...

  20. Differences and similarities in double special educational needs: high abilities/giftedness x Asperger’s Syndrome

    Directory of Open Access Journals (Sweden)

    Nara Joyce Wellausen Vieira

    2012-08-01

    The study was developed from a literature search of books, articles and theses published since the year 2000 on the theme of High Abilities/Giftedness and Asperger’s Syndrome. The objectives of this research were to survey publications from 2000 to 2011 on the common and distinct features of people with Asperger’s Syndrome and people with high abilities/giftedness, and to relate the number of publications found in Education and Special Education. As the theoretical framework, we present the conception of High Abilities/Giftedness of Renzulli (2004) and Gardner (2000) and the conception of Asperger’s Syndrome of Mello (2007) and Klin (2006). Analysis of the data revealed similarities and differences between the behavioral characteristics of individuals with High Abilities/Giftedness and those with Asperger’s Syndrome. It is possible to point out that there is much evidence separating these two special educational needs and few similarities between them. However, the possibility that the two special educational needs co-occur should not be neglected, because there are still few studies that examine the differences and similarities of these subjects theoretically, much less studies that investigate these similarities and distinctions in the subjects themselves.

  1. Fusion protein gene nucleotide sequence similarities, shared antigenic sites and phylogenetic analysis suggest that phocid distemper virus 2 and canine distemper virus belong to the same virus entity.

    NARCIS (Netherlands)

    I.K.G. Visser (Ilona); R.W.J. van der Heijden (Roger); M.W.G. van de Bildt (Marco); M.J.H. Kenter (Marcel); C. Örvell; A.D.M.E. Osterhaus (Albert)

    1993-01-01

    Nucleotide sequencing of the fusion protein (F) gene of phocid distemper virus-2 (PDV-2), recently isolated from Baikal seals (Phoca sibirica), revealed an open reading frame (nucleotides 84 to 2075) with two potential in-frame ATG translation initiation codons. We suggest that the

  2. Sources of PCR-induced distortions in high-throughput sequencing data sets

    Science.gov (United States)

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
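
    The stochasticity described above can be illustrated with a simple branching-process simulation. This is a minimal sketch, not the authors' quantitative model: it assumes each molecule is copied independently with a fixed per-cycle probability, and the function name `amplify` and the parameter `efficiency` are hypothetical.

```python
import random

def amplify(copies, cycles, efficiency, rng):
    """Simulate PCR as a branching process (illustrative assumption).

    In every cycle each existing molecule is duplicated independently
    with probability `efficiency`; otherwise it is carried over unchanged.
    """
    for _ in range(cycles):
        new_copies = sum(1 for _ in range(copies) if rng.random() < efficiency)
        copies += new_copies
    return copies

# Five templates, each starting from a single molecule, amplified identically.
rng = random.Random(0)
final_counts = [amplify(1, 10, 0.8, rng) for _ in range(5)]
print(final_counts)
```

    Because the earliest cycles act on very few molecules, a chance failure to copy at that stage is propagated through all later cycles, which is one way identical templates can end up at unequal copy numbers.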

  3. Turbulence, dynamic similarity and scale effects in high-velocity free-surface flows above a stepped chute

    Science.gov (United States)

    Felder, Stefan; Chanson, Hubert

    2009-07-01

    In high-velocity free-surface flows, air entrainment is common through the interface, and intense interactions take place between turbulent structures and entrained bubbles. Two-phase flow properties were measured herein in high-velocity open channel flows above a stepped chute. Detailed turbulence measurements were conducted in a large-size facility, and a comparative analysis was applied to test the validity of the Froude and Reynolds similarities. The results showed consistently that the Froude similitude was not satisfied using a 2:1 geometric scaling ratio. Fewer entrained bubbles and comparatively larger bubble sizes were observed at the smaller Reynolds numbers, as well as lower turbulence levels and larger turbulent length and time scales. The results implied that small-size models underestimated the rate of energy dissipation and the aeration efficiency of prototype stepped spillways for similar flow conditions. A Reynolds similitude was tested similarly, and the results also showed some significant scale effects. However, a number of self-similar relationships remained invariant under changes of scale and confirmed the analysis of Chanson and Carosi (Exp Fluids 42:385-401, 2007). The finding is significant because self-similarity may provide a picture general enough to be used to characterise the air-water flow field in large prototype channels.

  4. SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

    Science.gov (United States)

    Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

    2014-08-08

    Next-generation sequencers (NGSs) have become one of the main tools of current biology. To obtain useful insights from NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we confirmed that the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that requires high-quality sequence data and mapping results. Therefore, the software will be especially useful for controlling the quality of variant calls for low-population cells, e.g., cancer cells, in a sample with technical errors of sequencing procedures.
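
    Cleaning based on Phred scores can be sketched as follows. This is not SUGAR's implementation, which works on flowcell subtiles. It is a minimal per-read filter that assumes Phred+33 encoded quality strings, and the function names and the threshold of 20 are hypothetical choices for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q):
    """Convert a Phred score Q into an error probability, p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def keep_read(quality, min_mean_q=20):
    """Keep a read only if its mean Phred score reaches the (assumed) threshold."""
    scores = phred_scores(quality)
    if not scores:
        return False  # an empty quality string carries no evidence of quality
    return sum(scores) / len(scores) >= min_mean_q

# Hypothetical reads given as (sequence, quality string) pairs.
reads = [("ACGT", "IIII"), ("ACGT", "!!!!")]
kept = [seq for seq, qual in reads if keep_read(qual)]
```

    A Phred score of 20 corresponds to an error probability of 1 in 100, so the threshold trades read retention against base-call confidence.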

  5. Self-similarity of high-pT hadron production in π⁻p and π⁻A collisions

    International Nuclear Information System (INIS)

    Tokarev, M.V.; Panebrattsev, Yu.A.; Skoro, G.P.; Zborovsky, I.

    2002-01-01

    Self-similar properties of hadron production in π⁻p and π⁻A collisions over a high-pT region are studied. The analysis of experimental data is performed in the framework of z-scaling. The scaling variable depends on the anomalous fractal dimension of the incoming pion. Its value is found to be δ_π ≅ 0.1. Independence of the scaling function Ψ(z) of the collision energy is shown. The A-dependence of the data z-presentation confirms self-similarity of particle formation in π⁻A collisions.

  6. High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach

    Directory of Open Access Journals (Sweden)

    Allard Marc W

    2012-01-01

    Background: Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult to resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shotgun sequenced, including diverse lineage representatives as well as numerous replicate clones, to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study. Results: Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data. Conclusions: Implementation of a validated pipeline for NGS data acquisition and

  7. Intraspecific sequence comparisons reveal similar rates of non-collinear gene insertion in the B and D genomes of bread wheat

    Czech Academy of Sciences Publication Activity Database

    Bartoš, Jan; Vlček, Čestmír; Choulet, F.; Džunková, Mária; Cviková, Kateřina; Šafář, Jan; Šimková, Hana; Pačes, Jan; Strnad, Hynek; Sourdille, P.; Berges, H.; Cattonaro, F.; Feuillet, C.; Doležel, Jaroslav

    2012-01-01

    Vol. 12, No. 155 (2012), pp. 1-10. ISSN 1471-2229. R&D Projects: GA ČR GAP501/10/1778. Grant - others: GA MŠk(CZ) ED0007/01/01. Program: ED. Institutional research plan: CEZ:AV0Z50380511; CEZ:AV0Z50520514. Keywords: Wheat * BAC sequencing * Homoeologous genomes. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 4.354, year: 2012

  8. Multiple Teaching Approaches, Teaching Sequence and Concept Retention in High School Physics Education

    Science.gov (United States)

    Fogarty, Ian; Geelan, David

    2013-01-01

    Students in 4 Canadian high school physics classes completed instructional sequences in two key physics topics related to motion--Straight Line Motion and Newton's First Law. Different sequences of laboratory investigation, teacher explanation (lecture) and the use of computer-based scientific visualizations (animations and simulations) were…

  9. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards...

  10. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    Science.gov (United States)

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.

  11. Sequence similarity between the viral cp gene and the transgene in transgenic papayas

    Directory of Open Access Journals (Sweden)

    Manoel Teixeira Souza Júnior

    2005-05-01

    The Papaya ringspot virus (PRSV) coat protein transgene present in 'Rainbow' and 'SunUp' papayas shows high sequence similarity (>89%) to the cp gene from PRSV BR and TH. Despite this, both isolates are able to break down the resistance in 'Rainbow', while only the latter is able to do so in 'SunUp'. The objective of this work was to evaluate the degree of sequence similarity between the cp gene in the challenge isolate and the cp transgene in transgenic papayas resistant to PRSV. The production of a hybrid virus containing the genome backbone of PRSV HA up to the Apa I site in the NIb gene, and downstream from there, the sequence of PRSV TH was undertaken. This hybrid virus, PRSV HA/TH, was obtained and used to challenge 'Rainbow', 'SunUp', and an R2 population derived from line 63-1, all resistant to PRSV HA. PRSV HA/TH broke down the resistance in both papaya varieties and in the 63-1 population, demonstrating that sequence similarity is a major factor in the mechanism of resistance used by transgenic papayas expressing the cp gene. A comparative analysis of the cp gene present in line 55-1 and 63-1-derived transgenic plants and in PRSV HA, BR, and TH was also performed.

  12. High-power Yb-fiber comb based on pre-chirped-management self-similar amplification

    Science.gov (United States)

    Luo, Daping; Liu, Yang; Gu, Chenglin; Wang, Chao; Zhu, Zhiwei; Zhang, Wenchao; Deng, Zejiang; Zhou, Lian; Li, Wenxue; Zeng, Heping

    2018-02-01

    We report a fiber self-similar-amplification (SSA) comb system that delivers a 250-MHz, 109-W, 42-fs pulse train with a 10-dB spectral width of 85 nm at 1056 nm. A pair of grisms is employed to compensate the group velocity dispersion and third-order dispersion of pre-amplified pulses for facilitating a self-similar evolution and a self-phase modulation (SPM). Moreover, we analyze the stabilities and noise characteristics of both the locked carrier envelope phase and the repetition rate, verifying the stability of the generated high-power comb. The demonstration of the SSA comb at such high power proves the feasibility of the SPM-based low-noise ultrashort comb.

  13. ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.

    Science.gov (United States)

    Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie

    2014-02-01

    Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.

  14. Judgments of brand similarity

    NARCIS (Netherlands)

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS

    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the

  15. Applications of high-throughput sequencing to chromatin structure and function in mammals

    OpenAIRE

    Dunham, Ian

    2009-01-01

    High-throughput DNA sequencing approaches have enabled direct interrogation of chromatin samples from mammalian cells. We are beginning to develop a genome-wide description of nuclear function during development, but further data collection, refinement, and integration are needed.

  16. Continuous- and Discrete-Time Stimulus Sequences for High Stimulus Rate Paradigm in Evoked Potential Studies

    Directory of Open Access Journals (Sweden)

    Tao Wang

    2013-01-01

    To obtain reliable transient auditory evoked potentials (AEPs) from EEGs recorded using the high stimulus rate (HSR) paradigm, it is critical to design stimulus sequences with appropriate frequency properties. Traditionally, the individual stimulus events in a stimulus sequence occur only at discrete time points dependent on the sampling frequency of the recording system and the duration of the stimulus sequence. This dependency likely causes the implementation of suboptimal stimulus sequences, sacrificing the reliability of the resulting AEPs. In this paper, we explicate the use of continuous-time stimulus sequences for the HSR paradigm, which are independent of the discrete electroencephalogram (EEG) recording system. We employ simulation studies to examine the applicability of the continuous-time stimulus sequences and the impacts of sampling frequency on AEPs in traditional studies using discrete-time design. Results from these studies show that the continuous-time sequences can offer better frequency properties and improve the reliability of recovered AEPs. Furthermore, we find that the errors in the recovered AEPs depend critically on the sampling frequencies of experimental systems, and their relationship can be fitted using a reciprocal function. As such, our study contributes to the literature by demonstrating the applicability and advantages of continuous-time stimulus sequences for the HSR paradigm and by revealing the relationship between the reliability of AEPs and the sampling frequencies of the experimental systems when discrete-time stimulus sequences are used in the traditional manner for the HSR paradigm.

  17. Centroid based clustering of high throughput sequencing reads based on n-mer counts.

    Science.gov (United States)

    Solovyov, Alexander; Lipkin, W Ian

    2013-09-08

    Many problems in computational biology require alignment-free sequence comparisons. One of the common tasks involving sequence comparison is sequence clustering. Here we apply methods of alignment-free comparison (in particular, comparison using sequence composition) to the challenge of sequence clustering. We study several centroid based algorithms for clustering sequences based on word counts. Study of their performance shows that using k-means algorithm with or without the data whitening is efficient from the computational point of view. A higher clustering accuracy can be achieved using the soft expectation maximization method, whereby each sequence is attributed to each cluster with a specific probability. We implement an open source tool for alignment-free clustering. It is publicly available from github: https://github.com/luscinius/afcluster. We show the utility of alignment-free sequence clustering for high throughput sequencing analysis despite its limitations. In particular, it allows one to perform assembly with reduced resources and a minimal loss of quality. The major factor affecting performance of alignment-free read clustering is the length of the read.
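
    The idea of clustering reads by word counts can be sketched in a few lines. This is not the afcluster tool. It is a minimal illustration that assumes dinucleotide (k = 2) frequency vectors, squared Euclidean distance, and plain Lloyd-style k-means with caller-supplied starting centroids. All function names are hypothetical.

```python
from itertools import product

def kmer_profile(seq, k=2):
    """Return normalized counts of all 4**k DNA words, in lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, n_iter=10):
    """Hard k-means: assign each point to its nearest centroid, then re-average."""
    labels = []
    for _ in range(n_iter):
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy reads: two AT-rich and two GC-rich sequences.
reads = ["ATATATATAT", "TATATATATA", "GCGCGCGCGC", "CGCGCGCGCG"]
profiles = [kmer_profile(r) for r in reads]
labels = kmeans(profiles, [list(profiles[0]), list(profiles[2])])
print(labels)  # [0, 0, 1, 1]
```

    No alignment is computed at any point: each read is reduced to a fixed-length composition vector, so the cost per read depends on its length and on 4^k rather than on pairwise comparisons. The soft expectation maximization variant mentioned in the abstract would replace the hard assignment step with per-cluster probabilities.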

  18. High-functioning autism patients share similar but more severe impairments in verbal theory of mind than schizophrenia patients.

    Science.gov (United States)

    Tin, L N W; Lui, S S Y; Ho, K K Y; Hung, K S Y; Wang, Y; Yeung, H K H; Wong, T Y; Lam, S M; Chan, R C K; Cheung, E F C

    2018-06-01

    Evidence suggests that autism and schizophrenia share similarities in genetic, neuropsychological and behavioural aspects. Although both disorders are associated with theory of mind (ToM) impairments, few studies have directly compared ToM between autism patients and schizophrenia patients. This study aimed to investigate to what extent high-functioning autism patients and schizophrenia patients share and differ in ToM performance. Thirty high-functioning autism patients, 30 schizophrenia patients and 30 healthy individuals were recruited. Participants were matched in age, gender and estimated intelligence quotient. The verbal-based Faux Pas Task and the visual-based Yoni Task were utilised to examine first- and higher-order, affective and cognitive ToM. The task/item difficulty of the two paradigms was examined using mixed model analyses of variance (ANOVAs). Multiple ANOVAs and mixed model ANOVAs were used to examine group differences in ToM. The Faux Pas Task was more difficult than the Yoni Task. High-functioning autism patients showed more severely impaired verbal-based ToM in the Faux Pas Task, but shared similar visual-based ToM impairments in the Yoni Task with schizophrenia patients. The findings that individuals with high-functioning autism shared similar but more severe impairments in verbal ToM than individuals with schizophrenia support the autism-schizophrenia continuum. The finding that verbal-based but not visual-based ToM was more impaired in high-functioning autism patients than schizophrenia patients could be attributable to the varied task/item difficulty between the two paradigms.

  19. Isolation and characterization of antigen-specific alpaca (Lama pacos) VHH antibodies by biopanning followed by high-throughput sequencing.

    Science.gov (United States)

    Miyazaki, Nobuo; Kiyose, Norihiko; Akazawa, Yoko; Takashima, Mizuki; Hagihara, Yosihisa; Inoue, Naokazu; Matsuda, Tomonari; Ogawa, Ryu; Inoue, Seiya; Ito, Yuji

    2015-09-01

    The antigen-binding domain of camelid dimeric heavy chain antibodies, known as VHH or Nanobody, has much potential in pharmaceutical and industrial applications. To establish the isolation process of antigen-specific VHH, a VHH phage library was constructed with a diversity of 8.4 × 10^7 from cDNA of peripheral blood mononuclear cells of an alpaca (Lama pacos) immunized with a fragment of IZUMO1 (IZUMO1PFF) as a model antigen. By conventional biopanning, 13 antigen-specific VHHs were isolated. The amino acid sequences of these VHHs, designated as N-group VHHs, were very similar to each other (>93% identity). To find more diverse antibodies, we performed high-throughput sequencing (HTS) of VHH genes. By comparing the frequency of each sequence before and after biopanning, we identified the sequences whose frequencies were increased by biopanning. The top 100 of these sequences were subjected to phylogenetic tree analysis. In total, 75% of them belonged to N-group VHHs, but the others were phylogenetically distant from N-group VHHs (non-N-group). Two of three VHHs selected from the non-N-group VHHs showed sufficient antigen-binding ability. These results suggested that biopanning followed by HTS provided a useful method for finding minor and diverse antigen-specific clones that could not be identified by conventional biopanning. © The Authors 2015. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.
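
    The before/after frequency comparison described above can be sketched as a simple enrichment ranking. This is only an illustration under stated assumptions: the function name, the pseudocount of 1 (used so that sequences unseen before panning do not cause division by zero) and the toy clone labels are hypothetical and not taken from the study.

```python
from collections import Counter


def rank_by_enrichment(before, after, pseudocount=1):
    """Rank sequences by the ratio of their frequency after vs. before panning.

    `before` and `after` are lists of sequence strings (one per read).
    The pseudocount is an assumption of this sketch, not the authors' method.
    """
    b_counts, a_counts = Counter(before), Counter(after)
    n_before, n_after = len(before), len(after)
    scores = {}
    for seq in a_counts:
        freq_before = (b_counts[seq] + pseudocount) / (n_before + pseudocount)
        freq_after = (a_counts[seq] + pseudocount) / (n_after + pseudocount)
        scores[seq] = freq_after / freq_before
    # Highest enrichment first; a top-N slice would feed the tree analysis.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy reads: clone_b is rare before panning and common after.
before_reads = ["clone_a"] * 8 + ["clone_b"] * 2
after_reads = ["clone_a"] * 5 + ["clone_b"] * 5
ranked = rank_by_enrichment(before_reads, after_reads)
```

    In this toy input the minor clone ranks first because its relative frequency rises, which is the kind of signal that lets HTS surface clones missed by picking colonies alone.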

  20. The proximal first exon architecture of the murine ghrelin gene is highly similar to its human orthologue

    Directory of Open Access Journals (Sweden)

    Seim Inge

    2009-05-01

    Full Text Available Abstract. Background: The murine ghrelin gene (Ghrl), originally sequenced from stomach tissue, contains five exons and a single transcription start site in a short, 19 bp first exon (exon 0). We recently isolated several novel first exons of the human ghrelin gene and found evidence of a complex transcriptional repertoire. In this report, we examined the 5' exons of the murine ghrelin orthologue in a range of tissues using 5' RACE. Findings: 5' RACE revealed two transcription start sites (TSSs) in exon 0 and four TSSs in intron 0, which correspond to 5' extensions of exon 1. Using quantitative, real-time RT-PCR (qRT-PCR), we demonstrated that extended exon 1 containing Ghrl transcripts are largely confined to the spleen, adrenal gland, stomach, and skin. Conclusion: We demonstrate that multiple transcription start sites are present in exon 0 and an extended exon 1 of the murine ghrelin gene, similar to the proximal first exon organisation of its human orthologue. The identification of several transcription start sites in intron 0 of mouse ghrelin (resulting in an extension of exon 1) raises the possibility that developmental-, cell- and tissue-specific Ghrl mRNA species are created by employing alternative promoters, and further studies of the murine ghrelin gene are warranted.

  1. Application of high-throughput sequencing in understanding human oral microbiome related with health and disease

    OpenAIRE

    Chen, Hui; Jiang, Wen

    2014-01-01

    The oral microbiome is one of the most diverse habitats in the human body and is closely related to oral health and disease. As the technique has developed, high-throughput sequencing has become a popular approach for oral microbial analysis. Oral bacterial profiles have been studied to explore the relationship between microbial diversity and oral diseases such as caries and periodontal disease. This review describes the application of high-throughput sequencing for characterizati...

  2. A SNP based high-density linkage map of Apis cerana reveals a high recombination rate similar to Apis mellifera.

    Directory of Open Access Journals (Sweden)

    Yuan Yuan Shi

    Full Text Available BACKGROUND: The Eastern honey bee, Apis cerana Fabricius, is distributed in southern and eastern Asia, from India and China to Korea and Japan and southeast to the Moluccas. Besides Apis mellifera, this species is also widely kept for honey production. Apis cerana is also a model organism for studying social behavior, caste determination, mating biology, sexual selection, and host-parasite interactions. Few resources are available for molecular research in this species, and a linkage map has never been constructed. A linkage map is a prerequisite for quantitative trait loci mapping and for analyzing genome structure. We used the Chinese honey bee, Apis cerana cerana, to construct the first linkage map in the Eastern honey bee. RESULTS: F2 workers (N = 103) were genotyped for 126,990 single nucleotide polymorphisms (SNPs). After filtering out low-quality SNPs and those not passing the Mendel test, we obtained 3,000 SNPs; 1,535 of these were informative and were used to construct a linkage map. The preliminary map contains 19 linkage groups; we then mapped the 19 linkage groups to 16 chromosomes by comparing the markers to the genome of A. mellifera. The final map contains 16 linkage groups with a total of 1,535 markers. The total genetic distance is 3,942.7 centimorgans (cM), with the largest linkage group (180 loci) measuring 574.5 cM. Average marker interval for all markers across the 16 linkage groups is 2.6 cM. CONCLUSION: We constructed a high density linkage map for A. c. cerana with 1,535 markers. Because the map is based on SNP markers, it will enable easier and faster genotyping assays than randomly amplified polymorphic DNA or microsatellite based maps used in A. mellifera.
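
    The reported average marker interval can be checked from the other figures in the abstract. The short calculation below assumes that the number of intervals is the number of markers minus the number of linkage groups (each group of n markers has n - 1 intervals); the abstract does not state how the authors computed the value, so this is one plausible reading.

```python
# Figures quoted in the abstract.
total_distance_cm = 3942.7   # total genetic distance in centimorgans
n_markers = 1535             # informative SNP markers on the final map
n_linkage_groups = 16        # linkage groups in the final map

# Assumption of this sketch: a group with n markers contributes n - 1 intervals.
n_intervals = n_markers - n_linkage_groups
average_interval_cm = total_distance_cm / n_intervals

print(f"{n_intervals} intervals, average {average_interval_cm:.2f} cM")
```

    Dividing by the marker count instead of the interval count gives a value that also rounds to 2.6 cM, so the reported figure is consistent under either convention.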

  3. Whole-exome sequencing identified a homozygous FNBP4 mutation in a family with a condition similar to microphthalmia with limb anomalies.

    Science.gov (United States)

    Kondo, Yukiko; Koshimizu, Eriko; Megarbane, Andre; Hamanoue, Haruka; Okada, Ippei; Nishiyama, Kiyomi; Kodera, Hirofumi; Miyatake, Satoko; Tsurusaki, Yoshinori; Nakashima, Mitsuko; Doi, Hiroshi; Miyake, Noriko; Saitsu, Hirotomo; Matsumoto, Naomichi

    2013-07-01

    Microphthalmia with limb anomalies (MLA), also known as Waardenburg anophthalmia syndrome or ophthalmoacromelic syndrome, is a rare autosomal recessive disorder. Recently, we and others successfully identified SMOC1 as the causative gene for MLA. However, there are several MLA families without SMOC1 abnormality, suggesting locus heterogeneity in MLA. We aimed to identify a pathogenic mutation in one Lebanese family having an MLA-like condition without SMOC1 mutation by whole-exome sequencing (WES) combined with homozygosity mapping. A c.683C>T (p.Thr228Met) variant in FNBP4 was found as a primary candidate, drawing attention to the possibility that FNBP4 and SMOC1 may both modulate BMP signaling. Copyright © 2013 Wiley Periodicals, Inc.

  4. Low cognitive load strengthens distractor interference while high load attenuates when cognitive load and distractor possess similar visual characteristics.

    Science.gov (United States)

    Minamoto, Takehiro; Shipstead, Zach; Osaka, Naoyuki; Engle, Randall W

    2015-07-01

    Studies on visual cognitive load have reported inconsistent effects of distractor interference when distractors have visual characteristics that are similar to the cognitive load. Some studies have shown that the cognitive load enhances distractor interference, while others reported an attenuating effect. We attribute these inconsistencies to the amount of cognitive load that a person is required to maintain. Lower amounts of cognitive load increase distractor interference by orienting attention toward visually similar distractors. Higher amounts of cognitive load attenuate distractor interference by depleting attentional resources needed to process distractors. In the present study, cognitive load consisted of faces (Experiments 1-3) or scenes (Experiment 2). Participants performed a selective attention task in which they ignored face distractors while judging the color of a target dot presented nearby, under differing amounts of load. Across these experiments, distractor interference was greater in the low-load condition and smaller in the high-load condition when the content of the cognitive load had visual characteristics similar to those of the distractors. We also found that when a series of judgments needed to be made, the effect was apparent for the first trial but not for the second. We further tested the involvement of working memory capacity (WMC) in the load effect (Experiment 3). Interestingly, both high and low WMC groups showed an equivalent effect of the cognitive load on the first distractor, suggesting these effects are fairly automatic.

  5. Highly parallel translation of DNA sequences into small molecules.

    Directory of Open Access Journals (Sweden)

    Rebecca M Weisinger

    Full Text Available A large body of in vitro evolution work establishes the utility of biopolymer libraries comprising 10^10 to 10^15 distinct molecules for the discovery of nanomolar-affinity ligands to proteins. Small-molecule libraries of comparable complexity will likely provide nanomolar-affinity small-molecule ligands. Unlike biopolymers, small molecules can offer the advantages of cell permeability, low immunogenicity, metabolic stability, rapid diffusion and inexpensive mass production. It is thought that such desirable in vivo behavior is correlated with the physical properties of small molecules, specifically a limited number of hydrogen bond donors and acceptors, a defined range of hydrophobicity, and most importantly, molecular weights less than 500 Daltons. Creating a collection of 10^10 to 10^15 small molecules that meet these criteria requires the use of hundreds to thousands of diversity elements per step in a combinatorial synthesis of three to five steps. With this goal in mind, we have reported a set of mesofluidic devices that enable DNA-programmed combinatorial chemistry in a highly parallel 384-well plate format. Here, we demonstrate that these devices can translate DNA genes encoding 384 diversity elements per coding position into corresponding small-molecule gene products. This robust and efficient procedure yields small molecule-DNA conjugates suitable for in vitro evolution experiments.

  6. RNA-Sequencing of Drosophila melanogaster Head Tissue on High-Sugar and High-Fat Diets

    Directory of Open Access Journals (Sweden)

    Wayne Hemphill

    2018-01-01

    Full Text Available Obesity has been shown to increase risk for cardiovascular disease and type-2 diabetes. In addition, it has been implicated in aggravation of neurological conditions such as Alzheimer’s. In the model organism Drosophila melanogaster, a physiological state mimicking diet-induced obesity can be induced by subjecting fruit flies to a solid medium disproportionately higher in sugar than protein, or that has been supplemented with a rich source of saturated fat. These flies can exhibit increased circulating glucose levels, increased triglyceride content, insulin-like peptide resistance, and behavior indicative of neurological decline. We subjected flies to variants of the high-sugar diet, high-fat diet, or normal (control) diet, followed by a total RNA extraction from fly heads of each diet group for the purpose of Poly-A selected RNA-Sequencing. Our objective was to identify the effects of obesogenic diets on transcriptome patterns, how they differed between obesogenic diets, and identify genes that may relate to pathogenesis accompanying an obesity-like state. Gene ontology analysis indicated an overrepresentation of affected genes associated with immunity, metabolism, and hemocyanin in the high-fat diet group, and CHK, cell cycle activity, and DNA binding and transcription in the high-sugar diet group. Our results also indicate differences in the effects of the high-fat diet and high-sugar diet on expression profiles in head tissue of flies, despite the reportedly similar phenotypic impacts of the diets. The impacted genes, and how they may relate to pathogenesis in the Drosophila obesity-like state, warrant further experimental investigation.

  7. Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle.

    Science.gov (United States)

    Frischknecht, Mirjam; Pausch, Hubert; Bapst, Beat; Signer-Hasler, Heidi; Flury, Christine; Garrick, Dorian; Stricker, Christian; Fries, Ruedi; Gredler-Grandl, Birgit

    2017-12-29

    Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection.

  8. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

    Science.gov (United States)

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-07-15

    In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA), where millions of sequences are already publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.
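
    To give a feel for the k-mer searching step, the sketch below ranks reference sequences by the number of distinct k-mers they share with a query, which is one common way to pick close references before a more expensive alignment. It is not SINA's code or API: the function names, the word length k = 4, the tie-breaking rule and the toy sequences are assumptions, and the partial order alignment stage is not shown.

```python
def kmer_set(seq, k=4):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(query, references, k=4, top=2):
    """Return the names of the `top` references sharing the most k-mers with the query.

    `references` maps a reference name to its (unaligned) sequence.
    Ties are broken alphabetically by name, an arbitrary choice for this sketch.
    """
    query_kmers = kmer_set(query, k)
    scored = [
        (len(query_kmers & kmer_set(seq, k)), name)
        for name, seq in references.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top]]


# Hypothetical toy reference set and query.
toy_references = {
    "ref_a": "ACGUACGUACGU",
    "ref_b": "GGGGCCCCGGGG",
    "ref_c": "ACGUAAAAAAAA",
}
best = rank_references("ACGUACGUAC", toy_references)
```

    A production aligner would index the reference k-mers once rather than recomputing them per query; the linear scan here keeps the example short.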

  9. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Full Text Available Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will want information about the best practices to follow before starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and in standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  10. Highly divergent 16S rRNA sequences in ribosomal operons of Scytonema hyalinum (Cyanobacteria).

    Directory of Open Access Journals (Sweden)

    Jeffrey R Johansen

    Full Text Available A highly divergent 16S rRNA gene was found in one of the five ribosomal operons present in a species complex currently circumscribed as Scytonema hyalinum (Nostocales, Cyanobacteria) using clone libraries. If 16S rRNA sequence macroheterogeneity among ribosomal operons due to insertions, deletions or truncation is excluded, the sequence heterogeneity observed in S. hyalinum was the highest observed in any prokaryotic species thus far (7.3-9.0%). The secondary structure of the 16S rRNA molecules encoded by the two divergent operons was nearly identical, indicating possible functionality. The 23S rRNA gene was examined for a few strains in this complex, and it was also found to be highly divergent from the gene in Type 2 operons (8.7%), and likewise had nearly identical secondary structure between the Type 1 and Type 2 operons. Furthermore, the 16S-23S ITS showed marked differences consistent between operons among numerous strains. Both operons have promoter sequences that satisfy consensus requirements for functional prokaryotic transcription initiation. Horizontal gene transfer from another unknown heterocytous cyanobacterium is considered the most likely explanation for the origin of this molecule, but does not explain the ultimate origin of this sequence, which is very divergent from all 16S rRNA sequences found thus far in cyanobacteria. The divergent sequence is highly conserved among numerous strains of S. hyalinum, suggesting adaptive advantage and selective constraint of the divergent sequence.

  11. Fast high-resolution MR imaging using the snapshot-FLASH MR sequence

    International Nuclear Information System (INIS)

    Matthaei, D.; Haase, A.; Henrich, D.; Duhmke, E.

    1990-01-01

    Snapshot fast low-angle shot (FLASH) MR imaging using an accelerated FLASH MR sequence provides MR images with measuring times far below 1 second. The short TE of this sequence prevents susceptibility artifacts in gradient-echo imaging. In this paper, variations of the sequence are shown that provide high-resolution images with T1-weighted IR, T2-weighted SE, and chemical shift (CHESS) contrast sequences. METHODS AND MATERIALS: A whole-body 2-T system (Bruker-Medizintechnik) was used in combination with a 60-cm gradient system (providing a gradient strength of 5 mT/m) to study healthy volunteers. The measuring time for a 256 x 256 image matrix was 800 msec. This sequence has been used in combination with T1-weighted IR, T2-weighted SE, and CHESS variations.

  12. High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

    Science.gov (United States)

    Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

    2016-09-07

    Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Gait in ducks (Anas platyrhynchos) and chickens (Gallus gallus) – similarities in adaptation to high growth rate

    Directory of Open Access Journals (Sweden)

    B. M. Duggan

    2016-08-01

    Full Text Available Genetic selection for increased growth rate and muscle mass in broiler chickens has been accompanied by mobility issues and poor gait. There are concerns that the Pekin duck, which is on a similar selection trajectory (for production traits) to the broiler chicken, may encounter gait problems in the future. In order to understand how gait has been altered by selection, the walking ability of divergent lines of high- and low-growth chickens and ducks was objectively measured using a pressure platform, which recorded various components of their gait. In both species, lines which had been selected for large breast muscle mass moved at a slower velocity and with a greater step width than their lighter conspecifics. These high-growth lines also spent more time supported by two feet in order to improve balance when compared with their lighter, low-growth conspecifics. We demonstrate that chicken and duck lines which have been subjected to intense selection for high growth rates and meat yields have adapted their gait in similar ways. A greater understanding of which components of gait have been altered in selected lines with impaired walking ability may lead to more effective breeding strategies to improve gait in poultry.

  14. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Alexander C Outhred

    Full Text Available Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster.
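
    A distance such as "13 variants between the most distant members" can be thought of as the largest pairwise count of variants that differ between two isolates. The sketch below computes that quantity from per-isolate variant sets. The isolate names, variant labels and the choice of symmetric difference as the distance are assumptions for illustration and do not reproduce the study's pipelines.

```python
from itertools import combinations


def variant_distance(variants_a, variants_b):
    """Number of variants present in exactly one of the two isolates."""
    return len(variants_a ^ variants_b)


def most_distant_pair(isolates):
    """Return (distance, name_1, name_2) for the most distant pair of isolates.

    `isolates` maps an isolate name to the set of variants called against
    a shared reference genome.
    """
    return max(
        (variant_distance(isolates[a], isolates[b]), a, b)
        for a, b in combinations(sorted(isolates), 2)
    )


# Hypothetical toy calls; each string stands for one variant.
toy_isolates = {
    "iso_1": {"100A>G", "250C>T"},
    "iso_2": {"100A>G"},
    "iso_3": {"100A>G", "250C>T", "900G>A", "1200T>C"},
}
distance, first, second = most_distant_pair(toy_isolates)
```

    Isolates with identical variant sets have distance zero and fall into the same group, which is how a set of 22 isolates can resolve into fewer than 22 groups.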

  15. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform

    DEFF Research Database (Denmark)

    Fordyce, Sarah Louise; Avila Arcos, Maria del Carmen; Rockenbauer, Eszter

    2011-01-01

    repeat units. These methods do not allow for the full resolution of STR base composition that sequencing approaches could provide. Here we present an STR profiling method based on the use of the Roche Genome Sequencer (GS) FLX to simultaneously sequence multiple core STR loci. Using this method...

  16. Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II

    Directory of Open Access Journals (Sweden)

    Archer John

    2012-03-01

    Full Text Available Abstract. Background: Next generation sequencing provides detailed insight into the variation present within viral populations, introducing the possibility of treatment strategies that are both reactive and predictive. Current software tools, however, need to be scaled up to accommodate high-depth viral data sets, which are often temporally or spatially linked. In addition, due to the development of novel sequencing platforms and chemistries, each with implicit strengths and weaknesses, it will be helpful for researchers to be able to routinely compare and combine data sets from different platforms/chemistries. In particular, error associated with a specific sequencing process must be quantified so that true biological variation may be identified. Results: Segminator II was developed to allow for the efficient comparison of data sets derived from different sources. We demonstrate its usage by comparing large data sets from 12 influenza H1N1 samples sequenced on both the 454 Life Sciences and Illumina platforms, permitting quantification of platform error. For mismatches, median error rates of 0.10 and 0.12%, respectively, suggested that both platforms performed similarly. For insertions and deletions, median error rates within the 454 data (at 0.3 and 0.2%, respectively) were significantly higher than those within the Illumina data (0.004 and 0.006%, respectively). In agreement with previous observations, these higher rates were strongly associated with homopolymeric stretches on the 454 platform. Outside of such regions both platforms had similar indel error profiles. Additionally, we apply our software to the identification of low frequency variants. Conclusion: We have demonstrated, using Segminator II, that it is possible to distinguish platform specific error from biological variation using data derived from two different platforms. We have used this approach to quantify the amount of error present within the 454 and Illumina platforms in

  17. Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    Directory of Open Access Journals (Sweden)

    White Frank F

    2011-07-01

    Full Text Available Abstract. Background: Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effects of library type and genomic diversity on SNP discovery and imputation are evaluated. Results: Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions: A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.

  18. Fine grained compositional analysis of Port Everglades Inlet microbiome using high throughput DNA sequencing.

    Science.gov (United States)

    O'Connell, Lauren; Gao, Song; McCorquodale, Donald; Fleisher, Jay; Lopez, Jose V

    2018-01-01

    Similar to natural rivers, manmade inlets connect inland runoff to the ocean. Port Everglades Inlet (PEI) is a busy cargo and cruise ship port in South Florida, which can act as a source of pollution to surrounding beaches and offshore coral reefs. Understanding the composition and fluctuations of bacterioplankton communities ("microbiomes") in major port inlets is important due to potential impacts on surrounding environments. We hypothesize seasonal microbial fluctuations, which were profiled by high throughput 16S rRNA amplicon sequencing and analysis. Surface water samples were collected every week for one year. A total of four samples per month, two from each sampling location, were used for statistical analysis creating a high sampling frequency and finer sampling scale than previous inlet microbiome studies. We observed significant differences in community alpha diversity between months and seasons. Analysis of composition of microbiomes (ANCOM) tests were run in QIIME 2 at genus level taxonomic classification to determine which genera were differentially abundant between seasons and months. Beta diversity results yielded significant differences in PEI community composition in regard to month, season, water temperature, and salinity. Analysis of potentially pathogenic genera showed presence of Staphylococcus and Streptococcus. However, statistical analysis indicated that these organisms were not present in significantly high abundances throughout the year or between seasons. Significant differences in alpha diversity were observed when comparing microbial communities with respect to time. This observation stems from the high community evenness and low community richness in August. This indicates that only a few organisms dominated the community during this month. August had lower than average rainfall levels for a wet season, which may have contributed to less runoff, and fewer bacterial groups introduced into the port surface waters.
Bacterioplankton beta
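
    Richness and evenness, the two alpha diversity components mentioned in the record above, can be computed from a vector of per-taxon read counts. The sketch below uses observed richness, the Shannon index and Pielou's evenness; the abstract does not say which indices the authors used, so these choices, the function name and the toy counts are assumptions.

```python
import math


def alpha_diversity(counts):
    """Return (richness, Shannon index, Pielou evenness) for one sample.

    `counts` is a list of read counts per taxon; zero counts are ignored.
    """
    present = [c for c in counts if c > 0]
    total = sum(present)
    richness = len(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    # Pielou's evenness is undefined for a single taxon; report 0.0 in that case.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness


# Hypothetical toy samples with the same richness but different evenness.
even_sample = alpha_diversity([25, 25, 25, 25])
skewed_sample = alpha_diversity([97, 1, 1, 1])
```

    In the toy data both samples contain four taxa, but the skewed one has a much lower evenness because a single taxon accounts for nearly all reads.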

  19. Fine grained compositional analysis of Port Everglades Inlet microbiome using high throughput DNA sequencing

    Directory of Open Access Journals (Sweden)

    Lauren O’Connell

    2018-05-01

    Full Text Available Background: Similar to natural rivers, manmade inlets connect inland runoff to the ocean. Port Everglades Inlet (PEI) is a busy cargo and cruise ship port in South Florida, which can act as a source of pollution to surrounding beaches and offshore coral reefs. Understanding the composition and fluctuations of bacterioplankton communities (“microbiomes”) in major port inlets is important due to potential impacts on surrounding environments. We hypothesize seasonal microbial fluctuations, which were profiled by high throughput 16S rRNA amplicon sequencing and analysis. Methods & Results: Surface water samples were collected every week for one year. A total of four samples per month, two from each sampling location, were used for statistical analysis creating a high sampling frequency and finer sampling scale than previous inlet microbiome studies. We observed significant differences in community alpha diversity between months and seasons. Analysis of composition of microbiomes (ANCOM) tests were run in QIIME 2 at genus level taxonomic classification to determine which genera were differentially abundant between seasons and months. Beta diversity results yielded significant differences in PEI community composition in regard to month, season, water temperature, and salinity. Analysis of potentially pathogenic genera showed presence of Staphylococcus and Streptococcus. However, statistical analysis indicated that these organisms were not present in significantly high abundances throughout the year or between seasons. Discussion: Significant differences in alpha diversity were observed when comparing microbial communities with respect to time. This observation stems from the high community evenness and low community richness in August. This indicates that only a few organisms dominated the community during this month. August had lower than average rainfall levels for a wet season, which may have contributed to less runoff, and fewer bacterial groups

  20. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.

    Science.gov (United States)

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.

  1. High signals in the uterine cervix on T2-weighted MRI sequences

    International Nuclear Information System (INIS)

    Graef, De M.; Karam, R.; Daclin, P.Y.; Rouanet, J.P.; Juhan, V.; Maubon, A.J.

    2003-01-01

    The aim of this pictorial review was to illustrate the normal cervix appearance on T2-weighted images, and give a review of common or less common disorders of the uterine cervix that appear as high signal intensity lesions on T2-weighted sequences. Numerous aetiologies dominated by cervical cancer are reviewed and discussed. This gamut is obviously incomplete; however, radiologists who perform MR women's imaging should perform T2-weighted sequences in the sagittal plane regardless of the indication for pelvic MR. Those sequences will diagnose some previously unknown cervical cancers as well as many other unknown cervical or uterine lesions. (orig.)

  2. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    OpenAIRE

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  3. Retirement Sequences of Older Americans: Moderately Destandardized and Highly Stratified Across Gender, Class, and Race.

    Science.gov (United States)

    Calvo, Esteban; Madero-Cabib, Ignacio; Staudinger, Ursula M

    2017-06-06

    A destandardization of labor-force patterns revolving around retirement has been observed in recent literature. It is unclear, however, to which degree and of which kind. This study looked at sequences rather than individual statuses or transitions and argued that differentiating older Americans' retirement sequences by type, order, and timing and considering gender, class, and race differences yields a less destandardized picture. Sequence analysis was employed to analyze panel data from the Health and Retirement Study (HRS) for 7,881 individuals observed 6 consecutive times between ages 60-61 and 70-71. As expected, types of retirement sequences were identified that cannot be subsumed under the conventional model of complete retirement from full-time employment around age 65. However, these retirement sequences were not entirely destandardized, as some irreversibility and age-grading persisted. Further, the degree of destandardization varied along gender, class, and race. Unconventional sequences were archetypal for middle-level educated individuals and Blacks. Also, sequences for women and individuals with lower education showed more unemployment and part-time jobs, and less age-grading. A sequence-analytic approach that models group differences uncovers misjudgments about the degree of destandardization of retirement sequences. When a continuous process is represented as individual transitions, the overall pattern of retirement sequences gets lost and appears destandardized. These patterns get further complicated by differences in social structures by gender, class, and race in ways that seem to reproduce advantages that men, more highly educated individuals, and Whites enjoy in numerous areas over the life course. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

    Science.gov (United States)

    Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

    2012-09-11

    Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
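    The relationship between depth of coverage, read length, and read count mentioned above can be sketched with the usual approximation coverage = (number of reads × read length) / genome size. The following is a minimal illustration only, not the authors' pipeline; the function name and the 4,000,000 bp genome size are assumptions chosen as a round illustrative figure.

```python
import math


def reads_needed(genome_size_bp, target_coverage, read_length_bp):
    """Estimate how many reads give the target average depth of coverage.

    Uses the approximation: coverage = reads * read_length / genome_size.
    All arguments are illustrative inputs, not values from the study.
    """
    if genome_size_bp <= 0 or read_length_bp <= 0 or target_coverage < 0:
        raise ValueError("sizes must be positive and coverage non-negative")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Hypothetical genome of 4,000,000 bp (an assumed round number),
# 50x target coverage, 100 bp reads.
print(reads_needed(4_000_000, 50, 100))
```

    For paired-end data each pair contributes two reads, so the number of read pairs would be half of the value returned here.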

  5. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Background: Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Results: Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions: Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  6. The specificity of memory for a highly trained finger movement sequence: Change the ending, change all.

    Science.gov (United States)

    Rozanov, Simon; Keren, Ofer; Karni, Avi

    2010-05-17

    How are highly trained movement sequences represented in long-term memory? Here we show that the gains attained in the performance of a well-trained sequence of finger movements can be expressed only when the order of the movements is exactly as practiced. Ten young adults were trained to perform a given 5-element sequence of finger-to-thumb opposition movements with their left hand. Movements were analyzed using video based tracking. Three weeks of training resulted, along with improved accuracy, in robustly shortened movement times as well as shorter finger-to-thumb touch times. However, there was little transfer of these gains in speed to the execution of the same component movements arranged in a new order. Moreover, even when the only change was the omission of the one before final movement of the trained sequence (Omit sequence), the initial movements of the sequence were significantly slowed down, although these movements were identical to the initial movements of the trained sequence. Our results support the notion that a well-trained sequence of finger movements can be represented, in the adult motor system, as a singular, co-articulated, unit of movement, in which even the initial component movements are contingent on the subsequent, anticipated, ones. Because of co-articulation related anticipatory effects, gains in fluency and accuracy acquired in training on a specific movement sequence cannot be expressed in full in the execution of the trained component movements or of a full segment of the trained sequence, if followed by a different ending segment. Copyright 2010. Published by Elsevier B.V.

  7. SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses.

    Science.gov (United States)

    Vetrovský, Tomáš; Baldrian, Petr; Morais, Daniel; Berger, Bonnie

    2018-02-14

    Modern molecular methods have increased our ability to describe microbial communities. Along with the advances brought by new sequencing technologies, we now require intensive computational resources to make sense of the large numbers of sequences continuously produced. The software developed by the scientific community to address this demand, although very useful, requires experience of the command-line environment and extensive training, and has a steep learning curve, limiting its use. We created SEED 2, a graphical user interface for handling high-throughput amplicon-sequencing data under Windows operating systems. SEED 2 is the only sequence visualizer that empowers users with tools to handle amplicon-sequencing data of microbial community markers. It is suitable for any marker gene sequences obtained through Illumina, IonTorrent or Sanger sequencing. SEED 2 allows the user to process raw sequencing data, identify specific taxa, produce OTU tables, create sequence alignments and construct phylogenetic trees. Standard dual core laptops with 8 GB of RAM can handle ca. 8 million Illumina PE 300 bp sequences, ca. 4 GB of data. SEED 2 was implemented in Object Pascal and uses internal functions and external software for amplicon data processing. SEED 2 is freeware, available at http://www.biomed.cas.cz/mbu/lbwrf/seed/ as a self-contained file, including all the dependencies, and does not require installation. Supplementary data contain a comprehensive list of supported functions. Contact: daniel.morais@biomed.cas.cz. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.

  8. The application of the high throughput sequencing technology in the transposable elements.

    Science.gov (United States)

    Liu, Zhen; Xu, Jian-hong

    2015-09-01

    High throughput sequencing technology has dramatically improved the efficiency of DNA sequencing, and decreased the costs to a great extent. Meanwhile, this technology usually has advantages of better specificity, higher sensitivity and accuracy. Therefore, it has been applied to the research on genetic variations, transcriptomics and epigenomics. Recently, this technology has been widely employed in the studies of transposable elements and has achieved fruitful results. In this review, we summarize the application of high throughput sequencing technology in the fields of transposable elements, including the estimation of transposon content, preference of target sites and distribution, insertion polymorphism and population frequency, identification of rare copies, transposon horizontal transfers as well as transposon tagging. We also briefly introduce the major common sequencing strategies and algorithms, their advantages and disadvantages, and the corresponding solutions. Finally, we envision the developing trends of high throughput sequencing technology, especially the third generation sequencing technology, and its application in transposon studies in the future, hopefully providing a comprehensive understanding and reference for related scientific researchers.

  9. Salmonella enterica Prophage Sequence Profiles Reflect Genome Diversity and Can Be Used for High Discrimination Subtyping

    Directory of Open Access Journals (Sweden)

    Walid Mottawea

    2018-05-01

    Non-typhoidal Salmonella is a leading cause of foodborne illness worldwide. Prompt and accurate identification of the sources of Salmonella responsible for disease outbreaks is crucial to minimize infections and eliminate ongoing sources of contamination. Current subtyping tools including single nucleotide polymorphism (SNP) typing may be inadequate, in some instances, to provide the required discrimination among epidemiologically unrelated Salmonella strains. Prophage genes represent the majority of the accessory genes in bacterial genomes and have potential to be used as high discrimination markers in Salmonella. In this study, the prophage sequence diversity in different Salmonella serovars and genetically related strains was investigated. Using whole genome sequences of 1,760 isolates of S. enterica representing 151 Salmonella serovars and 66 closely related bacteria, prophage sequences were identified from assembled contigs using PHASTER. We detected 154 different prophages in S. enterica genomes. Prophage sequences were highly variable among S. enterica serovars with a median ± interquartile range (IQR) of 5 ± 3 prophage regions per genome. While some prophage sequences were highly conserved among the strains of specific serovars, few regions were lineage specific. Therefore, strains belonging to each serovar could be clustered separately based on their prophage content. Analysis of S. Enteritidis isolates from seven outbreaks generated distinct prophage profiles for each outbreak. Taken altogether, the diversity of the prophage sequences correlates with genome diversity. Prophage repertoires provide an additional marker for differentiating S. enterica subtypes during foodborne outbreaks.

  10. High-throughput sequencing, characterization and detection of new and conserved cucumber miRNAs.

    Directory of Open Access Journals (Sweden)

    Germán Martínez

    MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs involved in the post-transcriptional regulation of gene expression. In plants, a great number of conserved and specific miRNAs, mainly arising from model species, have been identified to date. However, less is known about the diversity of these regulatory RNAs in plant species of agricultural and/or horticultural importance. Here we report a combined approach of bioinformatics prediction, high-throughput sequencing data and molecular methods to analyze miRNA populations in cucumber (Cucumis sativus) plants. A set of 19 conserved and 6 known but non-conserved miRNA families were found in our cucumber small RNA dataset. We also identified 7 (3 with their miRNA* strand) not previously described miRNAs, candidates to be cucumber-specific. To validate their description, these new C. sativus miRNAs were detected by northern blot hybridization. Additionally, potential targets for most conserved and new miRNAs were identified in the cucumber genome. In summary, in this study we have identified, for the first time, conserved, known non-conserved and new miRNAs arising from an agronomically important species such as C. sativus. The detection of this complex population of regulatory small RNAs suggests that, similarly to what is observed in other plant species, cucumber miRNAs may play an important role in diverse biological and metabolic processes.

  11. Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

    DEFF Research Database (Denmark)

    Karst, Soeren M; Dueholm, Morten S; McIlroy, Simon J

    2016-01-01

    Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies...... (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human...... gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity...

  12. Exome sequencing generates high quality data in non-target regions

    Directory of Open Access Journals (Sweden)

    Guo Yan

    2012-05-01

    Background: Exome sequencing using next-generation sequencing technologies is a cost-efficient approach to selectively sequencing coding regions of the human genome for detection of disease variants. A significant amount of DNA fragments from the capture process fall outside target regions, and sequence data for positions outside target regions have been mostly ignored after alignment. Results: We performed whole exome sequencing on 22 subjects using Agilent SureSelect capture reagent and 6 subjects using Illumina TrueSeq capture reagent. We also downloaded sequencing data for 6 subjects from the 1000 Genomes Project Pilot 3 study. Using these data, we examined the quality of SNPs detected outside target regions by computing consistency rate with genotypes obtained from SNP chips or the Hapmap database, transition-transversion (Ti/Tv) ratio, and percentage of SNPs inside dbSNP. For all three platforms, we obtained high-quality SNPs outside target regions, and some far from target regions. In our Agilent SureSelect data, we obtained 84,049 high-quality SNPs outside target regions compared to 65,231 SNPs inside target regions (a 129% increase). For our Illumina TrueSeq data, we obtained 222,171 high-quality SNPs outside target regions compared to 95,818 SNPs inside target regions (a 232% increase). For the data from the 1000 Genomes Project, we obtained 7,139 high-quality SNPs outside target regions compared to 1,548 SNPs inside target regions (a 461% increase). Conclusions: These results demonstrate that a significant amount of high quality genotypes outside target regions can be obtained from exome sequencing data. These data should not be ignored in genetic epidemiology studies.
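    Two of the quantities in this abstract lend themselves to a short worked example: the transition-transversion (Ti/Tv) ratio and the outside-versus-inside SNP counts. The sketch below is only an illustration under stated assumptions; the function names and the toy SNP list are hypothetical and are not taken from the study. It treats A<->G and C<->T substitutions as transitions and every other single-base substitution as a transversion. The second helper expresses one count as a percentage of another; with the counts quoted above, the parenthesized percentages appear to match this outside-to-inside ratio.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return the Ti/Tv ratio for an iterable of (ref, alt) base pairs."""
    transitions = transversions = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution; ignore
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio undefined")
    return transitions / transversions


def ratio_percent(outside, inside):
    """Express the outside-target count as a rounded percentage of the inside count."""
    return round(100 * outside / inside)


# Toy SNP list (hypothetical data): two transitions, two transversions.
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))

# Counts quoted in the abstract.
print(ratio_percent(84_049, 65_231), ratio_percent(222_171, 95_818), ratio_percent(7_139, 1_548))
```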

  13. Highly similar prokaryotic communities of sunken wood at shallow and deep-sea sites across the oceans.

    Science.gov (United States)

    Palacios, Carmen; Zbinden, Magali; Pailleret, Marie; Gaill, Françoise; Lebaron, Philippe

    2009-11-01

    With an increased appreciation of the frequency of their occurrence, large organic falls such as sunken wood and whale carcasses have become important to consider in the ecology of the oceans. Organic-rich deep-sea falls may play a major role in the dispersal and evolution of chemoautotrophic communities at the ocean floor, and chemosynthetic symbiotic, free-living, and attached microorganisms may drive the primary production at these communities. However, little is known about the microbiota thriving in and around organic falls. Our aim was to investigate and compare free-living and attached communities of bacteria and archaea from artificially immersed and naturally sunken wood logs with varying characteristics at several sites in the deep sea and in shallow water to address basic questions on the microbial ecology of sunken wood. Multivariate indirect ordination analyses of capillary electrophoresis single-stranded conformation polymorphisms (CE-SSCP) fingerprinting profiles demonstrated high similarity of bacterial and archaeal assemblages present in timbers and logs situated at geographically distant sites and at different depths of immersion. This similarity implies that wood falls harbor a specialized microbiota as observed in other ecosystems when the same environmental conditions reoccur. Scanning and transmission electron microscopy observations combined with multivariate direct gradient analysis of Bacteria CE-SSCP profiles demonstrate that type of wood (hard vs. softwood), and time of immersion are important in structuring sunken wood bacterial communities. Archaeal populations were present only in samples with substantial signs of decay, which were also more similar in their bacterial assemblages, providing indirect evidence of temporal succession in the microbial communities that develop in and around wood falls.

  14. Chemistry, the Central Science? The History of the High School Science Sequence

    Science.gov (United States)

    Sheppard, Keith; Robbins, Dennis M.

    2005-01-01

    Chemistry became the "central science" in US high schools not by design but by accident. Three important factors that influenced high school science are presented in sequence, and their impact on the development of US science education is discussed.

  15. Interconnectedness during high water maintains similarity in fish assemblages of island floodplain lakes in the Amazonian Basin

    Directory of Open Access Journals (Sweden)

    Carlos Edwar de C. Freitas

    2010-01-01

    We conducted a study to test the hypothesis that interconnectedness among island floodplain lakes and the adjacent Solimões River during the flood stage of the hydrologic cycle is enough to maintain similarity in fish species assemblages. Gill net samples were collected during high and low water periods for three consecutive years (July 2004 to July 2006) in four lakes on Paciência Island. Two lakes, Piranha and Ressaca, are connected to the river all year, and the other two, Preto and Cacau, which are in the center of the island, are isolated during low water periods. The abundance, species richness and evenness of the fish assemblages in these lakes did not differ according to their relative positions or the season of the hydrological cycle, which confirmed our hypothesis. However, fish abundance during the dry season was greater than in the flood season. Apparently, the short period of full connection between the lakes is enough to allow the colonization of all fish species, but not to cause similar abundances. Our study indicates that persistence of the species composition of island floodplain lakes is primarily due to the annual replenishment of fish to the lakes during the flood season.

  16. Activity/inactivity circadian rhythm shows high similarities between young obesity-induced rats and old rats.

    Science.gov (United States)

    Bravo Santos, R; Delgado, J; Cubero, J; Franco, L; Ruiz-Moyano, S; Mesa, M; Rodríguez, A B; Uguz, C; Barriga, C

    2016-03-01

    The objective of the present study was to compare differences between elderly rats and young obesity-induced rats in their activity/inactivity circadian rhythm. The investigation was motivated by the differences reported previously for the circadian rhythms of both obese and elderly humans (and other animals), and those of healthy, young or mature individuals. Three groups of rats were formed: a young control group which was fed a standard chow for rodents; a young obesity-induced group which was fed a high-fat diet for four months; and an elderly control group with rats aged 2.5 years that was fed a standard chow for rodents. Activity/inactivity data were registered through actimetry using infrared actimeter systems in each cage to detect activity. Data were logged on a computer and chronobiological analyses were performed. The results showed diurnal activity (sleep time), nocturnal activity (awake time), amplitude, acrophase, and interdaily stability to be similar between the young obesity-induced group and the elderly control group, but different in the young control group. We have concluded that obesity leads to a chronodisruption status in the body similar to the circadian rhythm degradation observed in the elderly.

  17. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    Science.gov (United States)

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
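    The notion of percent sequence identity used in this abstract can be illustrated with a small calculation over two already-aligned sequences. This is a minimal sketch under stated assumptions, not the method used in the study: the function name and the toy sequences are hypothetical, the inputs are assumed to be pre-aligned to equal length with "-" as the gap character, and identity is counted over all columns in which at least one sequence has a residue.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; a column counts
    as a match only when both residues are identical and not gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy 10-residue sequences (hypothetical) differing at one position.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

    Other conventions exist for the denominator (for example, the length of the shorter sequence), so reported identity values can differ slightly between tools.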

  18. On the optimal trimming of high-throughput mRNA sequence data

    Directory of Open Access Journals (Sweden)

    Matthew D MacManes

    2014-01-01

    The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.
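    Gentle quality trimming of the kind described above can be sketched as follows. This is a simplified illustration, not the software or exact procedure evaluated in the study: the function names are hypothetical, quality strings are assumed to use the common Phred+33 ASCII encoding, and only bases at the 3' end of the read are removed while their Phred score is below the threshold.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, threshold=5, offset=33):
    """Remove trailing bases whose Phred score is below `threshold`.

    Returns the trimmed (sequence, quality_string) pair. Bases inside the
    read are left untouched even if their quality is low.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string lengths differ")
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return sequence[:end], quality_string[:end]


# Toy read (hypothetical): 'I' encodes Q40; '#', '$', '%' encode Q2, Q3, Q4.
read, qual = "ACGTACGT", "IIIII#$%"
print(trim_3prime(read, qual, threshold=5))
print(trim_3prime(read, qual, threshold=2))
```

    With a threshold of 5 the last three bases are dropped, whereas a threshold of 2 leaves this toy read intact, which shows how small changes in the cutoff alter how much sequence is retained.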

  19. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species.

    Directory of Open Access Journals (Sweden)

    Robert J Elshire

    Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.

  20. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses, therefore only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit in the STEM degree production rate needed to fill the demand of the current job market and remain competitive as a nation. The purpose of the study was to make a difference in the number of students who have access to information about the benefits of completing a full sequence of science courses. This dissertation study employed qualitative research methodology to gain a broad perspective of staff through a questionnaire and document review and then a deeper understanding through semi-structured interview protocol. The data revealed that a universal sequence of science courses in the high school district did not exist. It also showed that not all students had access to all science courses; students were sorted and tracked according to prerequisites that did not necessarily match the skill set needed for the courses. In addition, the study showed a desire for more support and direction from the district office. It was also apparent that there was a disconnect that existed between who staff members believed should enroll in a full sequence of science courses and who actually enrolled. Finally, communication about science was shown to occur mainly through counseling and peers. A common science sequence, detracking of science courses, increased communication about the postsecondary and academic benefits of a science education, increased district direction and realistic mathematics alignment were all discussed as solutions to the problem.

  1. Newcastle Disease Viruses Causing Recent Outbreaks Worldwide Show Unexpectedly High Genetic Similarity to Historical Virulent Isolates from the 1940s

    Science.gov (United States)

    Dimitrov, Kiril M.; Lee, Dong-Hun; Williams-Coplin, Dawn; Olivier, Timothy L.; Miller, Patti J.

    2016-01-01

    Virulent strains of Newcastle disease virus (NDV) cause Newcastle disease (ND), a devastating disease of poultry and wild birds. Phylogenetic analyses clearly distinguish historical isolates (obtained prior to 1960) from currently circulating viruses of class II genotypes V, VI, VII, and XII through XVIII. Here, partial and complete genomic sequences of recent virulent isolates of genotypes II and IX from China, Egypt, and India were found to be nearly identical to those of historical viruses isolated in the 1940s. Phylogenetic analysis, nucleotide distances, and rates of change demonstrate that these recent isolates have not evolved significantly from the most closely related ancestors from the 1940s. The low rates of change for these virulent viruses (7.05 × 10⁻⁵ and 2.05 × 10⁻⁵ per year, respectively) and the minimal genetic distances existing between these and historical viruses (0.3 to 1.2%) of the same genotypes indicate an unnatural origin. As with any other RNA virus, Newcastle disease virus is expected to evolve naturally; thus, these findings suggest that some recent field isolates should be excluded from evolutionary studies. Furthermore, phylogenetic analyses show that these recent virulent isolates are more closely related to virulent strains isolated during the 1940s, which have been and continue to be used in laboratory and experimental challenge studies. Since the preservation of viable viruses in the environment for over 6 decades is highly unlikely, it is possible that the source of some of the recent virulent viruses isolated from poultry and wild birds might be laboratory viruses. PMID:26888902

  2. Family with sequence similarity 83, member B is a predictor of poor prognosis and a potential therapeutic target for lung adenocarcinoma expressing wild-type epidermal growth factor receptor.

    Science.gov (United States)

    Yamaura, Takumi; Ezaki, Junji; Okabe, Naoyuki; Takagi, Hironori; Ozaki, Yuki; Inoue, Takuya; Watanabe, Yuzuru; Fukuhara, Mitsuro; Muto, Satoshi; Matsumura, Yuki; Hasegawa, Takeo; Hoshino, Mika; Osugi, Jun; Shio, Yutaka; Waguri, Satoshi; Tamura, Hirosumi; Imai, Jun-Ichi; Ito, Emi; Yanagisawa, Yuka; Honma, Reiko; Watanabe, Shinya; Suzuki, Hiroyuki

    2018-02-01

    Lung adenocarcinoma (ADC) patients with tumors that harbor no targetable driver gene mutation, such as epidermal growth factor receptor (EGFR) gene mutations, have unfavorable prognosis, and thus, novel therapeutic targets are required. Family with sequence similarity 83, member B (FAM83B) is a biomarker for squamous cell lung cancer. FAM83B has also recently been shown to serve an important role in the EGFR signaling pathway. In the present study, the molecular and clinical impact of FAM83B in lung ADC was investigated. Matched tumor and adjacent normal tissue samples were obtained from 216 patients who underwent complete lung resection for primary lung ADC and were examined for FAM83B expression using cDNA microarray analysis. The associations between FAM83B expression and clinicopathological parameters, including patient survival, were examined. FAM83B was highly expressed in tumors from males, smokers and in tumors with wild-type EGFR. Multivariate analyses further confirmed that wild-type EGFR tumors were significantly positively associated with FAM83B expression. In survival analysis, FAM83B expression was associated with poor outcomes in disease-free survival and overall survival, particularly when stratified against tumors with wild-type EGFR. Furthermore, FAM83B knockdown was performed to investigate its phenotypic effect on lung ADC cell lines. Gene silencing by FAM83B RNA interference induced growth suppression in the HLC-1 and H1975 lung ADC cell lines. FAM83B may be involved in lung ADC tumor proliferation and can be a predictor of poor survival. FAM83B is also a potential novel therapeutic target for ADC with wild-type EGFR.

  3. Gene Flow Results in High Genetic Similarity Between Sibiraea (Rosaceae) species in the Qinghai-Tibetan Plateau

    Directory of Open Access Journals (Sweden)

    Peng-Cheng Fu

    2016-10-01

    Studying closely related species and divergent populations provides insight into the process of speciation. Previous studies showed that the evolutionary history of the Sibiraea complex on the Qinghai-Tibetan Plateau (QTP) was unclear and that the species could not be distinguished at the molecular level. In this study, the genetic structure and gene flow of S. laevigata and S. angustata on the QTP were examined across 45 populations using 8 microsatellite loci. Microsatellites revealed high genetic diversity in Sibiraea populations. Most of the variance was detected within populations (87.45%) rather than between species (4.39%). We found no significant correlations between genetic and geographical distances among populations. Bayesian cluster analysis grouped all individuals in the sympatric area of Sibiraea into one cluster and other individuals of S. angustata into another. Divergence history analysis based on the approximate Bayesian computation method indicated that the populations of S. angustata in the sympatric area derived from the admixture of the 2 species. The assignment test assigned all individuals to populations of their own species rather than to the congeneric species. Consistently, intraspecific rather than interspecific first-generation migrants were detected. The long-term bidirectional gene flow between the 2 species was asymmetric, with more from S. angustata to S. laevigata. In conclusion, the Sibiraea complex was distinguishable at the molecular level using microsatellite loci. We found that the high genetic similarity of these 2 species resulted from extensive bidirectional gene flow, especially in the sympatric area where population admixtures between the species occurred.

  4. Use of high flip angle in T1-prepared FAST sequences for myocardial perfusion quantification

    International Nuclear Information System (INIS)

    Vallee, Jean-Paul; Ivancevic, Marko; Lazeyras, Francois; Didier, Dominique; Kasuboski, Larry; Chatelain, Pascal; Righetti, Alberto

    2003-01-01

    This study reports on the first use of high flip angle and radio-frequency (RF) spoiling in a T1-prepared fast acquisition in steady state (FAST) sequence for myocardial perfusion in patients. T1 dynamic range was measured in vitro with FAST, RF FAST and snapshot fast low-angle shot (FLASH) sequences with a 90° flip angle. Myocardial perfusion was then measured twice in 6 patients during the same MR session. The RF FAST and FLASH, but not the FAST sequence, demonstrated an extended T1 dynamic range; however, the FLASH images were degraded by artifacts not present on the RF FAST images. The myocardial perfusion indices K1 (first-order transfer constant from the blood to the myocardium for the Gd-DTPA) and Vd (distribution volume of Gd-DTPA in myocardium) did not differ significantly between the two injections. K1 was 0.48±0.12 ml/min g⁻¹ and Vd was 12.5±2.9%. With an extended T1 dynamic range and the sensitivity required for myocardial perfusion quantification, the RF FAST sequence with a 90° flip angle outperformed the snapshot FLASH sequence in terms of image quality and the FAST sequence in terms of contrast dynamic range. (orig.)

  5. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.
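
    The concordance figures in this abstract depend on a minimum-coverage filter applied before genotypes are compared. The following is a minimal Python sketch of that kind of calculation, not the authors' pipeline. The function name `concordance_at`, the record layout and the toy data are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

# Each site: (fold coverage, genotype called from sequencing, reference genotype).
Site = Tuple[int, str, str]

def concordance_at(sites: Iterable[Site], min_coverage: int) -> Optional[float]:
    """Fraction of sites with coverage >= min_coverage whose two genotypes agree."""
    kept = [(seq, ref) for cov, seq, ref in sites if cov >= min_coverage]
    if not kept:
        return None  # no site passes the coverage filter
    matches = sum(1 for seq, ref in kept if seq == ref)
    return matches / len(kept)

# Hypothetical toy data, invented for illustration only.
toy_sites = [
    (3, "AG", "AA"),
    (4, "AA", "AA"),
    (6, "CT", "CC"),
    (15, "GG", "GG"),
    (20, "AT", "AT"),
]

low = concordance_at(toy_sites, 4)    # 4 sites kept, 3 agree
high = concordance_at(toy_sites, 15)  # 2 sites kept, 2 agree
```

    Raising the threshold discards low-coverage sites, which is why concordance can rise while the number of usable SNPs falls.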

  6. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  7. Sequence of a cDNA encoding turtle high mobility group 1 protein.

    Science.gov (United States)

    Zheng, Jifang; Hu, Bi; Wu, Duansheng

    2005-07-01

    In order to obtain sequence information on the turtle HMG1 gene, a cDNA encoding HMG1 protein of the Chinese soft-shell turtle (Pelodiscus sinensis) was amplified by RT-PCR from kidney total RNA, and was cloned, sequenced and analyzed. The results revealed that the open reading frame (ORF) of turtle HMG1 cDNA is 606 bp long. The ORF encodes 202 amino acid residues, from which two DNA-binding domains and one polyacidic region are derived. The DNA-binding domains share higher amino acid identity with homologous sequences of chicken (96.5%) and mammals (74%) than with the homologous sequence of rainbow trout (67%). The polyacidic region shows 84.6% amino acid homology with the equivalent region of chicken HMG1 cDNA. Turtle HMG1 protein contains 3 Cys residues located at completely conserved positions. Conservation in sequence and structure suggests that the functions of turtle HMG1 cDNA may be highly conserved during evolution. To our knowledge, this is the first report of an HMG1 cDNA sequence in any reptile.

  8. High-dose bee venom exposure induces similar tolerogenic B-cell responses in allergic patients and healthy beekeepers.

    Science.gov (United States)

    Boonpiyathad, T; Meyer, N; Moniuszko, M; Sokolowska, M; Eljaszewicz, A; Wirz, O F; Tomasiak-Lozowska, M M; Bodzenta-Lukaszyk, A; Ruxrungtham, K; van de Veen, W

    2017-03-01

    The involvement of B cells in allergen tolerance induction remains largely unexplored. This study investigates the role of B cells in this process, by comparing B-cell responses in allergic patients before and during allergen immunotherapy (AIT) and naturally exposed healthy beekeepers before and during the beekeeping season. Circulating B cells were characterized by flow cytometry. Phospholipase A2 (PLA)-specific B cells were identified using dual-color staining with fluorescently labeled PLA. Expression of regulatory B-cell-associated surface markers, interleukin-10, chemokine receptors, and immunoglobulin heavy-chain isotypes, was measured. Specific and total IgG1, IgG4, IgA, and IgE from plasma as well as culture supernatants of PLA-specific cells were measured by ELISA. Strikingly, similar responses were observed in allergic patients and beekeepers after venom exposure. Both groups showed increased frequencies of plasmablasts, PLA-specific memory B cells, and IL-10-secreting CD73⁻CD25⁺CD71⁺ BR1 cells. Phospholipase A2-specific IgG4-switched memory B cells expanded after bee venom exposure. Interestingly, PLA-specific B cells showed increased CCR5 expression after high-dose allergen exposure while CXCR4, CXCR5, CCR6, and CCR7 expression remained unaffected. This study provides the first detailed characterization of allergen-specific B cells before and after bee venom tolerance induction. The observed B-cell responses in both venom immunotherapy-treated patients and naturally exposed beekeepers suggest a similar functional immunoregulatory role for B cells in allergen tolerance in both groups. These findings can be investigated in other AIT models to determine their potential as biomarkers of early and successful AIT responses. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

    Science.gov (United States)

    Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

    2017-08-16

    High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
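
    The comparison of variants found by the two approaches in this abstract can be pictured as a set comparison. The sketch below is illustrative only: keying each variant as (position, reference allele, alternate allele) is an assumption, and both sets are invented toy data rather than results from the study.

```python
# Variants keyed as (position, reference allele, alternate allele).
# Both sets are hypothetical toy data, not results from the study.
reference_based = {(101, "A", "G"), (250, "C", "T"), (412, "G", "A"), (977, "T", "C")}
de_novo_based = {(101, "A", "G"), (250, "C", "T"), (412, "G", "A")}

# Variants found by both approaches, and those unique to each one.
shared = reference_based & de_novo_based
only_reference = reference_based - de_novo_based
only_de_novo = de_novo_based - reference_based

# Share of all distinct variants that both approaches agree on.
union = reference_based | de_novo_based
shared_fraction = len(shared) / len(union)
```

    In practice the two call sets would first need a common coordinate system, since de novo contigs and a reference genome do not share positions by default.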

  10. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    Background: The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results: A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion: The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to

  11. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Science.gov (United States)

    2011-01-01

    Background: The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results: A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion: The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a

  12. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

    Science.gov (United States)

    Reid-Bayliss, Kate S; Loeb, Lawrence A

    2017-08-29

    Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
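
    The idea of combining barcodes with multiple cDNA copies per RNA molecule can be illustrated with a per-barcode majority vote. This is a generic sketch of barcode-based consensus calling, not the published ARC-seq procedure. The function `barcode_consensus`, the `min_copies` parameter (standing in for an adjustable stringency) and the toy reads are all assumptions.

```python
from collections import Counter, defaultdict

def barcode_consensus(reads, min_copies=3):
    """Group reads by barcode and take a per-position majority vote.

    reads: iterable of (barcode, sequence) pairs; sequences sharing a barcode
    are assumed to be equal-length copies of the same RNA molecule.
    Barcodes with fewer than min_copies reads are dropped.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_copies:
            continue  # too few copies to outvote a copying error
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Require a strict majority; otherwise mark the position ambiguous.
            bases.append(base if count * 2 > len(column) else "N")
        consensus[barcode] = "".join(bases)
    return consensus

# Hypothetical toy reads: the third BC1 copy carries one copying error.
toy_reads = [
    ("BC1", "ACGT"), ("BC1", "ACGT"), ("BC1", "ACCT"),
    ("BC2", "GGAT"), ("BC2", "GGAT"),
]
result = barcode_consensus(toy_reads)
```

    An error introduced in one copy is outvoted by the other copies of the same molecule, whereas a change present in the original RNA appears in every copy and survives the vote.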

  13. Assessing the Diversity of Rodent-Borne Viruses: Exploring of High-Throughput Sequencing and Classical Amplification/Sequencing Approaches.

    Science.gov (United States)

    Drewes, Stephan; Straková, Petra; Drexler, Jan F; Jacob, Jens; Ulrich, Rainer G

    2017-01-01

    Rodents are distributed throughout the world and interact with humans in many ways. They provide vital ecosystem services, some species are useful models in biomedical research and some are kept as pets. However, many rodent species can have adverse effects such as damage to crops and stored produce, and they are of health concern because of the transmission of pathogens to humans and livestock. The first rodent viruses were discovered by isolation approaches and resulted in break-through knowledge in immunology, molecular and cell biology, and cancer research. In addition to rodent-specific viruses, rodent-borne viruses cause a large number of zoonotic diseases. The most prominent examples are reemerging outbreaks of human hemorrhagic fever disease cases caused by arena- and hantaviruses. In addition, rodents are reservoirs for vector-borne pathogens, such as tick-borne encephalitis virus and Borrelia spp., and may carry human pathogenic agents, but are likely not involved in their transmission to humans. Today, next-generation sequencing or high-throughput sequencing (HTS) is revolutionizing the speed of the discovery of novel viruses, but other molecular approaches, such as generic RT-PCR/PCR and rolling circle amplification techniques, contribute significantly to the rapidly ongoing process. However, the current knowledge still represents only the tip of the iceberg, when comparing the known human viruses to those known for rodents, the mammalian taxon with the largest species number. The diagnostic potential of HTS-based metagenomic approaches is illustrated by their use in the discovery and complete genome determination of novel borna- and adenoviruses as causative disease agents in squirrels. In conclusion, HTS, in combination with conventional RT-PCR/PCR-based approaches, resulted in a drastically increased knowledge of the diversity of rodent viruses. Future improvements of the used workflows, including bioinformatics analysis, will further

  14. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis

    Directory of Open Access Journals (Sweden)

    Zhao Patrick X

    2011-07-01

    Background: Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM) for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L.), a species with high economic value but limited genomic resources. Results: The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained including 24,144 tentative consensus (TCs) sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs were evaluated and validated using high resolution melting (HRM) analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1) chromosomes. Gene Ontology assignments indicate that sequences obtained cover a broad range of GO categories. Conclusions: We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.

  15. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily Lemnoideae is such a collection of aquatic species and, because of photosynthesis, includes some of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana, by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  16. Extraction of High Molecular Weight DNA from Fungal Rust Spores for Long Read Sequencing.

    Science.gov (United States)

    Schwessinger, Benjamin; Rathjen, John P

    2017-01-01

    Wheat rust fungi are complex organisms with a complete life cycle that involves two different host plants and five different spore types. During the asexual infection cycle on wheat, rusts produce massive amounts of dikaryotic urediniospores. These spores are dikaryotic (two nuclei) with each nucleus containing one haploid genome. This dikaryotic state is likely to contribute to their evolutionary success, making them some of the major wheat pathogens globally. Despite this, most published wheat rust genomes are highly fragmented and contain very little haplotype-specific sequence information. Current long-read sequencing technologies hold great promise to provide more contiguous and haplotype-phased genome assemblies. Long reads are able to span repetitive regions and phase structural differences between the haplomes. This increased genome resolution enables the identification of complex loci and the study of genome evolution beyond simple nucleotide polymorphisms. Long-read technologies require pure high molecular weight DNA as an input for sequencing. Here, we describe a DNA extraction protocol for rust spores that yields pure double-stranded DNA molecules with molecular weight of >50 kilo-base pairs (kbp). The isolated DNA is of sufficient purity for PacBio long-read sequencing, but may require additional purification for other sequencing technologies such as Nanopore and 10× Genomics.

  17. A High Resolution Genetic Map Anchoring Scaffolds of the Sequenced Watermelon Genome

    Science.gov (United States)

    Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F8 population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits. PMID:22247776
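
    The map statistics quoted in this abstract can be cross-checked with simple arithmetic. The Python sketch below uses only the figures stated above. The variable names are illustrative, and dividing map length by marker count is a rough approximation that ignores linkage-group ends and co-located markers, so it is not necessarily how the authors computed the mean interval.

```python
# Figures quoted in the abstract above.
scaffolds = 234
mean_scaffold_mb = 1.41
anchored_mb = 330          # Mb covered by the anchored scaffolds
assembled_mb = 353         # Mb of assembled genomic sequence
markers = 698 + 219 + 36   # SSR + InDel + SV markers
map_length_cm = 800        # approximate map length in cM

# Scaffold count times mean size should reproduce the anchored total.
estimated_anchored_mb = scaffolds * mean_scaffold_mb

# Share of the assembly that is anchored to the genetic map.
anchored_percent = 100 * anchored_mb / assembled_mb

# Rough mean spacing between markers (an approximation, see lead-in).
naive_interval_cm = map_length_cm / markers
```

    Rounded to the precision used in the abstract, these values agree with the reported 330 Mb, 93.5% and 0.8 cM.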

  18. Discovery of viruses and virus-like pathogens in pistachio using high-throughput sequencing

    Science.gov (United States)

    Pistachio (Pistacia vera L.) trees from the National Clonal Germplasm Repository (NCGR) and orchards in California were surveyed for viruses and virus-like agents by high-throughput sequencing (HTS). Analyses of 60 trees including clonal UCB-1 hybrid rootstock (P. atlantica × P. integerrima) identif...

  19. Draft Genome Sequences of Klebsiella oxytoca Isolates Originating from a Highly Contaminated Liquid Hand Soap Product

    OpenAIRE

    Hammerl, J. A.; Lasch, P.; Nitsche, A.; Dabrowski, P. W.; Hahmann, H.; Wicke, A.; Kleta, S.; Dahouk, S. Al; Dieckmann, R.

    2015-01-01

    In 2013, contaminated liquid soap was detected by routine microbiological monitoring of consumer products through state health authorities. Because of its high load of Klebsiella oxytoca, the liquid soap was notified via the European Union Rapid Alert System for Dangerous Non-Food Products (EU-RAPEX) and recalled. Here, we present two draft genome sequences and a summary of their general features.

  20. The Importance of Agriculture Science Course Sequencing in High Schools: A View from Collegiate Agriculture Students

    Science.gov (United States)

    Wheelus, Robin P.

    2009-01-01

    The objective of this study was to investigate the importance of Agriculture Science course sequencing in high schools, as a preparatory factor for students enrolled in collegiate agriculture classes. With the variety of courses listed in the Texas Essential Knowledge and Skills (TEKS) for Agriculture Science, it has been possible for counselors,…

  1. Increasing Classroom Compliance: Using a High-Probability Command Sequence with Noncompliant Students

    Science.gov (United States)

    Axelrod, Michael I.; Zank, Amber J.

    2012-01-01

    Noncompliance is one of the most problematic behaviors within the school setting. One strategy to increase compliance of noncompliant students is a high-probability command sequence (HPCS; i.e., a set of simple commands in which an individual is likely to comply immediately prior to the delivery of a command that has a lower probability of…

  2. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infectio...

  3. HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

    Science.gov (United States)

    Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M

    2012-01-01

    Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple type HPV infection. Currently a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, Perl scripting language, and MYSQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genuses, 14 species and 150 mucosal and cutaneous types to genotype blasted query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in html, text and excel formats and display e-value, blast score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.

  4. Detailed investigation of the bifurcation diagram of capacitively coupled Josephson junctions in high-Tc superconductors and its self similarity

    Science.gov (United States)

    Hamdipour, Mohammad

    2018-04-01

    We study an array of coupled Josephson junctions of superconductor/insulator/superconductor type (SIS junctions) as a model for high-temperature superconductors with layered structure. In the current-voltage characteristics of this system there is a breakpoint region in which a net electric charge appears on the superconducting layers (S-layers) of the junctions, which motivates us to study the charge dynamics in this region. In this paper we first show the current-voltage characteristics (CVC) of intrinsic Josephson junctions (IJJs) with N=3 junctions, then identify the breakpoint region in that CVC, and then investigate chaos in this region. We find that at the end of the breakpoint region the behavior of the system is chaotic and the Lyapunov exponent becomes positive. We also study the route by which the system becomes chaotic and find that this route is bifurcation. The final goal of this paper is to show the self-similarity in the bifurcation diagram of the system and to give a detailed analysis of the bifurcation diagram.

  5. Pleomorphic lobular carcinoma: is it more similar to a classic lobular cancer or to a high-grade ductal cancer?

    Directory of Open Access Journals (Sweden)

    Costarelli L

    2017-12-01

    Full Text Available Leopoldo Costarelli, Domenico Campagna, Alessandra Ascarelli, Francesco Cavaliere, Maria Helena Colavito, Tatiana Ponzani, Laura Broglia, Massimo La Pinta, Elena Manna, Lucio Fortunato Breast Unit, San Giovanni-Addolorata Hospital, Rome, Italy Background: Pleomorphic invasive lobular carcinoma (P-ILC) is an uncommon variety of invasive lobular carcinoma with aggressive clinical features. Little is described in the literature regarding this topic. Materials and methods: We reviewed our experiences from 2010 to 2015 and compared 40 patients with P-ILC, 126 patients with classic-ILC (C-ILC) and 574 cases of high-grade invasive ductal carcinoma (HG-IDC). We studied the histologic and immunohistochemical features, clinical presentation and surgical treatment. Results: P-ILC is diagnosed at the same age and tumor diameter as those of the other two histologic types. It is associated more frequently with multiple lymph node metastases and high proliferative index, and HER2/neu is amplified in 10% of cases. In spite of sharing some histologic characteristics with C-ILC (same growth pattern, loss of E-cadherin expression, same genetic pathway), its clinical and pathologic features define an autonomous entity. Its surgical treatment is similar to those of C-ILC and HG-IDC. Conclusion: This is the first review comparing these three pathologic entities. Our findings may be useful in understanding this variety of invasive lobular carcinoma, and further studies are certainly needed in this field. Keywords: breast cancer, lobular cancer, pleomorphic, mastectomy

  6. The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains

    DEFF Research Database (Denmark)

    Nistelberger, H. M.; Smith, O.; Wales, Nathan

    2016-01-01

    . It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three...... lightly-charred maize cob. Even with target enrichment, this sample failed to yield adequate data required to address fundamental questions in archaeology and biology. We further reanalysed part of an existing dataset on charred plant material, and found all purported endogenous DNA sequences were likely...

  7. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Science.gov (United States)

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes show putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
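
    The repeat search described in this abstract can be illustrated with a short regular-expression scan. This is only a minimal sketch, not the authors' pipeline: it assumes a plain DNA string on the forward strand, and the names UNIT, TANDEM, find_tandem_repeats and has_g4_potential are hypothetical. It looks for runs of two or more consecutive GGGNATC units and flags runs of four or more, the number of G-tracts the abstract gives as the requirement for G4 formation.

```python
import re

# One repeat unit: GGG, any base (the "N"), then ATC.
UNIT = "GGG[ACGT]ATC"
# Two or more consecutive units; the abstract reports runs of 2 to 26 units.
TANDEM = re.compile(f"(?:{UNIT}){{2,}}")


def find_tandem_repeats(seq):
    """Return (start, unit_count) for each run of consecutive GGGNATC units.

    Only the given strand is scanned; a real analysis would also scan the
    reverse complement (an assumption of this sketch, not stated in the source).
    """
    seq = seq.upper()
    return [(m.start(), len(m.group()) // 7) for m in TANDEM.finditer(seq)]


def has_g4_potential(unit_count):
    """Four consecutive G-tracts are the stated requirement for G4 formation."""
    return unit_count >= 4


# Made-up example sequence: four units flanked by unrelated bases.
example = "TTACGGGAATCGGGTATCGGGCATCGGGGATCAAGT"
hits = find_tandem_repeats(example)
for start, units in hits:
    print(start, units, has_g4_potential(units))
```

    Counting units as match length divided by seven works because every unit in the pattern is exactly seven bases long.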

  8. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    Full Text Available The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of CR sequence varied among M. cephalus populations due to the presence of indels and variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star shaped tree inferred from the CR polymorphism stresses a rapid radiation worldwide, in this species. The CR still appears as a good marker for phylogeographic investigations and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  9. On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution

    Science.gov (United States)

    Rabadan, Raul; Bhanot, Gyan; Marsilio, Sonia; Chiorazzi, Nicholas; Pasqualucci, Laura; Khiabanian, Hossein

    2017-12-01

    One cause of cancer mortality is tumor evolution to therapy-resistant disease. First line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Such mutations may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may have the potential to identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting to determine appropriate therapy and prepare for potential recurrence pretreatment.
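
    As a rough illustration of testing whether a variant rises above sequencing noise, the sketch below evaluates the upper tail of a single negative binomial distribution. It is a simplification under stated assumptions and not the authors' model, which aggregates several negative binomial distributions; the size parameter r, the mean error count and the function names are hypothetical values chosen for illustration.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k) for a negative binomial with size r and success probability p."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def nb_tail(k, r, p):
    """P(X >= k): chance that background noise alone gives k or more variant reads."""
    return max(0.0, 1.0 - sum(nb_pmf(i, r, p) for i in range(k)))


def p_from_mean(mean, r):
    """Success probability p such that the distribution has the given mean."""
    return r / (r + mean)


# Hypothetical background: about 2 erroneous reads expected at this position.
r = 5.0
p = p_from_mean(2.0, r)
p_value = nb_tail(12, r, p)  # probability of seeing 12+ variant reads from noise
print(p_value)
```

    A small tail probability suggests the observed count is unlikely to be explained by the assumed background alone; the threshold used to call a mutation would be a separate choice.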

  10. WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data.

    Science.gov (United States)

    Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

    2010-07-01

    High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.

  11. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors.

    Science.gov (United States)

    Adalsteinsson, Viktor A; Ha, Gavin; Freeman, Samuel S; Choudhury, Atish D; Stover, Daniel G; Parsons, Heather A; Gydush, Gregory; Reed, Sarah C; Rotem, Denisse; Rhoades, Justin; Loginov, Denis; Livitz, Dimitri; Rosebrock, Daniel; Leshchiner, Ignaty; Kim, Jaegil; Stewart, Chip; Rosenberg, Mara; Francis, Joshua M; Zhang, Cheng-Zhong; Cohen, Ofir; Oh, Coyin; Ding, Huiming; Polak, Paz; Lloyd, Max; Mahmud, Sairah; Helvie, Karla; Merrill, Margaret S; Santiago, Rebecca A; O'Connor, Edward P; Jeong, Seong H; Leeson, Rachel; Barry, Rachel M; Kramkowski, Joseph F; Zhang, Zhenwei; Polacek, Laura; Lohr, Jens G; Schleicher, Molly; Lipscomb, Emily; Saltzman, Andrea; Oliver, Nelly M; Marini, Lori; Waks, Adrienne G; Harshman, Lauren C; Tolaney, Sara M; Van Allen, Eliezer M; Winer, Eric P; Lin, Nancy U; Nakabayashi, Mari; Taplin, Mary-Ellen; Johannessen, Cory M; Garraway, Levi A; Golub, Todd R; Boehm, Jesse S; Wagle, Nikhil; Getz, Gad; Love, J Christopher; Meyerson, Matthew

    2017-11-06

    Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.

  12. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.
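
    The idea of compiling unique sequences and filtering them by abundance and persistence can be sketched in a few lines. This is an illustrative stand-in, not the Galaxy Workflows themselves: it assumes abundance means total read count across rounds and persistence means the number of rounds in which a sequence appears, and the function names and toy reads are hypothetical.

```python
from collections import Counter


def summarize_rounds(rounds):
    """Build a table of unique sequences from per-round read lists.

    rounds maps a round label to a list of read sequences. The result maps
    each unique sequence to its abundance (total reads, assumed definition)
    and persistence (number of rounds containing it, assumed definition).
    """
    summary = {}
    for reads in rounds.values():
        for seq, n in Counter(reads).items():
            entry = summary.setdefault(seq, {"abundance": 0, "persistence": 0})
            entry["abundance"] += n
            entry["persistence"] += 1
    return summary


def filter_sequences(summary, min_abundance=1, min_persistence=1):
    """Return the sequences that meet both thresholds, sorted alphabetically."""
    return sorted(
        seq for seq, s in summary.items()
        if s["abundance"] >= min_abundance and s["persistence"] >= min_persistence
    )


# Toy reads from three hypothetical selection rounds.
rounds = {
    "round1": ["ACGU", "ACGU", "GGCA"],
    "round2": ["ACGU", "UUAG"],
    "round3": ["ACGU", "GGCA", "GGCA"],
}
summary = summarize_rounds(rounds)
kept = filter_sequences(summary, min_abundance=3, min_persistence=2)
print(kept)
```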

  13. Similar Anti-Inflammatory Acute Responses from Moderate-Intensity Continuous and High-Intensity Intermittent Exercise

    Directory of Open Access Journals (Sweden)

    Carolina Cabral-Santos, José Gerosa-Neto, Daniela Sayuri Inoue, Valéria Leme Gonçalves Panissa, Luís Alberto Gobbo, Alessandro Moura Zagatto, Eduardo Zapaterra Campos, Fábio Santos Lira

    2015-12-01

    Full Text Available The purpose of this study was to compare the effect of high-intensity intermittent exercise (HIIE) versus volume matched steady state exercise (SSE) on inflammatory and metabolic responses. Eight physically active male subjects completed two experimental sessions, a 5-km run on a treadmill either continuously (70% vVO2max) or intermittently (1:1 min at vVO2max). Blood samples were collected at rest, immediately, 30 and 60 minutes after the exercise session. Blood was analyzed for glucose, non-ester fatty acid (NEFA), uric acid, lactate, cortisol, and cytokines (IL-6, IL-10 and TNF-α) levels. The lactate levels exhibited higher values immediately post-exercise than at rest (HIIE 1.34 ± 0.24 to 7.11 ± 2.85, and SSE 1.35 ± 0.14 to 4.06 ± 1.60 mmol·L-1, p 0.05). Cortisol, IL-6, IL-10 and TNF-α levels showed time-dependent changes under the different conditions (p < 0.05); however, the area under the curve of TNF-α in the SSE was higher than in HIIE (p < 0.05), and the area under the curve of IL-6 in the HIIE showed higher values than SSE (p < 0.05). In addition, both exercise conditions promote increased IL-10 levels and IL-10/TNF-α ratio (p < 0.05). In conclusion, our results demonstrated that both exercise protocols, when volume is matched, promote similar inflammatory responses, leading to an anti-inflammatory status; however, the metabolic responses are different.

  14. Sequence similarity between the erythrocyte binding domain 1 of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals binding residues for the Duffy Antigen Receptor for Chemokines

    Directory of Open Access Journals (Sweden)

    Garry Robert F

    2011-01-01

    Full Text Available Abstract Background The surface glycoprotein (SU, gp120) of the human immunodeficiency virus (HIV) must bind to a chemokine receptor, CCR5 or CXCR4, to invade CD4+ cells. Plasmodium vivax uses the Duffy Binding Protein (DBP) to bind the Duffy Antigen Receptor for Chemokines (DARC) and invade reticulocytes. Results Variable loop 3 (V3) of HIV-1 SU and domain 1 of the Plasmodium vivax DBP share a sequence similarity. The site of amino acid sequence similarity was necessary, but not sufficient, for DARC binding and contained a consensus heparin binding site essential for DARC binding. Both HIV-1 and P. vivax can be blocked from binding to their chemokine receptors by the chemokine, RANTES and its analog AOP-RANTES. Site directed mutagenesis of the heparin binding motif in members of the DBP family, the P. knowlesi alpha, beta and gamma proteins abrogated their binding to erythrocytes. Positively charged residues within domain 1 are required for binding of P. vivax and P. knowlesi erythrocyte binding proteins. Conclusion A heparin binding site motif in members of the DBP family may form part of a conserved erythrocyte receptor binding pocket.

  15. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.
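
    A gene-by-gene comparison of the kind named here can be pictured as comparing allele numbers locus by locus. The sketch below is a generic illustration under stated assumptions, not the published MLST+ scheme: the isolate names, gene names, allele numbers and the single-linkage grouping rule are all hypothetical.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles, ignoring loci missing in either strain."""
    shared = [
        gene for gene in profile_a
        if profile_a[gene] is not None and profile_b.get(gene) is not None
    ]
    return sum(profile_a[gene] != profile_b[gene] for gene in shared)


def group_clones(profiles, max_distance=0):
    """Single-linkage grouping: strains within max_distance share a clone group."""
    groups = []
    for name in profiles:
        linked = [
            g for g in groups
            if any(allelic_distance(profiles[name], profiles[m]) <= max_distance
                   for m in g)
        ]
        merged = [name] + [member for g in linked for member in g]
        groups = [g for g in groups if g not in linked] + [merged]
    return sorted(sorted(g) for g in groups)


# Hypothetical allele profiles: gene name -> allele number.
profiles = {
    "isolate1": {"geneA": 1, "geneB": 2, "geneC": 1},
    "isolate2": {"geneA": 1, "geneB": 2, "geneC": 1},
    "isolate3": {"geneA": 1, "geneB": 3, "geneC": 1},
    "isolate4": {"geneA": 2, "geneB": 3, "geneC": 4},
}
clones = group_clones(profiles, max_distance=0)
print(clones)
```

    With whole-genome data the same comparison runs over many more loci, which is where the added resolution over a handful of markers would come from.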

  16. High prevalence of human polyomavirus JC VP1 gene sequences in pediatric malignancies.

    Science.gov (United States)

    Shiramizu, B; Hu, N; Frisque, R J; Nerurkar, V R

    2007-05-15

    The oncogenic potential of human polyomavirus JC (JCV), a ubiquitous virus that establishes infection during early childhood in approximately 70% of the human population, is unclear. As a neurotropic virus, JCV has been implicated in pediatric central nervous system tumors and has been suggested to be a pathogenic agent in pediatric acute lymphoblastic leukemia. Recent studies have demonstrated JCV gene sequences in pediatric medulloblastomas and among patients with colorectal cancer. JCV early protein T-antigen (TAg) can form complexes with cellular regulatory proteins and thus may play a role in tumorigenesis. Since JCV is detected in B-lymphocytes, a retrospective analysis of pediatric B-cell and non-B-cell malignancies as well as other HIV-associated pediatric malignancies was conducted for the presence of JCV gene sequences. DNA was extracted from 49 pediatric malignancies, including Hodgkin disease, non-Hodgkin lymphoma, large cell lymphoma and sarcoma. Polymerase chain reaction (PCR) was conducted using JCV specific nested primer sets for the transcriptional control region (TCR), TAg, and viral capsid protein 1 (VP1) genes. Southern blot analysis and DNA sequencing were used to confirm specificity of the amplicons. A 215-bp region of the JCV VP1 gene was amplified from 26 (53%) pediatric tumor tissues. The JCV TCR and two JCV gene regions were amplified from a leiomyosarcoma specimen from an HIV-infected patient. The leiomyosarcoma specimen from the cecum harbored the archetype strain of JCV. Including the leiomyosarcoma specimen, three of five specimens sequenced were typed as JCV genotype 2. The failure to amplify JCV TCR and TAg gene sequences in the presence of JCV VP1 gene sequence is surprising. Even though JCV TAg gene, which is similar to the SV40 TAg gene, is oncogenic in animal models, the presence of JCV gene sequences in pediatric malignancies does not prove causality. In light of the available data on the presence of JCV in normal and cancerous

  17. Frequency-locked pulse sequencer for high-frame-rate monochromatic tissue motion imaging.

    Science.gov (United States)

    Azar, Reza Zahiri; Baghani, Ali; Salcudean, Septimiu E; Rohling, Robert

    2011-04-01

    To overcome the inherent low frame rate of conventional ultrasound, we have previously presented a system that can be implemented on conventional ultrasound scanners for high-frame-rate imaging of monochromatic tissue motion. The system employs a sector subdivision technique in the sequencer to increase the acquisition rate. To eliminate the delays introduced during data acquisition, a motion phase correction algorithm has also been introduced to create in-phase displacement images. Previous experimental results from tissue-mimicking phantoms showed that the system can achieve effective frame rates of up to a few kilohertz on conventional ultrasound systems. In this short communication, we present a new pulse sequencing strategy that facilitates high-frame-rate imaging of monochromatic motion such that the acquired echo signals are inherently in-phase. The sequencer uses the knowledge of the excitation frequency to synchronize the acquisition of the entire imaging plane to that of an external exciter. This sequencing approach eliminates any need for synchronization or phase correction and has applications in tissue elastography, which we demonstrate with tissue-mimicking phantoms. © 2011 IEEE

  18. Exploring sequence characteristics related to high-level production of secreted proteins in Aspergillus niger.

    Directory of Open Access Journals (Sweden)

    Bastiaan A van den Berg

    Full Text Available Protein sequence features are explored in relation to the production of over-expressed extracellular proteins by fungi. Knowledge on features influencing protein production and secretion could be employed to improve enzyme production levels in industrial bioprocesses via protein engineering. A large set, over 600 homologous and nearly 2,000 heterologous fungal genes, were overexpressed in Aspergillus niger using a standardized expression cassette and scored for high versus no production. Subsequently, sequence-based machine learning techniques were applied for identifying relevant DNA and protein sequence features. The amino-acid composition of the protein sequence was found to be most predictive and interpretation revealed that, for both homologous and heterologous gene expression, the same features are important: tyrosine and asparagine composition was found to have a positive correlation with high-level production, whereas for unsuccessful production, contributions were found for methionine and lysine composition. The predictor is available online at http://bioinformatics.tudelft.nl/hipsec. Subsequent work aims at validating these findings by protein engineering as a method for increasing expression levels per gene copy.
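
    The amino-acid composition feature that this abstract reports as most predictive is simple to compute. The sketch below shows only the feature extraction, under the assumption that composition means the fraction of each of the 20 standard residues; it is not the published predictor, and the function name and example peptide are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aa_composition(protein):
    """Return the fraction of each standard amino acid in a protein sequence.

    Characters outside the 20 standard residues (gaps, X, stop symbols)
    are ignored, which is an assumption of this sketch.
    """
    counts = Counter(c for c in protein.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up eight-residue peptide, used only to show the output shape.
comp = aa_composition("MKYNNYLK")
print(comp["Y"], comp["N"], comp["M"], comp["K"])
```

    A vector of these 20 fractions per protein is the kind of input a sequence-based classifier could be trained on; the residues highlighted in the abstract (tyrosine, asparagine, methionine, lysine) correspond to the entries Y, N, M and K.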

  19. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis.

    Science.gov (United States)

    Guo, Yan; Dai, Yulin; Yu, Hui; Zhao, Shilin; Samuels, David C; Shyr, Yu

    2017-03-01

    Analysis of high throughput sequencing data starts with alignment against a reference genome, which is the foundation for all re-sequencing data analyses. Each new release of the human reference genome has been augmented with improved accuracy and completeness. It is presumed that the latest release of the human reference genome, GRCh38, will contribute more to high throughput sequencing data analysis by providing more accuracy. But the amount of improvement has not yet been quantified. We conducted a study to compare the genomic analysis results between the GRCh38 reference and its predecessor GRCh37. Through analyses of alignment, single nucleotide polymorphisms, small insertion/deletions, copy number and structural variants, we show that GRCh38 offers overall more accurate analysis of human sequencing data. More importantly, GRCh38 produced fewer false positive structural variants. In conclusion, GRCh38 is an improvement over GRCh37 not only in genome assembly but also in yielding more reliable genomic analysis results. Copyright © 2017. Published by Elsevier Inc.

  20. miRBase: annotating high confidence microRNAs using deep sequencing data.

    Science.gov (United States)

    Kozomara, Ana; Griffiths-Jones, Sam

    2014-01-01

    We describe an update of the miRBase database (http://www.mirbase.org/), the primary microRNA sequence repository. The latest miRBase release (v20, June 2013) contains 24 521 microRNA loci from 206 species, processed to produce 30 424 mature microRNA products. The rate of deposition of novel microRNAs and the number of researchers involved in their discovery continue to increase, driven largely by small RNA deep sequencing experiments. In the face of these increases, and a range of microRNA annotation methods and criteria, maintaining the quality of the microRNA sequence data set is a significant challenge. Here, we describe recent developments of the miRBase database to address this issue. In particular, we describe the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries. We now provide a high confidence subset of miRBase entries, based on the pattern of mapped reads. The high confidence microRNA data set is available alongside the complete microRNA collection at http://www.mirbase.org/. We also describe embedding microRNA-specific Wikipedia pages on the miRBase website to encourage the microRNA community to contribute and share textual and functional information.

  1. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  2. Fungi Sailing the Arctic Ocean: Speciose Communities in North Atlantic Driftwood as Revealed by High-Throughput Amplicon Sequencing.

    Science.gov (United States)

    Rämä, Teppo; Davey, Marie L; Nordén, Jenni; Halvorsen, Rune; Blaalid, Rakel; Mathiassen, Geir H; Alsos, Inger G; Kauserud, Håvard

    2016-08-01

    High amounts of driftwood sail across the oceans and provide habitat for organisms tolerating the rough and saline environment. Fungi have adapted to the extremely cold and saline conditions which driftwood faces in the high north. For the first time, we applied high-throughput sequencing to fungi residing in driftwood to reveal their taxonomic richness, community composition, and ecology in the North Atlantic. Using pyrosequencing of ITS2 amplicons obtained from 49 marine logs, we found 807 fungal operational taxonomic units (OTUs) based on clustering at 97 % sequence similarity cut-off level. The phylum Ascomycota comprised 74 % of the OTUs and 20 % belonged to Basidiomycota. The richness of basidiomycetes decreased with prolonged submersion in the sea, supporting the general view of ascomycetes being more extremotolerant. However, more than one fourth of the fungal OTUs remained unassigned to any fungal class, emphasising the need for better DNA reference data from the marine habitat. Different fungal communities were detected in coniferous and deciduous logs. Our results highlight that driftwood hosts a considerably higher fungal diversity than currently known. The driftwood fungal community is not a terrestrial relic but a speciose assemblage of fungi adapted to the stressful marine environment and different kinds of wooden substrates found in it.

  3. Rapid detection of SMARCB1 sequence variation using high resolution melting

    Directory of Open Access Journals (Sweden)

    Ashley David M

    2009-12-01

    Full Text Available Abstract Background Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epitheloid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM), for detecting sequence variations in SMARCB1. Methods Amplicons, encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR, were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. Results The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4%) showed variant melt profiles that were subsequently shown to be false positive. Overall nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletion. Other mutations were nonsense or frame-shifts. The detection sensitivity of the HRM screening method was influenced by both sequence context and specific nucleotide change and varied from 1:4 to 1:1000 (variant to wild-type DNA). A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. Conclusions This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to

  4. Rapid detection of SMARCB1 sequence variation using high resolution melting

    International Nuclear Information System (INIS)

    Dagar, Vinod; Chow, Chung-Wo; Ashley, David M; Algar, Elizabeth M

    2009-01-01

    Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epitheloid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM), for detecting sequence variations in SMARCB1. Amplicons, encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR, were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4%) showed variant melt profiles that were subsequently shown to be false positive. Overall nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletion. Other mutations were nonsense or frame-shifts. The detection sensitivity of the HRM screening method was influenced by both sequence context and specific nucleotide change and varied from 1:4 to 1:1000 (variant to wild-type DNA). A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to facilitate whole gene mutation screening

  5. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    Full Text Available For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  6. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  7. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    Directory of Open Access Journals (Sweden)

    Bolton Michael J

    2011-11-01

    Full Text Available Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infection of erythrocytes and DBP binding to the Duffy Antigen Receptor for Chemokines (DARC). A peptide including the HBM of PvDBP had similar affinity for heparin as RANTES and V3 loop peptides, and could be specifically inhibited from heparin binding by the same polyanions that inhibit DBP binding to DARC. However, some V3 peptides can competitively inhibit RANTES binding to heparin, but not the PvDBP HBM peptide. Three other members of the DBP family have an HBM sequence that is necessary for erythrocyte binding, however only the protein which binds to DARC, the P. knowlesi alpha protein, is inhibited by heparin from binding to erythrocytes. Heparitinase digestion does not affect the binding of DBP to erythrocytes. Conclusion The HBMs of DBPs that bind to DARC have similar heparin binding affinities as some V3 loop peptides and chemokines, are responsible for specific sulfated polysaccharide inhibition of parasite binding and invasion of red blood cells, and are more likely to bind to negative charges on the receptor than cell surface glycosaminoglycans.

  8. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was run in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, the performance comparison between HANA and MySQL was made on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
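    The exact search mentioned above is, in BWA-style aligners, typically a backward search over the Burrows-Wheeler transform of the reference. The following is a minimal in-memory sketch of that technique, not the stored procedures from the paper. The function names `build_fm_index` and `exact_search` are hypothetical, and the suffix array is built naively, which is only practical for toy inputs.

```python
from collections import Counter


def build_fm_index(text: str):
    """Build a toy FM-index (suffix array, C table, Occ table) for text."""
    text += "$"  # sentinel, assumed absent from the input and smaller than A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    counts = Counter(text)
    # c_table[ch]: number of characters in text strictly smaller than ch
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for k in occ:
            occ[k][i + 1] = occ[k][i] + (1 if k == ch else 0)
    return sa, c_table, occ


def exact_search(pattern: str, sa, c_table, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))
```

    For example, `exact_search("ATTA", *build_fm_index("GATTACAGATTACA"))` walks the pattern right to left and narrows the interval by one character per step, so the search cost depends on the read length rather than on the reference length.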

  9. High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps.

    Science.gov (United States)

    Georges, Arthur; Li, Qiye; Lian, Jinmin; O'Meally, Denis; Deakin, Janine; Wang, Zongji; Zhang, Pei; Fujita, Matthew; Patel, Hardip R; Holleley, Clare E; Zhou, Yang; Zhang, Xiuwen; Matsubara, Kazumi; Waters, Paul; Graves, Jennifer A Marshall; Sarre, Stephen D; Zhang, Guojie

    2015-01-01

    The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps. The genomic sequence for P. vitticeps, generated on the Illumina HiSeq 2000 platform, comprised 317 Gbp (179X raw read depth) from 13 insert libraries ranging from 250 bp to 40 kbp. After filtering for low-quality and duplicated reads, 146 Gbp of data (83X) was available for assembly. Exceptionally high levels of heterozygosity (0.85 % of single nucleotide polymorphisms plus sequence insertions or deletions) complicated assembly; nevertheless, 96.4 % of reads mapped back to the assembled scaffolds, indicating that the assembly included most of the sequenced genome. Length of the assembly was 1.8 Gbp in 545,310 scaffolds (69,852 longer than 300 bp), the longest being 14.68 Mbp. N50 was 2.29 Mbp. Genes were annotated on the basis of de novo prediction, similarity to the green anole Anolis carolinensis, Gallus gallus and Homo sapiens proteins, and P. vitticeps transcriptome sequence assemblies, to yield 19,406 protein-coding genes in the assembly, 63 % of which had intact open reading frames. Our assembly captured 99 % (246 of 248) of core CEGMA genes, with 93 % (231) being complete. The quality of the P. vitticeps assembly is comparable or superior to that of other published squamate genomes, and the annotated P. vitticeps genome can be accessed through a genome browser available at https://genomics.canberra.edu.au.
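    The N50 reported above is a standard contiguity statistic: the length of the scaffold at which, going from longest to shortest, the running total first reaches half of the assembly length. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative only and are not taken from the P. vitticeps assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

    With toy lengths `[2, 3, 4, 5, 6]` the total is 20, and the running sum reaches 10 at the second-longest scaffold, so the N50 is 5.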

  10. The need for high-quality whole-genome sequence databases in microbial forensics.

    Science.gov (United States)

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  11. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    Full Text Available The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review the recent advances in immune repertoire study of infectious diseases achieved by traditional techniques and high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  12. Dairy Attenuates Weight Gain to a Similar Extent as Exercise in Rats Fed a High-Fat, High-Sugar Diet.

    Science.gov (United States)

    Trottier, Sarah K; MacPherson, Rebecca E K; Knuth, Carly M; Townsend, Logan K; Peppler, Willem T; Mikhaeil, John S; Leveille, Cam F; LeBlanc, Paul J; Shearer, Jane; Reimer, Raylene A; Wright, David C

    2017-10-01

    To compare the individual and combined effects of dairy and endurance exercise training in reducing weight gain and adiposity in a rodent model of diet-induced obesity. An 8-week feeding intervention of a high-fat, high-sugar diet was used to induce obesity in male Sprague-Dawley rats. Rats were then assigned to one of four groups for 6 weeks: (1) casein sedentary (casein-S), (2) casein exercise (casein-E), (3) dairy sedentary (dairy-S), and (4) dairy exercise (dairy-E). Rats were exercise trained by treadmill running 5 d/wk. Dairy-E prevented weight gain to a greater extent than either dairy or exercise alone. Adipose tissue and liver mass were reduced to a similar extent in dairy-S, casein-E, and dairy-E groups. Differences in weight gain were not explained by food intake or total energy expenditure. The total amount of lipid excreted was greater in the dairy-S compared to casein-S and dairy-E groups. This study provides evidence that dairy limits weight gain to a similar extent as exercise training and the combined effects are greater than either intervention alone. While exercise training reduces weight gain through increases in energy expenditure, dairy appears to increase lipid excretion in the feces. © 2017 The Obesity Society.

  13. High-performance permanent magnet brushless motors with balanced concentrated windings and similar slot and pole numbers

    International Nuclear Information System (INIS)

    Stumberger, Bojan; Stumberger, Gorazd; Hadziselimovic, Miralem; Hamler, Anton; Trlep, Mladen; Gorican, Viktor; Jesenik, Marko

    2006-01-01

    The paper presents a comparison between the performances of exterior-rotor permanent magnet brushless motors with distributed windings and the performances of exterior-rotor permanent magnet brushless motors with concentrated windings. Finite element method analysis is employed to determine the performance of each motor. It is shown that motors with concentrated windings and similar slot and pole numbers exhibit similar or better performances than motors with distributed windings for brushless AC (BLAC) operation mode and brushless DC (BLDC) operation mode as well.

  14. Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

    Directory of Open Access Journals (Sweden)

    Debnath Bhattacharyya

    2013-01-01

    Full Text Available We design an algorithm for a bioengine. The program searches for optimal alignments between two sequences, the host sequence (normal plant) and the query sequence (virus). Searching for homologues has become a routine operation on biological sequences; here it is performed in a 4 × 4 combination with different subsequence (word) sizes. The program takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. The main aim is to detect overlapping reading frames. The program also makes it possible to find the highly infected colones by selecting the highest matching regions with minimum gap or mismatch zones, as well as unique virus colones matches. It is a small, portable, interactive, front-end program intended for finding the regions of matching between the host sequence and query subsequences. All operations are carried out in fractions of a second, depending on the required task and on the sequence length.

  15. Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

    Science.gov (United States)

    Mandal, Bijoy Kumar; Kim, Tai-hoon

    2013-01-01

    We design an algorithm for a bioengine. The program searches for optimal alignments between two sequences, the host sequence (normal plant) and the query sequence (virus). Searching for homologues has become a routine operation on biological sequences; here it is performed in a 4 × 4 combination with different subsequence (word) sizes. The program takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. The main aim is to detect overlapping reading frames. The program also makes it possible to find the highly infected colones by selecting the highest matching regions with minimum gap or mismatch zones, as well as unique virus colones matches. It is a small, portable, interactive, front-end program intended for finding the regions of matching between the host sequence and query subsequences. All operations are carried out in fractions of a second, depending on the required task and on the sequence length. PMID:24000321
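    The abstract does not give the algorithm itself, so the following is only a generic sketch of word-based matching between a host sequence and a query sequence, the kind of seeding step that homology search tools commonly use. The names `index_words` and `shared_words`, the default word size of 4, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def index_words(seq: str, k: int):
    """Map every word (substring of length k) in seq to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_words(host: str, query: str, k: int = 4):
    """Return (host_pos, query_pos) pairs where a length-k word matches exactly."""
    host_index = index_words(host, k)
    hits = []
    for j in range(len(query) - k + 1):
        # .get avoids creating empty entries in the defaultdict
        for i in host_index.get(query[j:j + k], []):
            hits.append((i, j))
    return hits
```

    Hits that share the same diagonal (`host_pos - query_pos`) are adjacent words of one ungapped matching region, which is how seeds are usually chained before a full alignment is attempted.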

  16. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    Science.gov (United States)

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  17. Association Study of Gut Flora in Coronary Heart Disease through High-Throughput Sequencing

    OpenAIRE

    Cui, Li; Zhao, Tingting; Hu, Haibing; Zhang, Wen; Hua, Xiuguo

    2017-01-01

    Objectives. We aimed to explore the impact of gut microbiota in coronary heart disease (CHD) patients through high-throughput sequencing. Methods. A total of 29 CHD in-hospital patients and 35 healthy volunteers as controls were included. Nucleic acids were extracted from fecal samples, followed by α diversity and principal coordinate analysis (PCoA). Based on unweighted UniFrac distance matrices, unweighted-pair group method with arithmetic mean (UPGMA) trees were created. Results. After dat...
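    The methods above build UPGMA trees from a distance matrix. As a sketch of how that clustering works in general (not the software used in the study), the code below repeatedly merges the two closest clusters and updates distances as a size-weighted average. The function name `upgma`, the input layout (a dict keyed by pairs of sample names) and the nested-tuple output are assumptions for illustration, and branch lengths are not computed.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster samples with UPGMA and return the tree topology as nested tuples.

    names: list of sample labels.
    dist:  dict mapping a pair (a, b) of labels to their distance;
           each unordered pair needs to appear once.
    """
    sizes = {n: 1 for n in names}  # current clusters and their leaf counts
    d = {frozenset(k): v for k, v in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # New distances are averages weighted by cluster size.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))
```

    In practice the input distances would come from a metric such as unweighted UniFrac; any symmetric distance table with one entry per pair works with this sketch.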

  18. Draft Genome Sequences of Klebsiella oxytoca Isolates Originating from a Highly Contaminated Liquid Hand Soap Product.

    Science.gov (United States)

    Hammerl, J A; Lasch, P; Nitsche, A; Dabrowski, P W; Hahmann, H; Wicke, A; Kleta, S; Al Dahouk, S; Dieckmann, R

    2015-07-23

    In 2013, contaminated liquid soap was detected by routine microbiological monitoring of consumer products through state health authorities. Because of its high load of Klebsiella oxytoca, the liquid soap was notified via the European Union Rapid Alert System for Dangerous Non-Food Products (EU-RAPEX) and recalled. Here, we present two draft genome sequences and a summary of their general features. Copyright © 2015 Hammerl et al.

  19. Combining Amplification Typing of L1 Active Subfamilies (ATLAS) with High-Throughput Sequencing.

    Science.gov (United States)

    Rahbari, Raheleh; Badge, Richard M

    2016-01-01

    With the advent of new generations of high-throughput sequencing technologies, the catalog of human genome variants created by retrotransposon activity is expanding rapidly. However, despite these advances in describing L1 diversity and the fact that L1 must retrotranspose in the germline or prior to germline partitioning to be evolutionarily successful, direct assessment of de novo L1 retrotransposition in the germline or early embryogenesis has not been achieved for endogenous L1 elements. A direct study of de novo L1 retrotransposition into susceptible loci within sperm DNA (Freeman et al., Hum Mutat 32(8):978-988, 2011) suggested that the rate of L1 retrotransposition in the germline is much lower than previously estimated (ATLAS L1 display technique (Badge et al., Am J Hum Genet 72(4):823-838, 2003) to investigate de novo L1 retrotransposition in human genomes. In this chapter, we describe how we combined a high-coverage ATLAS variant with high-throughput sequencing, achieving 11-25× sequence depth per single amplicon, to study L1 retrotransposition in whole genome amplified (WGA) DNAs.

  20. Exome Sequencing Identifies Potential Risk Variants for Mendelian Disorders at High Prevalence in Qatar

    Science.gov (United States)

    Rodriguez-Flores, Juan L.; Fakhro, Khalid; Hackett, Neil R.; Salit, Jacqueline; Fuller, Jennifer; Agosto-Perez, Francisco; Gharbiah, Maey; Malek, Joel A.; Zirie, Mahmoud; Jayyousi, Amin; Badii, Ramin; Al-Marri, Ajayeb Al-Nabet; Chouchane, Lotfi; Stadler, Dora J.; Hunter-Zinck, Haley; Mezey, Jason G.; Crystal, Ronald G.

    2013-01-01

    Exome sequencing of families of related individuals has been highly successful in identifying genetic polymorphisms responsible for Mendelian disorders. Here, we demonstrate the value of the reverse approach, where we use exome sequencing of a sample of unrelated individuals to analyze allele frequencies of known causal mutations for Mendelian diseases. We sequenced the exomes of 100 individuals representing the three major genetic subgroups of the Qatari population (Q1 Bedouin, Q2 Persian-South Asian, Q3 African) and identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. These include variants not present in 1000 Genomes and variants at high frequency when compared to 1000 Genomes populations. Several of these Mendelian variants were only segregating in one Qatari subpopulation, where the observed subpopulation specificity trends were confirmed in an independent population of 386 Qataris. Pre-marital genetic screening in Qatar tests for only 4 out of the 37, such that this study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance. PMID:24123366

  1. Characterizing ncRNAs in human pathogenic protists using high-throughput sequencing technology

    Directory of Open Access Journals (Sweden)

    Lesley Joan Collins

    2011-12-01

    Full Text Available ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, snoRNAs and long ncRNAs on a genomic scale making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases.

  2. Characterizing ncRNAs in Human Pathogenic Protists Using High-Throughput Sequencing Technology

    Science.gov (United States)

    Collins, Lesley Joan

    2011-01-01

    ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses, and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, small nucleolar RNAs (snoRNAs), and long ncRNAs on a genomic scale, making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational, and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases. PMID:22303390

  3. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Directory of Open Access Journals (Sweden)

    Soichi Inagaki

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  4. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Science.gov (United States)

    Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  5. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Directory of Open Access Journals (Sweden)

    Shin-ichi Hashimoto

    Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  6. High-resolution sequence stratigraphy and continental environmental evolution: An example from east-central Argentina

    Science.gov (United States)

    Beilinson, Elisa; Veiga, Gonzalo D.; Spalletti, Luis A.

    2013-10-01

    The aim of this contribution is to establish a high-resolution sequence stratigraphic scheme for the continental deposits that constitute the Punta San Andrés Alloformation (Plio-Pleistocene) in east-central Argentina, to analyze the basin fill evolution, and to identify and assess the role that extrinsic factors such as climate and sea-level oscillations played during evolution of the unit. For the high-resolution sequence stratigraphical study of the Punta San Andrés Alloformation, high- and low-accommodation system tracts were defined mainly on the basis of the architectural elements present in the succession, also taking into account the relative degree of channel and floodplain deposits. Discontinuities and the nature of depositional systems generated during variations in accommodation helped identify two fourth-order high-accommodation system tracts and two fourth-order low-accommodation system tracts. At a third-order scale, the Punta San Andrés Alloformation may be interpreted as the progradation of continental depositional systems, characterized by a braided system in the proximal areas, and a low-sinuosity, single-channel system in the distal areas, defined by a high rate of sediment supply and discharge peaks which periodically flooded the plains and generated high aggradation rates during the late Pliocene and lower Pleistocene.

  7. High-sensitivity HLA typing by Saturated Tiling Capture Sequencing (STC-Seq).

    Science.gov (United States)

    Jiao, Yang; Li, Ran; Wu, Chao; Ding, Yibin; Liu, Yanning; Jia, Danmei; Wang, Lifeng; Xu, Xiang; Zhu, Jing; Zheng, Min; Jia, Junling

    2018-01-15

    Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low efficiency, and high cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. Here, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known two-field (field1 and field2) genotypes. Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with third-generation sequencing. STC-Seq is a highly accurate and cost-efficient method for HLA typing which can be used to facilitate the establishment of population-based HLA databases for precision and transplantation medicine.

  8. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
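
    The third step described above (finding sequences that contain microsatellites matching user-specified parameters) can be illustrated with a minimal Python sketch. This is not SSR_pipeline's own code: the function name `find_ssrs` and its parameters `min_unit`, `max_unit`, and `min_repeats` are hypothetical, and the regular-expression approach is only one simple way to detect perfect tandem repeats.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    Hypothetical helper for illustration; parameters are assumptions,
    not SSR_pipeline options.
    """
    seq = seq.upper()
    # The unit is matched lazily so the shortest repeating unit is tried
    # first; the backreference requires at least min_repeats copies in total.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    # A toy read containing five copies of the dinucleotide unit "AC".
    read = "TTG" + "AC" * 5 + "GGT"
    for start, unit, count in find_ssrs(read):
        print(start, unit, count)
```

    With `min_unit=2`, a long mononucleotide run would be reported as a dinucleotide repeat, so a real tool would need to normalize units; the sketch also ignores imperfect repeats and read quality.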

  9. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Directory of Open Access Journals (Sweden)

    Charlotte Rehm

    In prokaryotes, simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular, G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4-forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed, with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
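
    As a rough illustration of how tandem runs of the GGGNATC unit could be located and their unit counts tallied, the following Python sketch scans one strand of a sequence. It is an assumption-laden example, not the authors' method: the names `G4_REPEAT` and `count_gggnatc_runs` are hypothetical, only perfect tandem copies are matched, and the reverse strand is not searched.

```python
import re

# Two or more perfect tandem copies of the heptamer GGGNATC (N = any base).
G4_REPEAT = re.compile(r"(?:GGG[ACGT]ATC){2,}")


def count_gggnatc_runs(genome):
    """Return (start, number_of_units) for each tandem GGGNATC run."""
    genome = genome.upper()
    # Each unit is exactly 7 nt long, so the run length divided by 7
    # gives the number of repeat units.
    return [
        (m.start(), len(m.group(0)) // 7)
        for m in G4_REPEAT.finditer(genome)
    ]


if __name__ == "__main__":
    # Toy sequence with four units, the unit count the abstract reports
    # as preferred.
    toy = "AA" + "GGGAATC" + "GGGTATC" + "GGGCATC" + "GGGGATC" + "TT"
    print(count_gggnatc_runs(toy))
```

    A histogram of the returned unit counts over a whole genome would be one way to look for the kind of preference for four units that the abstract describes.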

  10. Sequence Curriculum: High School to College. Middlesex Community College/Haddam-Killingworth High School. Final Report.

    Science.gov (United States)

    Middlesex Community Coll., Middletown, CT.

    Through a collaborative effort between Middlesex Community College (MxCC) and Haddam-Killingworth High School (HKHS), students taking specific high school courses in television production, broadcast journalism, electronics, and photography are granted college credit by MxCC upon admission to the college's Broadcast Communication Program. The…

  11. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    Science.gov (United States)

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have

  12. International Interlaboratory Digital PCR Study Demonstrating High Reproducibility for the Measurement of a Rare Sequence Variant.

    Science.gov (United States)

    Whale, Alexandra S; Devonshire, Alison S; Karlin-Neumann, George; Regan, Jack; Javier, Leanne; Cowen, Simon; Fernandez-Gonzalez, Ana; Jones, Gerwyn M; Redshaw, Nicholas; Beck, Julia; Berger, Andreas W; Combaret, Valérie; Dahl Kjersgaard, Nina; Davis, Lisa; Fina, Frederic; Forshew, Tim; Fredslund Andersen, Rikke; Galbiati, Silvia; González Hernández, Álvaro; Haynes, Charles A; Janku, Filip; Lacave, Roger; Lee, Justin; Mistry, Vilas; Pender, Alexandra; Pradines, Anne; Proudhon, Charlotte; Saal, Lao H; Stieglitz, Elliot; Ulrich, Bryan; Foy, Carole A; Parkes, Helen; Tzonev, Svilen; Huggett, Jim F

    2017-02-07

    This study tested the claim that digital PCR (dPCR) can offer highly reproducible quantitative measurements in disparate laboratories. Twenty-one laboratories measured four blinded samples containing different quantities of a KRAS fragment encoding G12D, an important genetic marker for guiding therapy of certain cancers. This marker is challenging to quantify reproducibly using quantitative PCR (qPCR) or next generation sequencing (NGS) due to the presence of competing wild type sequences and the need for calibration. Using dPCR, 18 laboratories were able to quantify the G12D marker within 12% of each other in all samples. Three laboratories appeared to measure consistently outlying results; however, proper application of a follow-up analysis recommendation rectified their data. Our findings show that dPCR has demonstrable reproducibility across a large number of laboratories without calibration. This could enable the reproducible application of molecular stratification to guide therapy and, potentially, for molecular diagnostics.

  13. Improving High-Throughput Sequencing Approaches for Reconstructing the Evolutionary Dynamics of Upper Paleolithic Human Groups

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine

    the development and testing of innovative molecular approaches aiming at improving the amount of informative HTS data one can recover from ancient DNA extracts. We have characterized important ligation and amplification biases in the sequencing library building and enrichment steps, which can impede further...... been mainly driven by the development of High-Throughput DNA Sequencing (HTS) technologies but also by the implementation of novel molecular tools tailored to the manipulation of ultra short and damaged DNA molecules. Our ability to retrieve traces of genetic material has tremendously improved, pushing......, that impact on the overall efficacy of the method. In a second part, we implemented some of these molecular tools to the processing of five Upper Paleolithic human samples from the Kostenki and Sunghir sites in Western Eurasia, in order to reconstruct the deep genomic history of European populations...

  14. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    Science.gov (United States)

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  15. High throughput sequencing identifies chilling responsive genes in sweetpotato (Ipomoea batatas Lam.) during storage.

    Science.gov (United States)

    Xie, Zeyi; Zhou, Zhilin; Li, Hongmin; Yu, Jingjing; Jiang, Jiaojiao; Tang, Zhonghou; Ma, Daifu; Zhang, Baohong; Han, Yonghua; Li, Zongyun

    2018-05-21

    Sweetpotato (Ipomoea batatas L.) is a globally important economic food crop. It belongs to the Convolvulaceae family and originates in the tropics, and it is sensitive to cold stress during storage. In this study, we performed transcriptome sequencing to investigate the sweetpotato response to chilling stress during storage. A total of 110,110 unigenes were generated via high-throughput sequencing. Differentially expressed gene (DEG) analysis showed that 18,681 genes were up-regulated and 21,983 genes were down-regulated under low-temperature conditions. Many DEGs were related to the cell membrane system, antioxidant enzymes, carbohydrate metabolism, and hormone metabolism, which are potentially associated with sweetpotato resistance to low temperature. The existence of DEGs suggests a molecular basis for the biochemical and physiological consequences of low-temperature storage conditions for sweetpotato. Our analysis will provide a new target for enhancement of sweetpotato cold stress tolerance in postharvest storage through genetic manipulation. Copyright © 2018. Published by Elsevier Inc.

  16. Accurate molecular diagnosis of phenylketonuria and tetrahydrobiopterin-deficient hyperphenylalaninemias using high-throughput targeted sequencing

    Science.gov (United States)

    Trujillano, Daniel; Perez, Belén; González, Justo; Tornador, Cristian; Navarrete, Rosa; Escaramis, Georgia; Ossowski, Stephan; Armengol, Lluís; Cornejo, Verónica; Desviat, Lourdes R; Ugarte, Magdalena; Estivill, Xavier

    2014-01-01

    Genetic diagnostics of phenylketonuria (PKU) and tetrahydrobiopterin (BH4) deficient hyperphenylalaninemia (BH4DH) rely on methods that scan for known mutations or on laborious molecular tools that use Sanger sequencing. We have implemented a novel and much more efficient strategy based on high-throughput multiplex-targeted resequencing of four genes (PAH, GCH1, PTS, and QDPR) that, when affected by loss-of-function mutations, cause PKU and BH4DH. We have validated this approach in a cohort of 95 samples with the previously known PAH, GCH1, PTS, and QDPR mutations and one control sample. Pooled barcoded DNA libraries were enriched using a custom NimbleGen SeqCap EZ Choice array and sequenced using a HiSeq2000 sequencer. The combination of several robust bioinformatics tools allowed us to detect all known pathogenic mutations (point mutations, short insertions/deletions, and large genomic rearrangements) in the 95 samples, without detecting spurious calls in these genes in the control sample. We then used the same capture assay in a discovery cohort of 11 uncharacterized HPA patients using a MiSeq sequencer. In addition, we report the precise characterization of the breakpoints of four genomic rearrangements in PAH, including a novel deletion of 899 bp in intron 3. Our study is a proof-of-principle that high-throughput-targeted resequencing is ready to substitute classical molecular methods to perform differential genetic diagnosis of hyperphenylalaninemias, allowing the establishment of specifically tailored treatments a few days after birth. PMID:23942198

  17. Sedimentary dynamics and high-frequency sequence stratigraphy of the southwestern slope of Great Bahama Bank

    Science.gov (United States)

    Wunsch, Marco; Betzler, Christian; Eberli, Gregor P.; Lindhorst, Sebastian; Lüdmann, Thomas; Reijmer, John J. G.

    2018-01-01

    New geophysical data from the leeward slope of Great Bahama Bank show how contour currents shape the slope and induce re-sedimentation processes. Along slope segments with high current control, drift migration and current winnowing at the toe of slope form a deep moat. Here, the slope progradation is inhibited by large channel incisions and the accumulation of large mass transport complexes, triggered by current winnowing. In areas where the slope is bathed by weaker currents, the accumulation of mass transport complexes and channel incision is instead controlled by the position of the sea level. Large slope failures were triggered during the Mid-Pleistocene transition and Mid-Brunhes event, both periods characterized by changes in the cyclicity or the amplitude of sea-level fluctuations. Within the seismic stratigraphic framework of third order sequences, four sequences of higher order were identified in the succession of the upper Pleistocene. These higher order sequences also show clear differences as a function of the slope exposure to contour currents. Two stochastic models emphasize the role of the contour currents and slope morphology in the facies distribution in the upper Pleistocene sequences. In areas of high current influence the interplay of erosional and depositional processes forms a complex facies pattern with downslope and along strike facies alterations. In zones with lower current influence, major facies alternations occur predominantly in the downslope direction, and a layer-cake pattern characterizes the along strike direction. Therefore, this study highlights that contour currents are an underestimated driver for the sediment distribution and architecture of carbonate slopes.

  18. New self-similar radiation-hydrodynamics solutions in the high-energy density, equilibrium diffusion limit

    International Nuclear Information System (INIS)

    Lane, Taylor K; McClarren, Ryan G

    2013-01-01

    This work presents semi-analytic solutions to a radiation-hydrodynamics problem of a radiation source driving an initially cold medium. Our solutions are in the equilibrium diffusion limit, include material motion and allow for radiation-dominated situations where the radiation energy is comparable to (or greater than) the material internal energy density. As such, this work is a generalization of the classical Marshak wave problem that assumes no material motion and that the radiation energy is negligible. Including radiation energy density in the model serves to slow down the wave propagation. The solutions provide insight into the impact of radiation energy and material motion, as well as present a novel verification test for radiation transport packages. As a verification test, the solution exercises the radiation–matter coupling terms and their v/c treatment without needing a hydrodynamics solve. An example comparison between the self-similar solution and a numerical code is given. Tables of the self-similar solutions are also provided. (paper)

  19. A massively parallel sequencing approach uncovers ancient origins and high genetic variability of endangered Przewalski's horses.

    Science.gov (United States)

    Goto, Hiroki; Ryder, Oliver A; Fisher, Allison R; Schultz, Bryant; Kosakovsky Pond, Sergei L; Nekrutenko, Anton; Makova, Kateryna D

    2011-01-01

    The endangered Przewalski's horse is the closest relative of the domestic horse and is the only true wild horse species surviving today. The question of whether Przewalski's horse is the direct progenitor of domestic horse has been hotly debated. Studies of DNA diversity within Przewalski's horses have been sparse but are urgently needed to ensure their successful reintroduction to the wild. In an attempt to resolve the controversy surrounding the phylogenetic position and genetic diversity of Przewalski's horses, we used massively parallel sequencing technology to decipher the complete mitochondrial and partial nuclear genomes for all four surviving maternal lineages of Przewalski's horses. Unlike single-nucleotide polymorphism (SNP) typing usually affected by ascertainment bias, the present method is expected to be largely unbiased. Three mitochondrial haplotypes were discovered: two similar ones (haplotypes I/II) and one substantially divergent from the other two (haplotype III). Haplotypes I/II versus III did not cluster together on a phylogenetic tree, rejecting the monophyly of Przewalski's horse maternal lineages, and were estimated to split 0.117-0.186 Ma, significantly preceding horse domestication. In the phylogeny based on autosomal sequences, Przewalski's horses formed a monophyletic clade, separate from the Thoroughbred domestic horse lineage. Our results suggest that Przewalski's horses have ancient origins and are not the direct progenitors of domestic horses. The analysis of the vast amount of sequence data presented here suggests that Przewalski's and domestic horse lineages diverged at least 0.117 Ma but since then have retained ancestral genetic polymorphism and/or experienced gene flow.

  20. High-quality genome sequence and description of Bacillus ndiopicus strain FF3T sp. nov.

    Directory of Open Access Journals (Sweden)

    C.I. Lo

    2015-11-01

    Strain FF3T was isolated from the skin flora of a 39-year-old healthy Senegalese man. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry did not allow any identification. This strain exhibited a 16S rRNA sequence similarity of 96.8% with Bacillus massiliensis, the phylogenetically closest species with standing nomenclature. In a polyphasic study comprising phenotypic and genomic analyses, strain FF3T was found to be Gram-positive, aeroanaerobic and rod shaped, and exhibited a genome of 4 068 720 bp with a G+C content of 37.03% that coded 3982 protein-coding and 67 RNA genes (including four rRNA operons). On the basis of these data, we propose the creation of Bacillus ndiopicus sp. nov.

  1. Selection of mRNA 5'-untranslated region sequence with high translation efficiency through ribosome display

    International Nuclear Information System (INIS)

    Mie, Masayasu; Shimizu, Shun; Takahashi, Fumio; Kobatake, Eiry

    2008-01-01

    The 5'-untranslated region (5'-UTR) of mRNAs functions as a translation enhancer, promoting translation efficiency. Many in vitro translation systems exhibit a reduced efficiency in protein translation due to decreased translation initiation. The use of a 5'-UTR sequence with high translation efficiency greatly enhances protein production in these systems. In this study, we have developed an in vitro selection system that favors 5'-UTRs with high translation efficiency using a ribosome display technique. A 5'-UTR random library, comprising 5'-UTRs tagged with a His-tag and Renilla luciferase (R-luc) fusion, was translated in vitro in rabbit reticulocytes. By limiting the translation period, only mRNAs with high translation efficiency were translated. During translation, the mRNA, the ribosome, and the translated R-luc with His-tag formed ternary complexes. These complexes were collected via the translated His-tag using Ni-particles. mRNA extracted from the ternary complexes was amplified using RT-PCR and sequenced. Finally, a 5'-UTR with high translation efficiency was obtained from the random 5'-UTR library.

  2. Target-dependent enrichment of virions determines the reduction of high-throughput sequencing in virus discovery.

    Directory of Open Access Journals (Sweden)

    Randi Holm Jensen

    Viral infections cause many different diseases stemming both from well-characterized viral pathogens and from emerging viruses, and the search for novel viruses continues to be of great importance. High-throughput sequencing is an important technology for this purpose. However, viral nucleic acids often constitute a minute proportion of the total genetic material in a sample from infected tissue. Techniques to enrich viral targets in high-throughput sequencing have been reported, but the sensitivity of such methods is not well established. This study compares different library preparation techniques targeting both DNA and RNA with and without virion enrichment. By optimizing the selection of intact virus particles, both by physical and enzymatic approaches, we assessed the effectiveness of the specific enrichment of viral sequences as compared to non-enriched sample preparations by selectively looking for and counting read sequences obtained from shotgun sequencing. Using shotgun sequencing of total DNA or RNA, viral targets were detected at concentrations corresponding to the predicted level, providing a foundation for estimating the effectiveness of virion enrichment. Virion enrichment typically produced a 1000-fold increase in the proportion of DNA virus sequences. For RNA virions the gain was less pronounced, with a maximum 13-fold increase. This enrichment varied between the different sample concentrations, with no clear trend. Although less sequencing was required to identify target sequences, it was not evident from our data that a lower detection level was achieved by virion enrichment compared to shotgun sequencing.

  3. [Study on Microbial Diversity of Peri-implantitis Subgingival by High-throughput Sequencing].

    Science.gov (United States)

    Li, Zhi-jie; Wang, Shao-guo; Li, Yue-hong; Tu, Dong-xiang; Liu, Shi-yun; Nie, Hong-bing; Li, Zhi-qiang; Zhang, Ju-mei

    2015-07-01

    To study the microbial diversity of subgingival plaque in peri-implantitis by high-throughput sequencing, and to investigate the microbiological etiology of peri-implantitis. Subgingival plaques were sampled from patients with peri-implantitis (D group) and non-peri-implantitis subjects (N group). The microbial diversity of the subgingival plaques was determined by sequencing the V4 region of 16S rRNA on the Illumina MiSeq platform. The diversity of the community structure was analyzed using Mothur software. A total of 156 507 gene sequences were detected in nine samples and 4 402 operational taxonomic units (OTUs) were found. Selenomonas, Pseudomonas, and Fusobacterium were the dominant bacteria in D group, while Fusobacterium, Veillonella and Streptococcus were the dominant bacteria in N group. Differences between peri-implantitis and non-peri-implantitis bacterial communities were observed at all phylogenetic levels by LEfSe, and were also found in the PCoA test. The occurrence of peri-implantitis is related not only to periodontal pathogenic microbes but also to changes in the oral microbial community structure. Treponema, Herbaspirillum, Butyricimonas and Phaeobacte may be closely related to the occurrence and development of peri-implantitis.

  4. Bioassessment of a Drinking Water Reservoir Using Plankton: High Throughput Sequencing vs. Traditional Morphological Method

    Directory of Open Access Journals (Sweden)

    Wanli Gao

    2018-01-01

    Drinking water safety is increasingly perceived as one of the top global environmental issues. Plankton has been commonly used as a bioindicator for water quality in lakes and reservoirs. Recently, DNA sequencing technology has been applied to bioassessment. In this study, we compared the effectiveness of the 16S and 18S rRNA high throughput sequencing method (HTS) and the traditional optical microscopy method (TOM) in the bioassessment of drinking water quality. Five stations reflecting different habitats and hydrological conditions in Danjiangkou Reservoir, one of the largest drinking water reservoirs in Asia, were sampled in May 2016. Non-metric multi-dimensional scaling (NMDS) analysis showed that plankton assemblages varied among the stations and the spatial patterns revealed by the two methods were consistent. The correlation between TOM and HTS in a symmetric Procrustes analysis was 0.61, revealing overall good concordance between the two methods. Procrustes analysis also showed that site-specific differences between the two methods varied among the stations. Station Heijizui (H), a site heavily influenced by two tributaries, had the largest difference while station Qushou (Q), a confluence site close to the outlet dam, had the smallest difference between the two methods. Our results show that DNA sequencing has the potential to provide consistent identification of taxa, and reliable bioassessment in a long-term biomonitoring and assessment program for drinking water reservoirs.

  5. HTSeq--a Python framework to work with high-throughput sequencing data.

    Science.gov (United States)

    Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang

    2015-01-15

    A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.
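
    The read-counting step described for htseq-count can be illustrated with a minimal pure-Python sketch. It does not use the HTSeq API; the function name, the half-open interval convention and the two bookkeeping labels are assumptions chosen for illustration. A read is assigned to a gene only when it overlaps exactly one gene.

```python
from collections import Counter

def count_reads_per_gene(genes, reads):
    """Count reads per gene by interval overlap.

    genes: dict mapping gene name -> (chrom, start, end), half-open [start, end)
    reads: iterable of (chrom, start, end) tuples, half-open
    Reads that overlap no gene or more than one gene are tallied separately.
    """
    counts = Counter()
    for chrom, r_start, r_end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [name for name, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and r_start < g_end and g_start < r_end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1   # illustrative label
        else:
            counts["__ambiguous"] += 1    # illustrative label
    return counts

# Hypothetical toy data: two overlapping genes on chr1 and four reads.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 150, 300)}
reads = [("chr1", 90, 120), ("chr1", 160, 180), ("chr1", 250, 280), ("chr2", 100, 120)]
counts = count_reads_per_gene(genes, reads)
```

    The linear scan over all genes keeps the sketch short; a real tool would use an indexed structure that can be queried by genomic coordinates, as the abstract describes.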

  6. Purification of High Molecular Weight Genomic DNA from Powdery Mildew for Long-Read Sequencing.

    Science.gov (United States)

    Feehan, Joanna M; Scheibel, Katherine E; Bourras, Salim; Underwood, William; Keller, Beat; Somerville, Shauna C

    2017-03-31

    The powdery mildew fungi are a group of economically important fungal plant pathogens. Relatively little is known about the molecular biology and genetics of these pathogens, in part due to a lack of well-developed genetic and genomic resources. These organisms have large, repetitive genomes, which have made genome sequencing and assembly prohibitively difficult. Here, we describe methods for the collection, extraction, purification and quality control assessment of high molecular weight genomic DNA from one powdery mildew species, Golovinomyces cichoracearum. The protocol described includes mechanical disruption of spores followed by an optimized phenol/chloroform genomic DNA extraction. A typical yield was 7 µg DNA per 150 mg conidia. The genomic DNA that is isolated using this procedure is suitable for long-read sequencing (i.e., > 48.5 kbp). Quality control measures to ensure the size, yield, and purity of the genomic DNA are also described in this method. Sequencing of the genomic DNA of the quality described here will allow for the assembly and comparison of multiple powdery mildew genomes, which in turn will lead to a better understanding and improved control of this agricultural pathogen.

  7. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

    Directory of Open Access Journals (Sweden)

    Charlotte Herzeel

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost.
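
    The single-pass idea can be sketched in a few lines of Python. This is not elPrep's implementation or API; the record layout and the step functions are hypothetical, and only per-record steps are shown (sorting and duplicate marking need a view of the whole data set). Each record is read once and pushed through every step before the next record is touched.

```python
def single_pass(records, steps):
    """Apply every step to each record in one traversal of the data.

    A step returns the (possibly modified) record, or None to drop it.
    """
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:
                break          # record was filtered out; skip remaining steps
        else:
            yield rec          # reached only when no step dropped the record

def drop_unmapped(rec):
    # SAM flag bit 0x4 marks an unmapped read.
    return None if rec["flag"] & 0x4 else rec

def tag_read_group(rec):
    # Hypothetical step: attach a read-group label to the record.
    return {**rec, "rg": "sample1"}

# Hypothetical toy records standing in for alignment lines.
records = [{"name": "r1", "flag": 0}, {"name": "r2", "flag": 4}, {"name": "r3", "flag": 16}]
prepared = list(single_pass(records, [drop_unmapped, tag_read_group]))
```

    Running the two steps as separate tools would read and write the data twice; here the number of traversals stays at one however many steps are listed.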

  8. New var reconstruction algorithm exposes high var sequence diversity in a single geographic location in Mali.

    Science.gov (United States)

    Dara, Antoine; Drábek, Elliott F; Travassos, Mark A; Moser, Kara A; Delcher, Arthur L; Su, Qi; Hostelley, Timothy; Coulibaly, Drissa; Daou, Modibo; Dembele, Ahmadou; Diarra, Issa; Kone, Abdoulaye K; Kouriba, Bourema; Laurens, Matthew B; Niangaly, Amadou; Traore, Karim; Tolo, Youssouf; Fraser, Claire M; Thera, Mahamadou A; Djimde, Abdoulaye A; Doumbo, Ogobara K; Plowe, Christopher V; Silva, Joana C

    2017-03-28

    Encoded by the var gene family, highly variable Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP1) proteins mediate tissue-specific cytoadherence of infected erythrocytes, resulting in immune evasion and severe malaria disease. Sequencing and assembling the 40-60 var gene complement for individual infections has been notoriously difficult, impeding molecular epidemiological studies and the assessment of particular var elements as subunit vaccine candidates. We developed and validated a novel algorithm, Exon-Targeted Hybrid Assembly (ETHA), to perform targeted assembly of var gene sequences, based on a combination of Pacific Biosciences and Illumina data. Using ETHA, we characterized the repertoire of var genes in 12 samples from uncomplicated malaria infections in children from a single Malian village and showed them to be as genetically diverse as vars from isolates from around the globe. The gene var2csa, a member of the var family associated with placental malaria pathogenesis, was present in each genome, as were vars previously associated with severe malaria. ETHA, a tool to discover novel var sequences from clinical samples, will aid the understanding of malaria pathogenesis and inform the design of malaria vaccines based on PfEMP1. ETHA is available at: https://sourceforge.net/projects/etha/ .

  9. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens

    2015-01-01

    -stringency in-solution hybridization method enables detection of discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral...... sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer...

  10. Research on Non-Similarity about Thermal Deformation Error of Mechanical Parts in High-accuracy Measurement

    International Nuclear Information System (INIS)

    Luo, Z; Fei, Y T

    2006-01-01

    Expansion with heat and contraction with cold are common physical phenomena in nature. The conventional theories and calculations of thermal deformation are approximate and linear, and can only be applied in normal- or low-precision fields. The thermal deformation error of mechanical parts does not follow the conventional linear formula; it relates to all physical dimensions of the mechanical part, and the deformation can be expressed by a nonlinear formula of the physical dimensions. A theory on non-similarity about thermal deformation error of mechanical parts is presented. Some common mechanical parts in precision technology, including hollow pieces, gears and cubes, have been studied and mathematical models have been set up. The experimental results also make it clear that these models are more logical than traditional models.

  11. TIMPs of parasitic helminths - a large-scale analysis of high-throughput sequence datasets.

    Science.gov (United States)

    Cantacessi, Cinzia; Hofmann, Andreas; Pickering, Darren; Navarro, Severine; Mitreva, Makedonka; Loukas, Alex

    2013-05-30

    Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases.

  12. Identification of microRNAs from Eugenia uniflora by high-throughput sequencing and bioinformatics analysis.

    Science.gov (United States)

    Guzman, Frank; Almerão, Mauricio P; Körbes, Ana P; Loss-Morais, Guilherme; Margis, Rogerio

    2012-01-01

    microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs.

  13. Molecular characterisation and similarity relationships among Iranian basil (Ocimum basilicum L.) accessions using inter simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Mohammad Aghaei

    2012-06-01

    The study of genetic relationships is a prerequisite for plant breeding activities as well as for conservation of genetic resources. In the present study, genetic diversity among 50 Iranian basil (Ocimum basilicum L.) accessions was determined using inter simple sequence repeat (ISSR) markers. Thirty-eight alleles were generated at 12 ISSR loci. The number of alleles per locus ranged from 1 to 5 with an average of 3.17. The maximum number of alleles was observed at the A7, 818, 825 and 849 loci, and their size ranged from 300 to 2500 bp. A similarity matrix based on Jaccard's coefficient for all 50 basil accessions gave values ranging from 0.60 to 1.00. The maximum similarity (1.00) was observed between the "Urmia" and "Shahr-e-Rey II" accessions as well as between the "Urmia" and "Qazvin II" accessions. The lowest similarity (0.60) was observed between the "Tuyserkan I" and "Gom II" accessions. The unweighted pair-group method using arithmetic average (UPGMA) clustering algorithm classified the studied accessions into three distinct groups. All of the basil accessions, with the exception of "Babol III", "Ahvaz II", "Yazd II" and "Ardebil I", were placed in groups I and II. Leaf colour was a specific characteristic that influenced the clustering of Iranian basil accessions. Because of this relationship, the results of the principal coordinate analysis (PCoA) approximately corresponded to those obtained through cluster analysis. Our results revealed that the geographical distribution of genotypes could not be used as a basis for crossing parents to obtain high heterosis, and that the choice of parents must therefore be based on genetic studies.
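
    Jaccard's coefficient for marker data of this kind is the number of bands present in both accessions divided by the number present in at least one. The sketch below assumes each accession is scored as a 0/1 presence vector over the same bands; the vectors are made-up illustrations, not data from the study, and returning 1.0 when neither accession has any band is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard similarity between two equal-length 0/1 band-presence vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must score the same set of bands")
    shared = sum(1 for x, y in zip(a, b) if x and y)   # band present in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # band present in at least one
    # Assumed convention: two accessions with no bands at all count as identical.
    return shared / either if either else 1.0

# Hypothetical presence/absence scores for three accessions over five bands.
acc1 = [1, 1, 0, 1, 0]
acc2 = [1, 0, 0, 1, 1]
acc3 = [1, 1, 0, 1, 0]

sim_12 = jaccard(acc1, acc2)   # 2 shared bands out of 4 observed -> 0.5
sim_13 = jaccard(acc1, acc3)   # identical profiles -> 1.0
```

    Bands absent from both accessions do not enter the ratio, which is why the coefficient is commonly used for dominant markers. A matrix of such pairwise values is the input to a clustering step such as UPGMA.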

  14. Highly diverse microbiota in dental root canals in cases of apical periodontitis (data of illumina sequencing).

    Science.gov (United States)

    Vengerfeldt, Veiko; Špilka, Katerina; Saag, Mare; Preem, Jens-Konrad; Oopkaup, Kristjan; Truu, Jaak; Mändar, Reet

    2014-11-01

    Chronic apical periodontitis (CAP) is a frequent condition that has a considerable effect on a patient's quality of life. We aimed to reveal root canal microbial communities in antibiotic-naive patients by applying Illumina sequencing (Illumina Inc, San Diego, CA). Samples were collected under strict aseptic conditions from 12 teeth (5 with primary CAP, 3 with secondary CAP, and 4 with a periapical abscess [PA]) and characterized by profiling the microbial community on the basis of the V6 hypervariable region of the 16S ribosomal RNA gene by using Illumina HiSeq2000 sequencing combinatorial sequence-tagged polymerase chain reaction products. Root canal specimens displayed highly polymicrobial communities in all 3 patient groups. One sample contained 5-8 (mean = 6.5) phyla of bacteria. The most numerous were Firmicutes and Bacteroidetes, but Actinobacteria, Fusobacteria, Proteobacteria, Spirochaetes, Tenericutes, and Synergistetes were also present in most of the patients. One sample contained 30-70 different operational taxonomic units; the mean (± standard deviation) was lower in the primary CAP group (36 ± 4) than in the PA (45 ± 4) and secondary CAP (43 ± 13) groups (P < .05). The communities were individually different, but anaerobic bacteria predominated as the rule. Enterococcus faecalis was found only in patients with secondary CAP. One PA sample displayed a significantly high proportion (47%) of Proteobacteria, mainly at the expense of Janthinobacterium lividum. This study provided an in-depth characterization of the microbiota of periapical tissues, revealing highly polymicrobial communities and minor differences between the study groups. A full understanding of the etiology of periodontal disease will only be possible through further in-depth systems-level analyses of the host-microbiome interaction. Copyright © 2014 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.

  15. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth.

    Science.gov (United States)

    Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

    2012-06-01

    Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that the sequencing depths of different regions of a genome, or of genomes from different species, are highly uneven. Most existing genome assemblers assume that sequencing depths are even, and therefore fail to construct correct long contigs. We introduce the IDBA-UD algorithm that is based on the de Bruijn graph approach for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths. Several non-trivial techniques have been employed to tackle the problems. Instead of using a simple threshold, we use multiple depth-relative thresholds to remove erroneous k-mers in both low-depth and high-depth regions. The technique of local assembly with paired-end information is used to solve the branch problem of low-depth short repeat regions. To speed up the process, an error correction step is conducted to correct reads of high-depth regions that can be aligned to high-confident contigs. Comparison of the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) for different datasets shows that IDBA-UD can reconstruct longer contigs with higher accuracy. The IDBA-UD toolkit is available at our website http://www.cs.hku.hk/~alse/idba_ud
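
    The contrast between a single threshold and a depth-relative threshold can be shown with a toy example. This is a simplified sketch, not the IDBA-UD procedure: it assumes k-mers are already grouped by region, uses the median k-mer count of a region as its local depth, and the ratio 0.5 and all counts are made up for illustration.

```python
from statistics import median

def filter_global(regions, threshold):
    """Keep k-mers whose count reaches one fixed threshold everywhere."""
    return {name: [(k, c) for k, c in kmers if c >= threshold]
            for name, kmers in regions.items()}

def filter_depth_relative(regions, ratio):
    """Keep k-mers whose count reaches a fraction of their region's local depth."""
    kept = {}
    for name, kmers in regions.items():
        if not kmers:
            kept[name] = []
            continue
        local_depth = median(c for _, c in kmers)   # assumed proxy for local depth
        kept[name] = [(k, c) for k, c in kmers if c >= ratio * local_depth]
    return kept

# Hypothetical regions: the last k-mer in each is a sequencing error.
regions = {
    "low":  [("ACGT", 3), ("CGTA", 3), ("GTAC", 4), ("TACG", 1)],
    "high": [("TTGA", 100), ("TGAC", 98), ("GACC", 102), ("ACCA", 10)],
}
by_global = filter_global(regions, threshold=5)
by_relative = filter_depth_relative(regions, ratio=0.5)
```

    With the fixed threshold of 5, every k-mer of the low-depth region is discarded while the erroneous k-mer with count 10 survives in the high-depth region. The relative rule removes only the outlier in each region.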

  16. Use of high throughput sequencing to study oomycete communities in soil and roots

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    taxonomic units from symptomatic lesions in carrot resulted in 94% of the reads belonging to oomycetes with a dominance of species of Pythium that are known to be involved in causing cavity spot. Moreover, soil samples showed that 95% of the sequences could be assigned to oomycetes including Pythium......, Aphanomyces, Peronospora, Saprolegnia and Phytophthora. A high proportion of oomycete reads was consistently present in all symptomatic lesions and soil samples showing the versatility of the strategy and thus demonstrating the usefulness of the method in plant and soil DNA background....

  17. Rocky Mountain Spotted Fever Characterization and Comparison to Similar Illnesses in a Highly Endemic Area—Arizona, 2002–2011

    Science.gov (United States)

    Traeger, Marc S.; Regan, Joanna J.; Humpherys, Dwight; Mahoney, Dianna L.; Martinez, Michelle; Emerson, Ginny L.; Tack, Danielle M.; Geissler, Aimee; Yasmin, Seema; Lawson, Regina; Hamilton, Charlene; Williams, Velda; Levy, Craig; Komatsu, Kenneth; McQuiston, Jennifer H.; Yost, David A.

    2015-01-01

    Background: Rocky Mountain spotted fever (RMSF) has emerged as a significant cause of morbidity and mortality since 2002 on tribal lands in Arizona. The explosive nature of this outbreak and the recognition of an unexpected tick vector, Rhipicephalus sanguineus, prompted an investigation to characterize RMSF in this unique setting and compare RMSF cases to similar illnesses. Methods: We compared medical records of 205 patients with RMSF and 175 with non-RMSF illnesses that prompted RMSF testing during 2002–2011 from 2 Indian reservations in Arizona. Results: RMSF cases in Arizona occurred year-round and peaked later (July–September) than RMSF cases reported from other US regions. Cases were younger (median age, 11 years) and reported fever and rash less frequently, compared to cases from other US regions. Fever was present in 81% of cases but not significantly different from that in patients with non-RMSF illnesses. Classic laboratory abnormalities such as low sodium and platelet counts had small and subtle differences between cases and patients with non-RMSF illnesses. Imaging studies reflected the variability and complexity of the illness but proved unhelpful in clarifying the early diagnosis. Conclusions: RMSF epidemiology in this region appears different than RMSF elsewhere in the United States. No specific pattern of signs, symptoms, or laboratory findings occurred with enough frequency to consistently differentiate RMSF from other illnesses. Due to the nonspecific and variable nature of RMSF presentations, clinicians in this region should aggressively treat febrile illnesses and sepsis with doxycycline for suspected RMSF. PMID:25697743

  18. Highly sulfated hexasaccharide sequences isolated from chondroitin sulfate of shark fin cartilage: insights into the sugar sequences with bioactivities.

    Science.gov (United States)

    Mizumoto, Shuji; Murakoshi, Saori; Kalayanamitra, Kittiwan; Deepa, Sarama Sathyaseelan; Fukui, Shigeyuki; Kongtawelert, Prachya; Yamada, Shuhei; Sugahara, Kazuyuki

    2013-02-01

    Chondroitin sulfate (CS) chains regulate the development of the central nervous system in vertebrates and are linear polysaccharides consisting of variously sulfated repeating disaccharides, [-4GlcUAβ1-3GalNAcβ1-](n), where GlcUA and GalNAc represent D-glucuronic acid and N-acetyl-D-galactosamine, respectively. CS chains containing D-disaccharide units [GlcUA(2-O-sulfate)-GalNAc(6-O-sulfate)] are involved in the development of cerebellar Purkinje cells and neurite outgrowth-promoting activity through interaction with a neurotrophic factor, pleiotrophin, resulting in the regulation of signaling. In this study, to obtain further structural information on the CS chains containing D-disaccharide units involved in brain development, oligosaccharides containing D-units were isolated from a shark fin cartilage. Seven novel hexasaccharide sequences, ΔO-D-D, ΔA-D-D, ΔC-D-D, ΔE-A-D, ΔD-D-C, ΔE-D-D and ΔA-B-D, in addition to three previously reported sequences, ΔC-A-D, ΔC-D-C and ΔA-D-A, were isolated from a CS preparation of shark fin cartilage after exhaustive digestion with chondroitinase AC-I, which cannot act on the galactosaminidic linkages bound to D-units. The symbol Δ stands for a 4,5-unsaturated bond of uronic acids, whereas A, B, C, D, E and O represent [GlcUA-GalNAc(4-O-sulfate)], [GlcUA(2-O-sulfate)-GalNAc(4-O-sulfate)], [GlcUA-GalNAc(6-O-sulfate)], [GlcUA(2-O-sulfate)-GalNAc(6-O-sulfate)], [GlcUA-GalNAc(4-O-, 6-O-sulfate)] and [GlcUA-GalNAc], respectively. In binding studies using an anti-CS monoclonal antibody, MO-225, the epitopes of which are involved in cerebellar development in mammals, novel epitope structures, ΔA-D-A, ΔA-D-D and ΔA-B-D, were revealed. Hexasaccharides containing two consecutive D-units or a B-unit will be useful for the structural and functional analyses of CS chains particularly in the neuroglycobiological fields.

  19. Cis-acting regulatory sequences promote high-frequency gene conversion between repeated sequences in mammalian cells.

    Science.gov (United States)

    Raynard, Steven J; Baker, Mark D

    2004-01-01

    In mammalian cells, little is known about the nature of recombination-prone regions of the genome. Previously, we reported that the immunoglobulin heavy chain (IgH) mu locus behaved as a hotspot for mitotic, intrachromosomal gene conversion (GC) between repeated mu constant (Cmu) regions in mouse hybridoma cells. To investigate whether elements within the mu gene regulatory region were required for hotspot activity, gene targeting was used to delete a 9.1 kb segment encompassing the mu gene promoter (Pmu), enhancer (Emu) and switch region (Smu) from the locus. In these cell lines, GC between the Cmu repeats was significantly reduced, indicating that this 'recombination-enhancing sequence' (RES) is necessary for GC hotspot activity at the IgH locus. Importantly, the RES fragment stimulated GC when appended to the same Cmu repeats integrated at ectopic genomic sites. We also show that deletion of Emu and flanking matrix attachment regions (MARs) from the RES abolishes GC hotspot activity at the IgH locus. However, no stimulation of ectopic GC was observed with the Emu/MARs fragment alone. Finally, we provide evidence that no correlation exists between the level of transcription and GC promoted by the RES. We suggest a model whereby Emu/MARs enhances mitotic GC at the endogenous IgH mu locus by effecting chromatin modifications in adjacent DNA.

  20. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Gonzalo H Villarino

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  1. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Science.gov (United States)

    Villarino, Gonzalo H; Bombarely, Aureliano; Giovannoni, James J; Scanlon, Michael J; Mattson, Neil S

    2014-01-01

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  2. Rocky mountain spotted fever characterization and comparison to similar illnesses in a highly endemic area-Arizona, 2002-2011.

    Science.gov (United States)

    Traeger, Marc S; Regan, Joanna J; Humpherys, Dwight; Mahoney, Dianna L; Martinez, Michelle; Emerson, Ginny L; Tack, Danielle M; Geissler, Aimee; Yasmin, Seema; Lawson, Regina; Hamilton, Charlene; Williams, Velda; Levy, Craig; Komatsu, Kenneth; McQuiston, Jennifer H; Yost, David A

    2015-06-01

    Rocky Mountain spotted fever (RMSF) has emerged as a significant cause of morbidity and mortality since 2002 on tribal lands in Arizona. The explosive nature of this outbreak and the recognition of an unexpected tick vector, Rhipicephalus sanguineus, prompted an investigation to characterize RMSF in this unique setting and compare RMSF cases to similar illnesses. We compared medical records of 205 patients with RMSF and 175 with non-RMSF illnesses that prompted RMSF testing during 2002-2011 from 2 Indian reservations in Arizona. RMSF cases in Arizona occurred year-round and peaked later (July-September) than RMSF cases reported from other US regions. Cases were younger (median age, 11 years) and reported fever and rash less frequently, compared to cases from other US regions. Fever was present in 81% of cases but not significantly different from that in patients with non-RMSF illnesses. Classic laboratory abnormalities such as low sodium and platelet counts had small and subtle differences between cases and patients with non-RMSF illnesses. Imaging studies reflected the variability and complexity of the illness but proved unhelpful in clarifying the early diagnosis. RMSF epidemiology in this region appears different than RMSF elsewhere in the United States. No specific pattern of signs, symptoms, or laboratory findings occurred with enough frequency to consistently differentiate RMSF from other illnesses. Due to the nonspecific and variable nature of RMSF presentations, clinicians in this region should aggressively treat febrile illnesses and sepsis with doxycycline for suspected RMSF. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  3. Men and Women Exhibit Similar Acute Hypotensive Responses After Low, Moderate, or High-Intensity Plyometric Training.

    Science.gov (United States)

    Ramírez-Campillo, Rodrigo; Abad-Colil, Felipe; Vera, Maritza; Andrade, David C; Caniuqueo, Alexis; Martínez-Salazar, Cristian; Nakamura, Fábio Y; Arazi, Hamid; Cerda-Kohler, Hugo; Izquierdo, Mikel; Alonso-Martínez, Alicia M

    2016-01-01

    The aim of this study was to compare the acute effects of low-, moderate-, high-, and combined-intensity plyometric training on heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), and rate-pressure product (RPP) cardiovascular responses in male and female normotensive subjects. Fifteen (8 women) physically active normotensive subjects participated in this study (age 23.5 ± 2.6 years, body mass index 23.8 ± 2.3 kg · m(-2)). Using a randomized crossover design, trials were conducted with rest intervals of at least 48 hours. Each trial comprised 120 jumps, using boxes of 20, 30, and 40 cm for low, moderate, and high intensity, respectively. For combined intensity, the 3 height boxes were combined. Measurements were taken before and after (i.e., every 10 minutes for a period of 90 minutes) each trial. When data responses of men and women were combined, a mean reduction in SBP, DBP, and RPP was observed after all plyometric intensities. No significant differences were observed pre- or postexercise (at any time point) for HR, SBP, DBP, or RPP when low-, moderate-, high-, or combined-intensity trials were compared. No significant differences were observed between male and female subjects, except for a higher SBP reduction in women (-12%) compared with men (-7%) after high-intensity trial. Although there were minor differences across postexercise time points, collectively, the data demonstrated that all plyometric training intensities can induce an acute postexercise hypotensive effect in young normotensive male and female subjects.

  4. Screening of whole genome sequences identified high-impact variants for stallion fertility.

    Science.gov (United States)

    Schrimpf, Rahel; Gottschalk, Maren; Metzger, Julia; Martinsson, Gunilla; Sieme, Harald; Distl, Ottmar

    2016-04-14

    Stallion fertility is an economically important trait due to the increase of artificial insemination in horses. The availability of whole genome sequence data facilitates identification of rare high-impact variants contributing to stallion fertility. The aim of our study was to genotype rare high-impact variants retrieved from next-generation sequencing (NGS) data of 11 horses in order to unravel harmful genetic variants in large samples of stallions. Gene ontology (GO) terms and search results from public databases were used to obtain a comprehensive list of human and mouse genes predicted to participate in the regulation of male reproduction. The corresponding equine orthologous genes were searched in whole genome sequence data of seven stallions and four mares and filtered for high-impact genetic variants using SnpEFF, SIFT and Polyphen 2 software. All genetic variants with the missing homozygous mutant genotype were genotyped on 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. Mixed linear model analysis was employed for an association analysis with de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). We screened next-generation sequencing data of whole genomes from 11 horses for equine genetic variants in 1194 human and mouse genes involved in male fertility and linked through common gene ontology (GO) with male reproductive processes. Variants were filtered for high impact on protein structure and validated through SIFT and Polyphen 2. Only those genetic variants were followed up for which the homozygous mutant genotype was missing in the detection sample comprising 11 horses. After this filtering process, 17 single nucleotide polymorphisms (SNPs) were left. These SNPs were genotyped in 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. An association analysis in 216 Hanoverian stallions revealed a significant association of the splice-site disruption variant

  5. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Background: Protein structural data have increased exponentially, such that fast and accurate tools are necessary for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated and high-throughput functional annotations or predictions for the ever-increasing number of published protein structures in this post-genomic era.
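
    The idea of reducing a backbone to a text string can be illustrated with a minimal sketch. It is not the SARST encoding: SARST organizes the Ramachandran map by nearest-neighbor clustering, whereas this sketch assumes a uniform 90-degree grid, and the names `ALPHABET`, `angle_bin` and `encode_backbone` are hypothetical.

```python
# Illustrative only: a uniform-grid stand-in for a Ramachandran-based
# linear encoding. The real SARST map is built by clustering, not a grid.
ALPHABET = "ABCDEFGHIJKLMNOP"  # 4 x 4 grid cells -> 16 letters (assumption)
BIN_SIZE = 90                  # degrees per grid cell (assumption)
BINS_PER_AXIS = 360 // BIN_SIZE


def angle_bin(angle_deg):
    """Return the grid index of a dihedral angle given in degrees."""
    shifted = (angle_deg + 180.0) % 360.0  # map [-180, 180) onto [0, 360)
    return int(shifted // BIN_SIZE)


def encode_backbone(phi_psi_pairs):
    """Turn a list of (phi, psi) pairs into one letter per residue."""
    letters = []
    for phi, psi in phi_psi_pairs:
        cell = angle_bin(phi) * BINS_PER_AXIS + angle_bin(psi)
        letters.append(ALPHABET[cell])
    return "".join(letters)


# Two helix-like residues followed by one strand-like residue.
encoded = encode_backbone([(-60, -45), (-60, -45), (-120, 130)])
```

    Once structures are strings, any ordinary sequence alignment or search tool can in principle be run on them, which is the property the abstract relies on.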

  6. Perchlorate reduction by hydrogen autotrophic bacteria and microbial community analysis using high-throughput sequencing.

    Science.gov (United States)

    Wan, Dongjin; Liu, Yongde; Niu, Zhenhua; Xiao, Shuhu; Li, Daorong

    2016-02-01

    Hydrogen autotrophic reduction of perchlorate has the advantages of high removal efficiency and harmlessness to drinking water. So far, however, the reported information about the microbial community structure has been comparatively limited, and changes in the biodiversity and the dominant bacteria during the acclimation process require detailed study. In this study, perchlorate-reducing hydrogen autotrophic bacteria were acclimated by hydrogen aeration from activated sludge. For the first time, high-throughput sequencing was applied to analyze changes in biodiversity and the dominant bacteria during the acclimation process. The Michaelis-Menten model described the perchlorate reduction kinetics well. Model parameters q(max) and K(s) were 2.521-3.245 (mg ClO4(-)/gVSS h) and 5.44-8.23 (mg/l), respectively. Microbial perchlorate reduction occurred across the pH range 5.0-11.0; removal was highest at pH 9.0. The enriched mixed bacteria could use perchlorate, nitrate and sulfate as electron acceptors, and the order of preference was: NO3(-) > ClO4(-) > SO4(2-). Compared to the feed culture, biodiversity decreased greatly during the acclimation process, and the microbial community structure gradually stabilized after 9 acclimation cycles. The Thauera genus, related to Rhodocyclales, was the dominant perchlorate-reducing bacteria (PRB) in the mixed culture.
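
    For orientation, the Michaelis-Menten rate law named in the abstract can be written as q = q(max) * S / (K(s) + S). The sketch below evaluates it with the lower bounds of the reported parameter ranges; the function name and the example concentration are assumptions made for illustration, not values from the study.

```python
def specific_reduction_rate(substrate_mg_per_l, q_max, k_s):
    """Michaelis-Menten rate: q = q_max * S / (K_s + S).

    q_max in mg ClO4(-)/gVSS h, k_s and substrate in mg/l.
    """
    return q_max * substrate_mg_per_l / (k_s + substrate_mg_per_l)


Q_MAX = 2.521  # lower bound of the reported q(max) range
K_S = 5.44     # lower bound of the reported K(s) range

# At S = K_s the rate is half of q_max by definition of the model.
rate_at_ks = specific_reduction_rate(K_S, Q_MAX, K_S)

# Hypothetical concentration, chosen only to show saturation behaviour.
rate_at_high_s = specific_reduction_rate(500.0, Q_MAX, K_S)
```

    At concentrations far above K(s) the rate approaches q(max), so the two parameters together describe both the low-concentration and the saturated regime.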

  7. Penicillium arizonense, a new, genome sequenced fungal species, reveals a high chemical diversity in secreted metabolites

    Science.gov (United States)

    Grijseels, Sietske; Nielsen, Jens Christian; Randelovic, Milica; Nielsen, Jens; Nielsen, Kristian Fog; Workman, Mhairi; Frisvad, Jens Christian

    2016-01-01

    A new soil-borne species belonging to the Penicillium section Canescentia is described, Penicillium arizonense sp. nov. (type strain CBS 141311T = IBT 12289T). The genome was sequenced and assembled into 33.7 Mb containing 12,502 predicted genes. A phylogenetic assessment based on marker genes confirmed the grouping of P. arizonense within section Canescentia. Compared to related species, P. arizonense proved to encode a high number of proteins involved in carbohydrate metabolism, in particular hemicellulases. Mining the genome for genes involved in secondary metabolite biosynthesis resulted in the identification of 62 putative biosynthetic gene clusters. Extracts of P. arizonense were analysed for secondary metabolites and austalides, pyripyropenes, tryptoquivalines, fumagillin, pseurotin A, curvulinic acid and xanthoepocin were detected. A comparative analysis against known pathways enabled the proposal of biosynthetic gene clusters in P. arizonense responsible for the synthesis of all detected compounds except curvulinic acid. The capacity to produce biomass degrading enzymes and the identification of a high chemical diversity in secreted bioactive secondary metabolites, offers a broad range of potential industrial applications for the new species P. arizonense. The description and availability of the genome sequence of P. arizonense, further provides the basis for biotechnological exploitation of this species. PMID:27739446

  8. Extracellular DNA amplicon sequencing reveals high levels of benthic eukaryotic diversity in the central Red Sea

    KAUST Repository

    Pearman, John K.

    2015-11-01

    The present study aims to characterize the benthic eukaryotic biodiversity patterns at a coarse taxonomic level in three areas of the central Red Sea (a lagoon, an offshore area in Thuwal and a shallow coastal area near Jeddah) based on extracellular DNA. High-throughput amplicon sequencing targeting the V9 region of the 18S rRNA gene was undertaken for 32 sediment samples. High levels of alpha-diversity were detected with 16,089 operational taxonomic units (OTUs) being identified. The majority of the OTUs were assigned to Metazoa (29.2%), Alveolata (22.4%) and Stramenopiles (17.8%). Stramenopiles (Diatomea) and Alveolata (Ciliophora) were frequent in a lagoon and in shallower coastal stations, whereas metazoans (Arthropoda: Maxillopoda) were dominant in deeper offshore stations. Only 24.6% of total OTUs were shared among all areas. Beta-diversity was generally lower between the lagoon and Jeddah (nearshore) than between either of those and the offshore area, suggesting a nearshore–offshore biodiversity gradient. The current approach allowed for a broad-range of benthic eukaryotic biodiversity to be analysed with significantly less labour than would be required by other traditional taxonomic approaches. Our findings suggest that next generation sequencing techniques have the potential to provide a fast and standardised screening of benthic biodiversity at large spatial and temporal scales.
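
    A figure such as "24.6% of total OTUs were shared among all areas" can be read as the size of the intersection of the per-area OTU sets divided by the size of their union. The sketch below shows that calculation on made-up OTU labels; the data and the function name are hypothetical, and the study's own diversity metrics are not specified in the abstract.

```python
def shared_otu_fraction(otu_sets):
    """Fraction of all observed OTUs that occur in every area."""
    all_otus = set().union(*otu_sets)
    if not all_otus:
        return 0.0
    shared = set.intersection(*otu_sets)
    return len(shared) / len(all_otus)


# Toy OTU identifiers, not data from the study.
lagoon = {"otu1", "otu2", "otu3", "otu4"}
offshore = {"otu1", "otu2", "otu5"}
jeddah = {"otu1", "otu2", "otu3", "otu6"}

fraction = shared_otu_fraction([lagoon, offshore, jeddah])
```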

  9. Exome sequencing identifies highly recurrent MED12 somatic mutations in breast fibroadenoma.

    Science.gov (United States)

    Lim, Weng Khong; Ong, Choon Kiat; Tan, Jing; Thike, Aye Aye; Ng, Cedric Chuan Young; Rajasegaran, Vikneswari; Myint, Swe Swe; Nagarajan, Sanjanaa; Nasir, Nur Diyana Md; McPherson, John R; Cutcutache, Ioana; Poore, Gregory; Tay, Su Ting; Ooi, Wei Siong; Tan, Veronique Kiak Mien; Hartman, Mikael; Ong, Kong Wee; Tan, Benita K T; Rozen, Steven G; Tan, Puay Hoon; Tan, Patrick; Teh, Bin Tean

    2014-08-01

    Fibroadenomas are the most common breast tumors in women under 30 (refs. 1,2). Exome sequencing of eight fibroadenomas with matching whole-blood samples revealed recurrent somatic mutations solely in MED12, which encodes a Mediator complex subunit. Targeted sequencing of an additional 90 fibroadenomas confirmed highly frequent MED12 exon 2 mutations (58/98, 59%) that are probably somatic, with 71% of mutations occurring in codon 44. Using laser capture microdissection, we show that MED12 fibroadenoma mutations are present in stromal but not epithelial mammary cells. Expression profiling of MED12-mutated and wild-type fibroadenomas revealed that MED12 mutations are associated with dysregulated estrogen signaling and extracellular matrix organization. The fibroadenoma MED12 mutation spectrum is nearly identical to that of previously reported MED12 lesions in uterine leiomyoma but not those of other tumors. Benign tumors of the breast and uterus, both of which are key target tissues of estrogen, may thus share a common genetic basis underpinned by highly frequent and specific MED12 mutations.

  10. Bayesian reconstruction of photon interaction sequences for high-resolution PET detectors

    Energy Technology Data Exchange (ETDEWEB)

    Pratx, Guillem; Levin, Craig S [Molecular Imaging Program at Stanford, Department of Radiology, Stanford, CA (United States)], E-mail: cslevin@stanford.edu

    2009-09-07

    Realizing the full potential of high-resolution positron emission tomography (PET) systems involves accurately positioning events in which the annihilation photon deposits all its energy across multiple detector elements. Reconstructing the complete sequence of interactions of each photon provides a reliable way to select the earliest interaction because it ensures that all the interactions are consistent with one another. Bayesian estimation forms a natural framework to maximize the consistency of the sequence with the measurements while taking into account the physics of γ-ray transport. An inherently statistical method, it accounts for the uncertainty in the measured energy and position of each interaction. An algorithm based on maximum a posteriori (MAP) was evaluated for computer simulations. For a high-resolution PET system based on cadmium zinc telluride detectors, 93.8% of the recorded coincidences involved at least one photon multiple-interactions event (PMIE). The MAP estimate of the first interaction was accurate for 85.2% of the single photons. This represents a two-fold reduction in the number of mispositioned events compared to minimum pair distance, a simpler yet efficient positioning method. The point-spread function of the system presented lower tails and higher peak value when MAP was used. This translated into improved image quality, which we quantified by studying contrast and spatial resolution gains.

  11. Digital PCR provides sensitive and absolute calibration for high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Fan H Christina

    2009-03-01

    Background: Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample (typically micrograms) are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. Results: We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. Conclusion: The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and, with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.
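
    The abstract does not spell out how counts are derived, but digital PCR quantification generally rests on a Poisson correction: if a fraction p of partitions is positive, the mean number of template molecules per partition is -ln(1 - p). The sketch below applies that general relation; the function names and example counts are assumptions, not the authors' protocol.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean template copies per partition."""
    if not 0 <= positive < total:
        # With every partition positive the estimate is unbounded.
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def total_copies(positive, total):
    """Estimated number of template molecules across all partitions."""
    return copies_per_partition(positive, total) * total


# Hypothetical run: 50 of 100 partitions show amplification.
estimate = total_copies(50, 100)
```

    The estimate exceeds the raw count of positive partitions because some partitions receive more than one molecule, which is why no standard curve is needed to obtain an absolute number.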

  12. Markovian Model in High Order Sequence Prediction From Log-Motif Patterns in Agbada Paralic Section, Niger Delta, Nigeria

    International Nuclear Information System (INIS)

    Olabode, S. O.; Adekoya, J. A.

    2002-01-01

    A Markovian model for the elucidation of high order sequences was applied to repetitive events of regressive and transgressive phases in the Agbada paralic section, Niger Delta. The repetitive events are made up of delta front, delta topset and fluvio-deltaic sediments. The sediments consist of sands, sandstones, siltstones and shales in various proportions. Five wells (MN1, AA1, NP2, NP6 and NP8) were studied. A summary of the biostratigraphic report and well log-motif patterns was used to delineate the third order depositional sequences in the wells. Various Markovian properties (observed transition frequency matrix, observed transition probability matrix, fixed probability vector, expected random matrix (randomised transition matrix) and difference matrix) were determined for stacked high order sequences (high frequency cyclic events) nested within the third-order sequences, using the log-motif patterns for the various sand bodies and shales. Flow diagrams were constructed for each of the depositional sequences to determine the likely number of cycles. The upward transition matrix between the log-motif patterns and the flow diagrams used to elucidate cyclicity show that the overall regressive sequence of the Niger Delta has been modified by deltaic depositional elements and fluctuations in sea level. The predictions of higher order sequences within third order sequences from Markovian properties provide a good basis for correlation within the depositional sequences. The model has also been used to decipher the dominant depositional processes during the formation of the sequences. Discrete reservoir intervals and seal potentials within the sequences were also predicted from the flow diagrams constructed.
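
    The Markovian properties listed above can be sketched for a toy upward succession of log-motif classes. Definitions differ between authors (for example, embedded-chain variants exclude self-transitions), and the abstract does not state which formulation was used, so the version below is one simple assumption: the fixed probability vector is taken from column totals and the expected random matrix repeats that vector in every row. The function name and the toy succession are hypothetical.

```python
from collections import Counter


def markov_properties(succession):
    """Compute simple Markov-chain summaries of an upward succession."""
    states = sorted(set(succession))
    pair_counts = Counter(zip(succession, succession[1:]))
    # Observed transition frequency matrix (rows: from, columns: to).
    freq = [[pair_counts[(a, b)] for b in states] for a in states]
    grand_total = sum(sum(row) for row in freq)
    # Observed transition probability matrix (row-normalised counts).
    prob = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in freq]
    # Fixed probability vector from column totals (assumed definition).
    fixed = [sum(row[j] for row in freq) / grand_total
             for j in range(len(states))]
    # Expected random matrix: every row equals the fixed vector.
    expected = [list(fixed) for _ in states]
    # Difference matrix: observed minus expected probabilities.
    diff = [[p - e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(prob, expected)]
    return states, freq, prob, fixed, expected, diff


# Toy succession, not data from the wells studied.
result = markov_properties(["sand", "shale", "sand", "shale", "sand"])
```

    Positive entries in the difference matrix mark transitions that occur more often than a random arrangement would suggest, which is the kind of information a flow diagram of preferred transitions summarizes.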

  13. Penicillium arizonense, a new, genome sequenced fungal species, reveals a high chemical diversity in secreted metabolites

    DEFF Research Database (Denmark)

    Grijseels, Sietske; Nielsen, Jens Christian; Randelovic, Milica

    2016-01-01

    A new soil-borne species belonging to the Penicillium section Canescentia is described, Penicillium arizonense sp. nov. (type strain CBS 141311T = IBT 12289T). The genome was sequenced and assembled into 33.7 Mb containing 12,502 predicted genes. A phylogenetic assessment based on marker genes...... confirmed the grouping of P. arizonense within section Canescentia. Compared to related species, P. arizonense proved to encode a high number of proteins involved in carbohydrate metabolism, in particular hemicellulases. Mining the genome for genes involved in secondary metabolite biosynthesis resulted...... of biosynthetic gene clusters in P. arizonense responsible for the synthesis of all detected compounds except curvulinic acid. The capacity to produce biomass degrading enzymes and the identification of a high chemical diversity in secreted bioactive secondary metabolites, offers a broad range of potential...

  14. Highly Stereoselective Synthesis of Cyclopentanes bearing Four Stereocenters by a Rhodium Carbene–Initiated Domino Sequence

    Science.gov (United States)

    Parr, Brendan T.; Davies, Huw M. L.

    2014-01-01

    Stereoselective synthesis of a cyclopentane nucleus by convergent annulations constitutes a significant challenge for synthetic chemists. Though a number of biologically relevant cyclopentane natural products are known, more often than not, the cyclopentane core is assembled in a stepwise fashion due to a lack of efficient annulation strategies. Herein, we report that the rhodium-catalyzed reactions of vinyldiazoacetates with (E)-1,3-disubstituted 2-butenols generate cyclopentanes containing four new stereogenic centers with very high levels of stereoselectivity (99% ee, >97 : 3 dr). The reaction proceeds by a carbene–initiated domino sequence consisting of five distinct steps: rhodium–bound oxonium ylide formation, [2,3]-sigmatropic rearrangement, oxy-Cope rearrangement, enol–keto tautomerization, and finally an intramolecular carbonyl ene reaction. A systematic study is presented detailing how to control chirality transfer in each of the four stereo-defining steps of the cascade, culminating in the development of a highly stereoselective process. PMID:25082301

  15. The monoclonal S9.6 antibody exhibits highly variable binding affinities towards different R-loop sequences.

    Directory of Open Access Journals (Sweden)

    Fabian König

    The monoclonal antibody S9.6 is a widely used tool to purify, analyse and quantify R-loop structures in cells. A previous study using the surface plasmon resonance technology and a single-chain variable fragment (scFv) of S9.6 showed high affinity (0.6 nM) for DNA-RNA and also a high affinity (2.7 nM) for RNA-RNA hybrids. We used the microscale thermophoresis method, which allows surface-independent interaction studies, and electromobility shift assays to evaluate additional RNA-DNA hybrid sequences and to quantify the binding affinities of the S9.6 antibody with respect to distinct sequences and their GC-content. Our results confirm high affinity binding to previously analysed sequences, but reveal that binding affinities are highly sequence specific. Our study presents R-loop sequences that, independent of GC-content and in different sequence variations, exhibit either no binding, binding affinities in the micromolar range, or high affinity binding in the nanomolar range. Our study questions the usefulness of the S9.6 antibody in the quantitative analysis of R-loop sequences in vivo.

  16. Identification and Characterization of Wilt and Salt Stress-Responsive MicroRNAs in Chickpea through High-Throughput Sequencing

    Science.gov (United States)

    Deokar, Amit Atmaram; Bhardwaj, Ankur R.; Agarwal, Manu; Katiyar-Agarwal, Surekha; Srinivasan, Ramamurthy; Jain, Pradeep Kumar

    2014-01-01

    Chickpea (Cicer arietinum) is the second most widely grown legume worldwide and is the most important pulse crop in the Indian subcontinent. Chickpea productivity is adversely affected by a large number of biotic and abiotic stresses. MicroRNAs (miRNAs) have been implicated in the regulation of plant responses to several biotic and abiotic stresses. This study is the first attempt to identify chickpea miRNAs that are associated with biotic and abiotic stresses. The wilt infection that is caused by the fungus Fusarium oxysporum f.sp. ciceris is one of the major diseases severely affecting chickpea yields. Of late, increasing soil salinization has become a major problem in realizing these potential yields. Three chickpea libraries using fungal-infected, salt-treated and untreated seedlings were constructed and sequenced using next-generation sequencing technology. A total of 12,135,571 unique reads were obtained. In addition to 122 conserved miRNAs belonging to 25 different families, 59 novel miRNAs along with their star sequences were identified. Four legume-specific miRNAs, including miR5213, miR5232, miR2111 and miR2118, were found in all of the libraries. Poly(A)-based qRT-PCR (quantitative real-time PCR) was used to validate eleven conserved and five novel miRNAs. miR530, which targets genes encoding zinc knuckle- and microtubule-associated proteins, was highly upregulated in response to fungal infection. Many miRNAs responded in a similar fashion under both biotic and abiotic stresses, indicating the existence of cross talk between the pathways that are involved in regulating these stresses. The potential target genes for the conserved and novel miRNAs were predicted based on sequence homologies. miR166 targets a HD-ZIPIII transcription factor and was validated by 5′ RLM-RACE. This study has identified several conserved and novel miRNAs in the chickpea that are associated with gene regulation following exposure to wilt and salt stress. PMID:25295754

  17. Molecular similarity measures.

    Science.gov (United States)

    Maggiora, Gerald M; Shanmugasundaram, Veerabahu

    2011-01-01

    Molecular similarity is a pervasive concept in chemistry. It is essential to many aspects of chemical reasoning and analysis and is perhaps the fundamental assumption underlying medicinal chemistry. Dissimilarity, the complement of similarity, also plays a major role in a growing number of applications of molecular diversity in combinatorial chemistry, high-throughput screening, and related fields. How molecular information is represented, called the representation problem, is important to the type of molecular similarity analysis (MSA) that can be carried out in any given situation. In this work, four types of mathematical structure are used to represent molecular information: sets, graphs, vectors, and functions. Molecular similarity is a pairwise relationship that induces structure into sets of molecules, giving rise to the concept of chemical space. Although all three concepts - molecular similarity, molecular representation, and chemical space - are treated in this chapter, the emphasis is on molecular similarity measures. Similarity measures, also called similarity coefficients or indices, are functions that map pairs of compatible molecular representations that are of the same mathematical form into real numbers usually, but not always, lying on the unit interval. This chapter presents a somewhat pedagogical discussion of many types of molecular similarity measures, their strengths and limitations, and their relationship to one another. An expanded account of the material on chemical spaces presented in the first edition of this book is also provided. It includes a discussion of the topography of activity landscapes and the role that activity cliffs in these landscapes play in structure-activity studies.
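
    As a concrete instance of a similarity coefficient on set representations, the Tanimoto (Jaccard) coefficient divides the number of shared features by the number of features present in either molecule, giving a value on the unit interval. The sketch below is a generic illustration rather than an example taken from the chapter; the feature sets are invented, and returning 1.0 for two empty sets is a convention assumed here.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity of two feature sets, in [0, 1]."""
    union = features_a | features_b
    if not union:
        return 1.0  # assumed convention for two empty representations
    return len(features_a & features_b) / len(union)


def dissimilarity(features_a, features_b):
    """Complement of similarity, as described in the abstract."""
    return 1.0 - tanimoto(features_a, features_b)


# Hypothetical substructure-feature identifiers for two molecules.
mol_a = {1, 2, 3}
mol_b = {2, 3, 4}
sim = tanimoto(mol_a, mol_b)
```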

  18. Assessing the impact of water treatment on bacterial biofilms in drinking water distribution systems using high-throughput DNA sequencing.

    Science.gov (United States)

    Shaw, Jennifer L A; Monis, Paul; Fabris, Rolando; Ho, Lionel; Braun, Kalan; Drikas, Mary; Cooper, Alan

    2014-12-01

    Biofilm control in drinking water distribution systems (DWDSs) is crucial, as biofilms are known to reduce flow efficiency, impair taste and quality of drinking water and have been implicated in the transmission of harmful pathogens. Microorganisms within biofilm communities are more resistant to disinfection compared to planktonic microorganisms, making them difficult to manage in DWDSs. This study evaluates the impact of four unique drinking water treatments on biofilm community structure using metagenomic DNA sequencing. Four experimental DWDSs were subjected to the following treatments: (1) conventional coagulation, (2) magnetic ion exchange contact (MIEX) plus conventional coagulation, (3) MIEX plus conventional coagulation plus granular activated carbon, and (4) membrane filtration (MF). Bacterial biofilms located inside the pipes of each system were sampled under sterile conditions both (a) immediately after treatment application ('inlet') and (b) at a 1 km distance from the treatment application ('outlet'). Bacterial 16S rRNA gene sequencing revealed that the outlet biofilms were more diverse than those sampled at the inlet for all treatments. The lowest number of unique operational taxonomic units (OTUs) and lowest diversity was observed in the MF inlet. However, the MF system revealed the greatest increase in diversity and OTU count from inlet to outlet. Further, the biofilm communities at the outlet of each system were more similar to one another than to their respective inlet, suggesting that biofilm communities converge towards a common established equilibrium as distance from treatment application increases. Based on the results, MF treatment is most effective at inhibiting biofilm growth, but a highly efficient post-treatment disinfection regime is also critical in order to prevent the high rates of post-treatment regrowth. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. SNP discovery and High Resolution Melting Analysis from massive transcriptome sequencing in the California red abalone Haliotis rufescens.

    Science.gov (United States)

    Valenzuela-Muñoz, Valentina; Araya-Garay, José Miguel; Gallardo-Escárate, Cristian

    2013-06-01

    The California red abalone, Haliotis rufescens, which belongs to the Haliotidae family, is the largest species of abalone in the world and has sustained the major fishery and aquaculture production in the USA and Mexico. This native mollusk has not been evaluated or assigned a conservation category even though in the last few decades it was heavily exploited until it disappeared in some areas along the California coast. In Chile, the red abalone was introduced in the 1970s from California wild abalone stocks for the purposes of aquaculture. Considering the number of years that the red abalone has been cultivated in Chile, crucial genetic information is scarce and critical issues remain unresolved. This study reports and validates novel single nucleotide polymorphism (SNP) markers for the red abalone H. rufescens using cDNA pyrosequencing. A total of 622 high quality SNPs were identified in 146 sequences with an estimated frequency of 1 SNP per 1,000 bp. Forty-five SNP markers with functional information for gene ontology were selected. Of these, 8 were polymorphic among the individuals screened: Heat shock protein 70 (HSP70), vitellogenin (VTG), lysin, alginate lyase enzyme (AL), Glucose-regulated protein 94 (GRP94), fructose-bisphosphate aldolase (FBA), sulfatase 1A precursor (S1AP) and ornithine decarboxylase antizyme (ODC). Two additional sequences were also identified with polymorphisms but no similarities with known proteins were found. To validate the putative SNP markers, High Resolution Melting Analysis (HRMA) was conducted in a wild and hatchery-bred population. Additionally, SNP cross-amplifications were tested in two further native abalone species, Haliotis fulgens and Haliotis corrugata. This study provides novel candidate genes that could be used to evaluate loss of genetic diversity due to hatchery selection or inbreeding effects. Copyright © 2013 Elsevier B.V. All rights reserved.

  20. Blood pressure control is similar in treated hypertensive patients with optimal or with high-normal albuminuria.

    Science.gov (United States)

    Oliveras, Anna; Armario, Pedro; Lucas, Silvia; de la Sierra, Alejandro

    2014-09-01

    Although elevated urinary albumin excretion (UAE) is associated with cardiovascular prognosis and high blood pressure (BP), it is unknown whether differences in BP control could also exist between patients with different grades of UAE, even in the normal range. We sought to explore the association between different levels of UAE and BP control in treated hypertensive patients. A cohort of 1,200 treated hypertensive patients was evaluated. Clinical data, including 2 office BP measurements and UAE averaged from 2 samples, were recorded. Albuminuria was categorized into 4 groups: G0 (UAE <10 mg/g), G1 (UAE 10-29 mg/g), G2 (UAE 30-299 mg/g), and G3 (UAE ≥300 mg/g). Forty-three percent of patients had systolic BP ≥140 mm Hg and/or diastolic BP ≥90 mm Hg. Median UAE was significantly higher (20.3 vs. 11.7 mg/g; P < 0.001) in these patients than in controlled hypertensive patients (BP <140/90 mm Hg). When UAE was categorized into the 4 groups, there were differences in BP control among groups (P < 0.001). The proportion of noncontrolled patients in G2 (52.3%) was significantly higher than in G0 (36.8%) and G1 (41.5%) (P < 0.01 and P < 0.05, respectively). Importantly, no significant differences were observed between G0 and G1 (P = 0.18) or between G2 and G3 (P = 0.48). With G0 as the reference group, the odds ratio of lack of BP control for the G2 group after adjustment for confounders was 1.40 (95% confidence interval = 1.16-1.68; P < 0.001). Lack of BP control is more prevalent among patients with microalbuminuria than in patients with normoalbuminuria. No significant difference was seen between patients with optimal or high-normal UAE. © American Journal of Hypertension, Ltd 2014. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  1. High diagnostic yield of syndromic intellectual disability by targeted next-generation sequencing.

    Science.gov (United States)

    Martínez, Francisco; Caro-Llopis, Alfonso; Roselló, Mónica; Oltra, Silvestre; Mayo, Sonia; Monfort, Sandra; Orellana, Carmen

    2017-02-01

    Intellectual disability is a very complex condition where more than 600 genes have been reported. Due to this extraordinary heterogeneity, a large proportion of patients remain without a specific diagnosis and genetic counselling. The need for new methodological strategies in order to detect a greater number of mutations in multiple genes is therefore crucial. In this work, we screened a large panel of 1256 genes (646 pathogenic, 610 candidate) by next-generation sequencing to determine the molecular aetiology of syndromic intellectual disability. A total of 92 patients, negative for previous genetic analyses, were studied together with their parents. Clinically relevant variants were validated by conventional sequencing. A definitive diagnosis was achieved in 29 families by testing the 646 known pathogenic genes. Mutations were found in 25 different genes, where only the genes KMT2D, KMT2A and MED13L were found mutated in more than one patient. A preponderance of de novo mutations was noted even among the X linked conditions. Additionally, seven de novo probably pathogenic mutations were found in the candidate genes AGO1, JARID2, SIN3B, FBXO11, MAP3K7, HDAC2 and SMARCC2. Altogether, this means a diagnostic yield of 39% of the cases (95% CI 30% to 49%). The developed panel proved to be efficient and suitable for the genetic diagnosis of syndromic intellectual disability in a clinical setting. Next-generation sequencing has the potential for high-throughput identification of genetic variations, although the challenges of an adequate clinical interpretation of these variants and the knowledge on further unknown genes causing intellectual disability remain to be solved. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
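
    The reported yield and interval can be checked with a short calculation. The sketch below assumes that the 39% figure counts 36 diagnosed cases out of 92 (the 29 definitive diagnoses plus the 7 probably pathogenic de novo findings) and that a Wilson score interval was used; the abstract states neither, so both are assumptions, and the function name is illustrative.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumption: 29 definitive + 7 probable diagnoses among 92 patients.
diagnosed = 29 + 7
patients = 92
low, high = wilson_interval(diagnosed, patients)
print(f"yield = {diagnosed / patients:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

    Under these assumptions the result rounds to 39% with an interval of 30% to 49%, consistent with the figures quoted in the abstract.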

  2. Genetic profiles of cervical tumors by high-throughput sequencing for personalized medical care

    International Nuclear Information System (INIS)

    Muller, Etienne; Brault, Baptiste; Holmes, Allyson; Legros, Angelina; Jeannot, Emmanuelle; Campitelli, Maura; Rousselin, Antoine; Goardon, Nicolas; Frébourg, Thierry; Krieger, Sophie; Crouet, Hubert; Nicolas, Alain; Sastre, Xavier; Vaur, Dominique; Castéra, Laurent

    2015-01-01

    Cancer treatment has been undergoing a major evolution since the advent of targeted therapies. Building genetic profiles could predict sensitivity or resistance to these therapies and highlight disease-specific abnormalities, supporting personalized patient care. In the context of biomedical research and clinical diagnosis, our laboratory has developed an oncogenic panel comprised of 226 genes and a dedicated bioinformatic pipeline to explore somatic mutations in cervical carcinomas, using high-throughput sequencing. Twenty-nine tumors were sequenced for exons within 226 genes. The automated pipeline used includes a database and a filtration system dedicated to identifying mutations of interest and excluding false positive and germline mutations. One-hundred and seventy-six total mutational events were found among the 29 tumors. Our cervical tumor mutational landscape shows that most mutations are found in PIK3CA (E545K, E542K) and KRAS (G12D, G13D) and others in FBXW7 (R465C, R505G, R479Q). Mutations have also been found in ALK (V1149L, A1266T) and EGFR (T259M). These results showed that 48% of patients display at least one deleterious mutation in genes that have already been targeted by Food and Drug Administration-approved therapies. Considering deleterious mutations, 59% of patients could be eligible for clinical trials. Sequencing hundreds of genes in a clinical context has become feasible, in terms of time and cost. In the near future, such an analysis could be a part of a battery of examinations during the diagnosis and treatment of cancer, helping to detect sensitivity or resistance to targeted therapies and allowing advancement towards personalized oncology.

  3. Diversity and Structure of Diazotrophic Communities in Mangrove Rhizosphere, Revealed by High-Throughput Sequencing.

    Science.gov (United States)

    Zhang, Yanying; Yang, Qingsong; Ling, Juan; Van Nostrand, Joy D; Shi, Zhou; Zhou, Jizhong; Dong, Junde

    2017-01-01

    Diazotrophic communities make an essential contribution to the productivity through providing new nitrogen. However, knowledge of the roles that both mangrove tree species and geochemical parameters play in shaping mangrove rhizosphere diazotrophic communities is still elusive. Here, a comprehensive examination of the diversity and structure of microbial communities in the rhizospheres of three mangrove species, Rhizophora apiculata, Avicennia marina, and Ceriops tagal, was undertaken using high-throughput sequencing of the 16S rRNA and nifH genes. Our results revealed a great diversity of both the total microbial composition and the diazotrophic composition specifically in the mangrove rhizosphere. Deltaproteobacteria and Gammaproteobacteria were both ubiquitous and dominant, comprising an average of 45.87 and 86.66% of total microbial and diazotrophic communities, respectively. Sulfate-reducing bacteria belonging to the Desulfobacteraceae and Desulfovibrionaceae were the dominant diazotrophs. Community statistical analyses suggested that both mangrove tree species and additional environmental variables played important roles in shaping total microbial and potential diazotroph communities in mangrove rhizospheres. In contrast to the total microbial community investigated by analysis of 16S rRNA gene sequences, most of the dominant diazotrophic groups identified by nifH gene sequences were significantly different among mangrove species. The dominant diazotrophs of the family Desulfobacteraceae were positively correlated with total phosphorus, but negatively correlated with the nitrogen to phosphorus ratio. The Pseudomonadaceae were positively correlated with the concentration of available potassium, suggesting that diazotrophs potentially play an important role in biogeochemical cycles, such as those of nitrogen, phosphorus, sulfur, and potassium, in the mangrove ecosystem.

  4. Diversity and Structure of Diazotrophic Communities in Mangrove Rhizosphere, Revealed by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Yanying Zhang

    2017-10-01

    Diazotrophic communities make an essential contribution to the productivity through providing new nitrogen. However, knowledge of the roles that both mangrove tree species and geochemical parameters play in shaping mangrove rhizosphere diazotrophic communities is still elusive. Here, a comprehensive examination of the diversity and structure of microbial communities in the rhizospheres of three mangrove species, Rhizophora apiculata, Avicennia marina, and Ceriops tagal, was undertaken using high-throughput sequencing of the 16S rRNA and nifH genes. Our results revealed a great diversity of both the total microbial composition and the diazotrophic composition specifically in the mangrove rhizosphere. Deltaproteobacteria and Gammaproteobacteria were both ubiquitous and dominant, comprising an average of 45.87 and 86.66% of total microbial and diazotrophic communities, respectively. Sulfate-reducing bacteria belonging to the Desulfobacteraceae and Desulfovibrionaceae were the dominant diazotrophs. Community statistical analyses suggested that both mangrove tree species and additional environmental variables played important roles in shaping total microbial and potential diazotroph communities in mangrove rhizospheres. In contrast to the total microbial community investigated by analysis of 16S rRNA gene sequences, most of the dominant diazotrophic groups identified by nifH gene sequences were significantly different among mangrove species. The dominant diazotrophs of the family Desulfobacteraceae were positively correlated with total phosphorus, but negatively correlated with the nitrogen to phosphorus ratio. The Pseudomonadaceae were positively correlated with the concentration of available potassium, suggesting that diazotrophs potentially play an important role in biogeochemical cycles, such as those of nitrogen, phosphorus, sulfur, and potassium, in the mangrove ecosystem.

  5. Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing.

    Science.gov (United States)

    Conway, Tyrrell; Creecy, James P; Maddox, Scott M; Grissom, Joe E; Conkle, Trevor L; Shadid, Tyler M; Teramoto, Jun; San Miguel, Phillip; Shimada, Tomohiro; Ishihama, Akira; Mori, Hirotada; Wanner, Barry L

    2014-07-08

    We analyzed the transcriptome of Escherichia coli K-12 by strand-specific RNA sequencing at single-nucleotide resolution during steady-state (logarithmic-phase) growth and upon entry into stationary phase in glucose minimal medium. To generate high-resolution transcriptome maps, we developed an organizational schema which showed that in practice only three features are required to define operon architecture: the promoter, terminator, and deep RNA sequence read coverage. We precisely annotated 2,122 promoters and 1,774 terminators, defining 1,510 operons with an average of 1.98 genes per operon. Our analyses revealed an unprecedented view of E. coli operon architecture. A large proportion (36%) of operons are complex with internal promoters or terminators that generate multiple transcription units. For 43% of operons, we observed differential expression of polycistronic genes, despite being in the same operons, indicating that E. coli operon architecture allows fine-tuning of gene expression. We found that 276 of 370 convergent operons terminate inefficiently, generating complementary 3' transcript ends which overlap on average by 286 nucleotides, and 136 of 388 divergent operons have promoters arranged such that their 5' ends overlap on average by 168 nucleotides. We found 89 antisense transcripts of 397-nucleotide average length, 7 unannotated transcripts within intergenic regions, and 18 sense transcripts that completely overlap operons on the opposite strand. Of 519 overlapping transcripts, 75% correspond to sequences that are highly conserved in E. coli (>50 genomes). Our data extend recent studies showing unexpected transcriptome complexity in several bacteria and suggest that antisense RNA regulation is widespread. Importance: We precisely mapped the 5' and 3' ends of RNA transcripts across the E. coli K-12 genome by using a single-nucleotide analytical approach. Our resulting high-resolution transcriptome maps show that ca. one-third of E. coli operons are

  6. Transcoding method from H.264/AVC to high efficiency video coding based on similarity of intraprediction, interprediction, and motion vector

    Science.gov (United States)

    Liu, Mei-Feng; Zhong, Guo-Yun; He, Xiao-Hai; Qing, Lin-Bo

    2016-09-01

    Currently, most video resources online are encoded in the H.264/AVC format. More fluent video transmission can be obtained if these resources are encoded in the newest international video coding standard: high efficiency video coding (HEVC). In order to improve online video transmission and storage, a transcoding method from H.264/AVC to HEVC is proposed. In this transcoding algorithm, the coding information of intraprediction, interprediction, and motion vector (MV) in the H.264/AVC video stream is used to accelerate the coding in HEVC. It is found through experiments that the region of interprediction in HEVC overlaps that in H.264/AVC. Therefore, the intraprediction for the region in HEVC, which is interpredicted in H.264/AVC, can be skipped to reduce coding complexity. Several macroblocks in H.264/AVC are combined into one PU in HEVC when the MV difference between two of the macroblocks in H.264/AVC is lower than a threshold. This method selects only one coding unit depth and one prediction unit (PU) mode to reduce the coding complexity. An MV interpolation method of combined PU in HEVC is proposed according to the areas and distances between the center of one macroblock in H.264/AVC and that of the PU in HEVC. The predicted MV accelerates the motion estimation for HEVC coding. The simulation results show that our proposed algorithm achieves significant coding time reduction with a small loss in rate-distortion performance, compared to the existing transcoding algorithms and normal HEVC coding.
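
    The merge test and the MV interpolation described above can be illustrated with a small sketch. The abstract does not give the actual formulas, so everything here is a hypothetical stand-in: the L1 distance used for the MV difference, the weight `area / distance`, and the function names `can_merge` and `interpolate_pu_mv` are all assumptions made for illustration only.

```python
import math

def can_merge(mv_a, mv_b, threshold):
    """Assumed merge rule: L1 difference between two MVs is below a threshold."""
    return abs(mv_a[0] - mv_b[0]) + abs(mv_a[1] - mv_b[1]) < threshold

def interpolate_pu_mv(pu_center, macroblocks, eps=1e-6):
    """Predict a PU motion vector as a weighted mean of macroblock MVs.

    macroblocks: list of (center_xy, overlap_area, mv_xy) tuples.
    Assumed weighting: larger overlap area and a shorter center-to-center
    distance both increase a macroblock's influence.
    """
    weight_sum = 0.0
    mv_x = 0.0
    mv_y = 0.0
    for (cx, cy), area, (vx, vy) in macroblocks:
        distance = math.hypot(cx - pu_center[0], cy - pu_center[1])
        weight = area / (distance + eps)  # eps avoids division by zero
        weight_sum += weight
        mv_x += weight * vx
        mv_y += weight * vy
    return mv_x / weight_sum, mv_y / weight_sum

# Two 16x16 macroblocks placed symmetrically around the PU center.
blocks = [((8, 8), 256, (2, 0)), ((24, 8), 256, (4, 0))]
predicted = interpolate_pu_mv((16, 8), blocks)
mergeable = can_merge(blocks[0][2], blocks[1][2], threshold=3)
print(predicted, mergeable)
```

    With equal areas and equal distances the two weights coincide, so the predicted MV is simply the midpoint of the two macroblock MVs. An encoder would use such a prediction only as a starting point for motion estimation.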

  7. Are high energy heavy ion collisions similar to a little bang, or just a very nice firework?

    Energy Technology Data Exchange (ETDEWEB)

    Shuryak, E.V. [State University of New York, NY (United States)

    2001-07-01

    The talk is a brief overview of recent progress in heavy ion physics, with emphasis on applications of macroscopic approaches. The central issues are whether the systems exhibit macroscopic behavior we need in order to interpret it as excited hadronic matter, and, if so, what is its effective Equation of State (EoS). This, in turn, depends on the collision rate in matter: we think we understand in hadronic matter near freeze-out, but certainly not at earlier stages of the collisions. Still (and this is about the most important statement we make) there is no indication that it is not high enough, so that a hydro description of excited matter be possible. More specifically, we concentrate on such properties of the produced excited system as collective flow, particle composition and fluctuations. Note that both a generation of a pressure and the rate of fluctuation relaxation are ultimately a measure of a collision rate we would like to know. We also try to explain what exactly are the expected differences between collisions at AGS/SPS and RHIC energies. (author)

  8. Are high energy heavy ion collisions similar to a little bang, or just a very nice firework?

    International Nuclear Information System (INIS)

    Shuryak, E.V.

    2001-01-01

    The talk is a brief overview of recent progress in heavy ion physics, with emphasis on applications of macroscopic approaches. The central issues are whether the systems exhibit macroscopic behavior we need in order to interpret it as excited hadronic matter, and, if so, what is its effective Equation of State (EoS). This, in turn, depends on the collision rate in matter: we think we understand in hadronic matter near freeze-out, but certainly not at earlier stages of the collisions. Still (and this is about the most important statement we make) there is no indication that it is not high enough, so that a hydro description of excited matter be possible. More specifically, we concentrate on such properties of the produced excited system as collective flow, particle composition and fluctuations. Note that both a generation of a pressure and the rate of fluctuation relaxation are ultimately a measure of a collision rate we would like to know. We also try to explain what exactly are the expected differences between collisions at AGS/SPS and RHIC energies. (author)

  9. Are High Energy Heavy Ion Collisions similar to a Little Bang, or just a very nice Firework?

    Science.gov (United States)

    Shuryak, E. V.

    2001-09-01

    The talk is a brief overview of recent progress in heavy ion physics, with emphasis on applications of macroscopic approaches. The central issues are whether the systems exhibit macroscopic behavior we need in order to interpret it as excited hadronic matter, and, if so, what is its effective Equation of State (EoS). This, in turn, depends on the collision rate in matter: we think we understand in hadronic matter near freeze-out, but certainly not at earlier stages of the collisions. Still (and this is about the most important statement we make) there is no indication that it is not high enough, so that a hydro description of excited matter be possible. More specifically, we concentrate on such properties of the produced excited system as collective flow, particle composition and fluctuations. Note that both a generation of a pressure and the rate of fluctuation relaxation are ultimately a measure of a collision rate we would like to know. We also try to explain what exactly are the expected differences between collisions at AGS/SPS and RHIC energies.

  10. Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

    Science.gov (United States)

    Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

    2010-11-01

    Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.

  11. Efficient DNA fingerprinting based on the targeted sequencing of active retrotransposon insertion sites using a bench-top high-throughput sequencing platform.

    Science.gov (United States)

    Monden, Yuki; Yamamoto, Ayaka; Shindo, Akiko; Tahara, Makoto

    2014-10-01

    In many crop species, DNA fingerprinting is required for the precise identification of cultivars to protect the rights of breeders. Many families of retrotransposons have multiple copies throughout the eukaryotic genome and their integrated copies are inherited genetically. Thus, their insertion polymorphisms among cultivars are useful for DNA fingerprinting. In this study, we conducted DNA fingerprinting based on the insertion polymorphisms of active retrotransposon families (Rtsp-1 and LIb) in sweet potato. Using 38 cultivars, we identified 2,024 insertion sites in the two families with an Illumina MiSeq sequencing platform. Of these insertion sites, 91.4% appeared to be polymorphic among the cultivars and 376 cultivar-specific insertion sites were identified, which were converted directly into cultivar-specific sequence-characterized amplified region (SCAR) markers. A phylogenetic tree was constructed using these insertion sites, which corresponded well with known pedigree information, thereby indicating their suitability for genetic diversity studies. Thus, the genome-wide comparative analysis of active retrotransposon insertion sites using the bench-top MiSeq sequencing platform is highly effective for DNA fingerprinting without any requirement for whole genome sequence information. This approach may facilitate the development of a practical polymerase chain reaction-based cultivar diagnostic system and could also be applied to the determination of genetic relationships. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  12. Draft Genome Sequence of Komagataeibacter rhaeticus Strain AF1, a High Producer of Cellulose, Isolated from Kombucha Tea.

    Science.gov (United States)

    Dos Santos, Renato Augusto Corrêa; Berretta, Andresa A; Barud, Hernane da Silva; Ribeiro, Sidney José Lima; González-García, Laura Natalia; Zucchi, Tiago Domingues; Goldman, Gustavo H; Riaño-Pachón, Diego M

    2014-07-24

    Here, we present the draft genome sequence of Komagataeibacter rhaeticus strain AF1, which was isolated from Kombucha tea and is capable of producing high levels of cellulose. Copyright © 2014 dos Santos et al.

  13. Effects of High Intensity White Noise on Short-Term Memory for Position in a List and Sequence

    Science.gov (United States)

    Daee, Safar; Wilding, J. M.

    1977-01-01

    Seven experiments are described investigating the effect of high intensity white noise during the visual presentation of words on a number of short-term memory tasks. Examines results relative to position learning and sequence learning. (Editor/RK)

  14. Two-stage clustering (TSC): a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons.

    Directory of Open Access Journals (Sweden)

    Xiao-Tao Jiang

    Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
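
    The abundance-based split can be sketched as follows. This is not the TSC software: it is a toy illustration that assumes a plain Levenshtein distance as a stand-in for the pairwise Needleman-Wunsch distance, a naive complete-linkage merge for the abundant tags, and a greedy seed-based assignment for the rare tags. The names `two_stage_cluster`, `min_abundance` and `threshold` are hypothetical, and the preclustering steps are omitted.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (stand-in for a global alignment distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def dist(a: str, b: str) -> float:
    """Edit distance normalized by the longer sequence length."""
    return edit_distance(a, b) / max(len(a), len(b))

def two_stage_cluster(reads, min_abundance=2, threshold=0.1):
    counts = Counter(reads)
    abundant = [s for s, c in counts.items() if c >= min_abundance]
    rare = [s for s, c in counts.items() if c < min_abundance]

    # Stage 1: exhaustive complete-linkage merging of abundant tags
    # (accurate but quadratic, so it is reserved for the smaller group).
    clusters = [[s] for s in abundant]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if max(dist(a, b) for a in clusters[i] for b in clusters[j]) <= threshold:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break

    # Stage 2: greedy heuristic for rare tags; each tag joins the first
    # cluster whose seed (first member) is close enough, else starts a new one.
    for s in rare:
        for cluster in clusters:
            if dist(s, cluster[0]) <= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

reads = (["ACGTACGTAC"] * 5 + ["ACGTACGTAA"] * 3 + ["TTTTGGGGCC"] * 4
         + ["ACGTACGTAG", "TTTTGGGGCA"])
otus = two_stage_cluster(reads)
print(len(otus), [len(c) for c in otus])
```

    In this toy input the two singleton reads are absorbed into existing clusters rather than forming their own, which mirrors the abstract's point about limiting spurious OTUs from 'noise' sequences.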

  15. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali; Al-Saleh, Mohammed; Piatek, Marek J.; Al-Shahwan, Ibrahim; Ali, Shahjahan; Brown, Judith K.

    2014-01-01

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant

  16. Artifact free T2*-weighted imaging at high spatial resolution using segmented EPI sequences

    Energy Technology Data Exchange (ETDEWEB)

    Heiler, Patrick Michael; Schad, Lothar Rudi [Heidelberg Univ., Mannheim (Germany). Computer Assisted Clinical Medicine; Schmitter, Sebastian [German Cancer Research Center, Heidelberg (Germany). Dept. of Medical Physics in Radiology

    2010-07-01

    The aim of this work was the development of novel measurement techniques that acquire high resolution T2*-weighted datasets in measurement times as short as possible without suffering from noticeable blurring and ghosting artifacts. Therefore, two new measurement techniques were developed that acquire a smoother k-space than generic multi shot echo planar imaging sequences. One is based on the principle of echo train shifting, the other on the reversed gradient method. Simulations and phantom measurements demonstrate that echo train shifting works properly and reduces artifacts in multi shot echo planar imaging. For maximum SNR-efficiency this technique was further improved by adding a second contrast. Both contrasts can be acquired within a prolongation in measurement time by a factor of 1.5, leading to an SNR increase by approximately √2. Furthermore it is demonstrated that the reversed gradient method remarkably reduces artifacts caused by a discontinuous k-space weighting. Assuming sequence parameters as feasible for fMRI experiments, artifact free T2*-weighted images with a matrix size of 256 x 256 leading to an in-plane resolution in the submillimeter range can be obtained in about 2 s per slice. (orig.)
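
    The SNR trade-off quoted above can be made explicit. The abstract does not define SNR efficiency, so the relation below assumes the common convention of SNR per square root of measurement time; the symbols are introduced here for illustration only.

```latex
\eta = \frac{\mathrm{SNR}}{\sqrt{T}}, \qquad
\frac{\eta_{\text{two contrasts}}}{\eta_{\text{one contrast}}}
  = \frac{\sqrt{2}}{\sqrt{1.5}}
  = \sqrt{\tfrac{4}{3}} \approx 1.15
```

    Under that assumption, a √2 gain in SNR for 1.5 times the measurement time corresponds to roughly a 15% gain in SNR efficiency.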

  17. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I.

    2017-02-01

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics.

  18. Genotyping by PCR and High-Throughput Sequencing of Commercial Probiotic Products Reveals Composition Biases.

    Directory of Open Access Journals (Sweden)

    Wesley Morovic

    2016-11-01

    Recent advances in microbiome research have brought renewed focus on beneficial bacteria, many of which are available in food and dietary supplements. Although probiotics have historically been defined as microorganisms that convey health benefits when ingested in sufficient viable amounts, this description now includes the stipulation of well-defined strains, encompassing definitive taxonomy for consumer consideration and regulatory oversight. Here, we evaluated 52 commercial dietary supplements covering a range of labeled species, and determined their content using plate counting and targeted genotyping. Additionally, strain identities were assessed using methods recently published by the United States Pharmacopeial Convention. We also determined the relative abundance of individual bacteria by high-throughput sequencing (HTS) of the 16S rRNA sequence using paired-end 2x250 bp Illumina MiSeq technology. Using multiple methods, we tested the hypothesis that products do contain the quantitative amount of labeled bacteria, and qualitative list of labeled microbial species. We found that 17 samples (33%) were below label claim for CFU prior to their expiration dates. A multiplexed-PCR scheme showed that only 30/52 (58%) of the products contained a correctly labeled classification, with issues encompassing incorrect taxonomy, missing species and un-labeled species. The HTS revealed that many blended products consisted predominantly of Lactobacillus acidophilus and Bifidobacterium animalis subsp. lactis. These results highlight the need for reliable methods to qualitatively determine the correct taxonomy and quantitatively ascertain the relative amounts of mixed microbial populations in commercial probiotic products.

  19. Comprehensive evaluation and optimization of amplicon library preparation methods for high-throughput antibody sequencing.

    Science.gov (United States)

    Menzel, Ulrike; Greiff, Victor; Khan, Tarik A; Haessler, Ulrike; Hellmann, Ina; Friedensohn, Simon; Cook, Skylar C; Pogson, Mark; Reddy, Sai T

    2014-01-01

    High-throughput sequencing (HTS) of antibody repertoire libraries has become a powerful tool in the field of systems immunology. However, numerous sources of bias in HTS workflows may affect the obtained antibody repertoire data. A crucial step in antibody library preparation is the addition of short platform-specific nucleotide adapter sequences. As of yet, the impact of the method of adapter addition on experimental library preparation and the resulting antibody repertoire HTS datasets has not been thoroughly investigated. Therefore, we compared three standard library preparation methods by performing Illumina HTS on antibody variable heavy genes from murine antibody-secreting cells. Clonal overlap and rank statistics demonstrated that the investigated methods produced equivalent HTS datasets. PCR-based methods were experimentally superior to ligation with respect to speed, efficiency, and practicality. Finally, using a two-step PCR based method we established a protocol for antibody repertoire library generation, beginning from inputs as low as 1 ng of total RNA. In summary, this study represents a major advance towards a standardized experimental framework for antibody HTS, thus opening up the potential for systems-based, cross-experiment meta-analyses of antibody repertoires.

  20. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.

    Science.gov (United States)

    Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo

    2011-12-15

    High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy (contact: eduardo.eyras@upf.edu). Supplementary data are available at Bioinformatics online.

  1. Aerobic granulation strategy for bioaugmentation of a sequencing batch reactor (SBR) treating high strength pyridine wastewater

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Xiaodong; Chen, Yan [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Zhang, Xin [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Suzhou Institute of Architectural Design Co., Ltd, Suzhou 215021, Jiangsu Province (China); Jiang, Xinbai; Wu, Shijing [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Shen, Jinyou, E-mail: shenjinyou@mail.njust.edu.cn [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Sun, Xiuyun; Li, Jiansheng; Lu, Lude [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Wang, Lianjun, E-mail: wanglj@mail.njust.edu.cn [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China)

    2015-09-15

    Abstract: Aerobic granules were successfully cultivated in a sequencing batch reactor (SBR), using a single bacterial strain Rhizobium sp. NJUST18 as the inoculum. NJUST18 presented as both a good pyridine degrader and an efficient autoaggregator. Stable granules with diameter of 0.5–1 mm, sludge volume index of 25.6 ± 3.6 mL g⁻¹ and settling velocity of 37.2 ± 2.7 m h⁻¹, were formed in SBR following 120-day cultivation. These granules exhibited excellent pyridine degradation performance, with maximum volumetric degradation rate (Vmax) varied between 1164.5 mg L⁻¹ h⁻¹ and 1867.4 mg L⁻¹ h⁻¹. High-throughput sequencing analysis exhibited a large shift in microbial community structure, since the SBR was operated under open condition. Paracoccus and Comamonas were found to be the most predominant species in the aerobic granule system after the system had stabilized. The initially inoculated Rhizobium sp. lost its dominance during aerobic granulation. However, the inoculation of Rhizobium sp. played a key role in the start-up process of this bioaugmentation system. This study demonstrated that, in addition to the hydraulic selection pressure during settling and effluent discharge, the selection of aggregating bacterial inocula is equally important for the formation of the aerobic granule.

  2. Aerobic granulation strategy for bioaugmentation of a sequencing batch reactor (SBR) treating high strength pyridine wastewater

    International Nuclear Information System (INIS)

    Liu, Xiaodong; Chen, Yan; Zhang, Xin; Jiang, Xinbai; Wu, Shijing; Shen, Jinyou; Sun, Xiuyun; Li, Jiansheng; Lu, Lude; Wang, Lianjun

    2015-01-01

    Abstract: Aerobic granules were successfully cultivated in a sequencing batch reactor (SBR), using a single bacterial strain Rhizobium sp. NJUST18 as the inoculum. NJUST18 presented as both a good pyridine degrader and an efficient autoaggregator. Stable granules with diameter of 0.5–1 mm, sludge volume index of 25.6 ± 3.6 mL g⁻¹ and settling velocity of 37.2 ± 2.7 m h⁻¹, were formed in SBR following 120-day cultivation. These granules exhibited excellent pyridine degradation performance, with maximum volumetric degradation rate (Vmax) varied between 1164.5 mg L⁻¹ h⁻¹ and 1867.4 mg L⁻¹ h⁻¹. High-throughput sequencing analysis exhibited a large shift in microbial community structure, since the SBR was operated under open condition. Paracoccus and Comamonas were found to be the most predominant species in the aerobic granule system after the system had stabilized. The initially inoculated Rhizobium sp. lost its dominance during aerobic granulation. However, the inoculation of Rhizobium sp. played a key role in the start-up process of this bioaugmentation system. This study demonstrated that, in addition to the hydraulic selection pressure during settling and effluent discharge, the selection of aggregating bacterial inocula is equally important for the formation of the aerobic granule.

  3. Artifact free T2*-weighted imaging at high spatial resolution using segmented EPI sequences

    International Nuclear Information System (INIS)

    Heiler, Patrick Michael; Schad, Lothar Rudi; Schmitter, Sebastian

    2010-01-01

    The aim of this work was the development of novel measurement techniques that acquire high resolution T2*-weighted datasets in measurement times as short as possible without suffering from noticeable blurring and ghosting artifacts. Therefore, two new measurement techniques were developed that acquire a smoother k-space than generic multi shot echo planar imaging sequences. One is based on the principle of echo train shifting, the other on the reversed gradient method. Simulations and phantom measurements demonstrate that echo train shifting works properly and reduces artifacts in multi shot echo planar imaging. For maximum SNR-efficiency this technique was further improved by adding a second contrast. Both contrasts can be acquired within a prolongation in measurement time by a factor of 1.5, leading to an SNR increase by approximately √2. Furthermore it is demonstrated that the reversed gradient method remarkably reduces artifacts caused by a discontinuous k-space weighting. Assuming sequence parameters as feasible for fMRI experiments, artifact free T2*-weighted images with a matrix size of 256 x 256 leading to an in-plane resolution in the submillimeter range can be obtained in about 2 s per slice.

  4. Similarity Measure of Graphs

    Directory of Open Access Journals (Sweden)

    Amine Labriji

    2017-07-01

    Measuring the similarity of graphs is an active research field in the Semantic Web, artificial intelligence, shape recognition and information retrieval. One of the fundamental problems of graph databases is finding graphs similar to a query graph. Existing approaches to this problem are usually based on the nodes and arcs of the two graphs, regardless of parental semantic links. For instance, a common connection is not counted toward the similarity of two graphs that share no concepts, whether the measure is based on the union of the two graphs, on the maximum common sub-graph (SCM), or on the graph edit distance. This is inadequate in the context of information retrieval. To overcome this problem, we suggest a new measure of similarity between graphs, based on the similarity measure of Wu and Palmer. We have shown that this new measure satisfies the properties of a similarity measure and we applied it to examples. The results show that our measure runs faster than existing approaches. In addition, we compared the relevance of the similarity values obtained; the new graph measure appears advantageous and contributes to solving the problem mentioned above.
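
    The record above builds on the Wu and Palmer concept similarity, which scores two concepts by the depth of their deepest shared ancestor: sim(a, b) = 2 * depth(lcs) / (depth(a) + depth(b)). The following is a minimal sketch of that underlying measure only, not of the authors' graph measure; the toy taxonomy and all function names are assumptions made for illustration.

```python
# Wu-Palmer similarity on a small hand-made taxonomy (illustrative only).
# PARENT is a hypothetical is-a hierarchy: child -> parent; the root maps to None.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def path_to_root(concept):
    """Return the list [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(a, b):
    """sim(a, b) = 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(wu_palmer("dog", "cat"))  # siblings under "animal": 2*2/(3+3)
    print(wu_palmer("dog", "oak"))  # only the root is shared: 2*1/(3+3)
    print(wu_palmer("dog", "dog"))  # identical concepts score 1.0
```

    Counting depth in nodes rather than edges keeps the score above zero for concepts that meet only at the root; other conventions exist and shift the values slightly.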

  5. High similarity of Trypanosoma cruzi kDNA genetic profiles detected by LSSP-PCR within family groups in an endemic area of Chagas disease in Brazil

    Directory of Open Access Journals (Sweden)

    Sandra Maria Alkmim-Oliveira

    2014-10-01

    Introduction: Determining the genetic similarities among Trypanosoma cruzi populations isolated from different hosts and vectors is very important to clarify the epidemiology of Chagas disease. Methods: An epidemiological study was conducted in a Brazilian endemic area for Chagas disease, including 76 chronic chagasic individuals (96.1% with an indeterminate form; 46.1% with positive hemoculture). Results: T. cruzi I (TcI) was isolated from one child and TcII was found in the remaining (97.1%) subjects. Low-stringency single-specific-primer-polymerase chain reaction (LSSP-PCR) showed high heterogeneity among TcII populations (46% of shared bands); however, high similarities (80-100%) among pairs of mothers/children, siblings, or cousins were detected. Conclusions: LSSP-PCR showed potential for identifying similar parasite populations among individuals with close kinship in epidemiological studies of Chagas disease.

  6. Sequencing the threat and recommendation components of persuasive messages differentially improves the effectiveness of high- and low-distressing imagery in an anti-alcohol message in students.

    Science.gov (United States)

    Brown, Stephen L; West, Charlotte

    2015-05-01

    Distressing imagery is often used to improve the persuasiveness of mass-reach health promotion messages, but its effectiveness may be limited because audiences avoid attending to content. Prior self-affirmation or self-efficacy inductions have been shown to reduce avoidance and improve audience responsiveness to distressing messages, but these are difficult to introduce into a mass-reach context. Reasoning that a behavioural recommendation may have a similar effect, we reversed the traditional threat-behavioural recommendation health promotion message sequence. 2 × 2 experimental design: Factor 1, high- and low-distress images; Factor 2, threat-recommendation and recommendation-threat sequences. Ninety-one students were exposed to an identical text message accompanied by high- or low-distress imagery presented in threat-recommendation and recommendation-threat sequences. For the high-distress message, greater persuasion was observed for the recommendation-threat than the threat-recommendation sequence. This was partially mediated by participants' greater self-exposure to the threat component of the message, which we attribute to the effect of sequence in reducing attentional avoidance. For the low-distress message, greater persuasion was observed for the threat-recommendation sequence, which was not mediated by reading time allocated to the threat. Tailoring message sequence to suit the degree of distress that message developers wish to induce provides a tool that could improve persuasive messages. These findings provide a first step in this process and discuss further steps needed to consolidate and expand these findings. Statement of contribution What is already known on this subject? Health promotion messages accompanied by distressing imagery might, under some circumstances, persuade individuals to engage in healthier behaviour. Audiences can respond defensively to distressing imagery, but may be less inclined to do so when an easily followed behavioural

  7. Attention-Based Recurrent Temporal Restricted Boltzmann Machine for Radar High Resolution Range Profile Sequence Recognition

    Directory of Open Access Journals (Sweden)

    Yifan Zhang

    2018-05-01

    High Resolution Range Profile (HRRP) recognition has attracted great attention in the field of Radar Automatic Target Recognition (RATR). However, traditional HRRP recognition methods fail to model high dimensional sequential data efficiently and have poor anti-noise ability. To deal with these problems, a novel stochastic neural network model named Attention-based Recurrent Temporal Restricted Boltzmann Machine (ARTRBM) is proposed in this paper. RTRBM is utilized to extract discriminative features and the attention mechanism is adopted to select major features. RTRBM is efficient at modeling high dimensional HRRP sequences because it can extract the information of temporal and spatial correlation between adjacent HRRPs. The attention mechanism is used in sequential data recognition tasks including machine translation and relation classification, and makes the model pay more attention to the major features for recognition. Therefore, the combination of RTRBM and the attention mechanism makes our model effective at extracting more internally related features and choosing the important parts of the extracted features. Additionally, the model performs well with noise-corrupted HRRP data. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that our proposed model outperforms other traditional methods, which indicates that ARTRBM extracts, selects, and utilizes the correlation information between adjacent HRRPs effectively and is suitable for high dimensional data or noise-corrupted data.

  8. The main challenges that remain in applying high-throughput sequencing to clinical diagnostics.

    Science.gov (United States)

    Loeffelholz, Michael; Fofanov, Yuriy

    2015-01-01

    Over the last 10 years, the quality, price and availability of high-throughput sequencing instruments have improved to the point that this technology may be close to becoming a routine tool in the diagnostic microbiology laboratory. Two groups of challenges, however, have to be resolved in order to move this powerful research technology into routine use in the clinical microbiology laboratory. The computational/bioinformatics challenges include data storage cost and privacy concerns, requiring analysis to be performed without access to cloud storage or expensive computational infrastructure. The logistical challenges include interpretation of complex results and acceptance and understanding of the advantages and limitations of this technology by the medical community. This article focuses on the approaches to address these challenges, such as file formats, algorithms, data collection, reporting and good laboratory practices.

  9. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences.

    Science.gov (United States)

    Gao, Song; Sung, Wing-Kin; Nagarajan, Niranjan

    2011-11-01

    Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explored the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes (Availability: http://sourceforge.net/projects/operasf/ ).

  10. Comparative transcriptome analysis within the Lolium/Festuca species complex reveals high sequence conservation

    DEFF Research Database (Denmark)

    Czaban, Adrian; Sharma, Sapna; Byrne, Stephen

    2015-01-01

    species from the Lolium-Festuca complex, ranging from 52,166 to 72,133 transcripts per assembly. We have also predicted a set of proteins and validated it with a high-confidence protein database from three closely related species (H. vulgare, B. distachyon and O. sativa). We have obtained gene family...... clusters for the four species using OrthoMCL and analyzed their inferred phylogenetic relationships. Our results indicate that VRN2 is a candidate gene for differentiating vernalization and non-vernalization types in the Lolium-Festuca complex. Grouping of the gene families based on their BLAST identity...... enabled us to divide ortholog groups into those that are very conserved and those that are more evolutionarily relaxed. The ratio of the non-synonymous to synonymous substitutions enabled us to pinpoint protein sequences evolving in response to positive selection. These proteins may explain some...

  11. Detailed evaluation of RCS boundary rupture during high-pressure severe accident sequences

    International Nuclear Information System (INIS)

    Park, Rae-Joon; Hong, Seong-Wan

    2011-01-01

    The possibility of depressurizing the reactor coolant system (RCS) before a reactor vessel rupture during a high-pressure severe accident sequence has been evaluated for the consideration of direct containment heating (DCH) and containment bypass. A total loss of feed water (TLOFW) and a station blackout (SBO) of the advanced power reactor 1400 (APR 1400) have been evaluated from an initiating event to a creep rupture of the RCS boundary by using the SCDAP/RELAP5 computer code. In addition, intentional depressurization of the RCS using power-operated safety relief valves (POSRVs) has been evaluated. The SCDAP/RELAP5 results have shown that the pressurizer surge line broke before the reactor vessel rupture failure, but a containment bypass did not occur because steam generator U tubes did not break. The intentional depressurization of the RCS using POSRVs was effective for DCH prevention at a reactor vessel rupture.

  12. SNP calling using genotype model selection on high-throughput sequencing data

    KAUST Repository

    You, Na

    2012-01-16

    Motivation: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. © The Author 2012. Published by Oxford University Press. All rights reserved.
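
    The two metrics used above to compare SNP callers have standard definitions: sensitivity = TP / (TP + FN) and positive predictive value = TP / (TP + FP). The sketch below shows how they could be computed for one caller against a truth set; it is not part of GeMS, and the function name and the toy positions are hypothetical.

```python
def sensitivity_and_ppv(called, truth):
    """Compare called SNP sites with known true sites.

    Both arguments are iterables of hashable site keys, here
    (chromosome, position) tuples. Returns (sensitivity, ppv).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true sites that were called
    fp = len(called - truth)   # called sites that are not true
    fn = len(truth - called)   # true sites that were missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


if __name__ == "__main__":
    # Made-up example data, not results from any real caller.
    truth = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr2", 900)}
    called = {("chr1", 100), ("chr1", 250), ("chr2", 41), ("chr2", 900), ("chr3", 7)}
    print(sensitivity_and_ppv(called, truth))  # → (0.75, 0.6)
```

    A caller can raise one metric at the expense of the other by loosening or tightening its threshold, which is why the abstract speaks of a balance between the two.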

  13. Barcoding the food chain: from Sanger to high-throughput sequencing.

    Science.gov (United States)

    Littlefair, Joanne E; Clare, Elizabeth L

    2016-11-01

    Society faces the complex challenge of supporting biodiversity and ecosystem functioning, while ensuring food security by providing safe traceable food through an ever-more-complex global food chain. The increase in human mobility brings the added threat of pests, parasites, and invaders that further complicate our agro-industrial efforts. DNA barcoding technologies allow researchers to identify both individual species, and, when combined with universal primers and high-throughput sequencing techniques, the diversity within mixed samples (metabarcoding). These tools are already being employed to detect market substitutions, trace pests through the forensic evaluation of trace "environmental DNA", and to track parasitic infections in livestock. The potential of DNA barcoding to contribute to increased security of the food chain is clear, but challenges remain in regulation and the need for validation of experimental analysis. Here, we present an overview of the current uses and challenges of applied DNA barcoding in agriculture, from agro-ecosystems within farmland to the kitchen table.

  14. Seasonal diversity and dynamics of haptophytes in the Skagerrak, Norway, explored by high-throughput sequencing

    Science.gov (United States)

    Egge, Elianne Sirnæs; Johannessen, Torill Vik; Andersen, Tom; Eikrem, Wenche; Bittner, Lucie; Larsen, Aud; Sandaa, Ruth-Anne; Edvardsen, Bente

    2015-01-01

    Microalgae in the division Haptophyta play key roles in the marine ecosystem and in global biogeochemical processes. Despite their ecological importance, knowledge on seasonal dynamics, community composition and abundance at the species level is limited due to their small cell size and few morphological features visible under the light microscope. Here, we present unique data on haptophyte seasonal diversity and dynamics from two annual cycles, with the taxonomic resolution and sampling depth obtained with high-throughput sequencing. From outer Oslofjorden, S Norway, nano- and picoplanktonic samples were collected monthly for 2 years, and the haptophytes targeted by amplification of RNA/cDNA with Haptophyta-specific 18S rDNA V4 primers. We obtained 156 operational taxonomic units (OTUs), from c. 400,000 reads of 454 pyrosequencing, after rigorous bioinformatic filtering and clustering at 99.5%. Most OTUs represented uncultured and/or not yet 18S rDNA-sequenced species. Haptophyte OTU richness and community composition exhibited high temporal variation and significant yearly periodicity. Richness was highest in September–October (autumn) and lowest in April–May (spring). Some taxa were detected all year, such as Chrysochromulina simplex, Emiliania huxleyi and Phaeocystis cordata, whereas most calcifying coccolithophores only appeared from summer to early winter. We also revealed the seasonal dynamics of OTUs representing putative novel classes (clades HAP-3–5) or orders (clades D, E, F). Season, light and temperature accounted for 29% of the variation in OTU composition. Residual variation may be related to biotic factors, such as competition and viral infection. This study provides new, in-depth knowledge on seasonal diversity and dynamics of haptophytes in North Atlantic coastal waters. PMID:25893259

  15. Seasonal diversity and dynamics of haptophytes in the Skagerrak, Norway, explored by high-throughput sequencing.

    Science.gov (United States)

    Egge, Elianne Sirnaes; Johannessen, Torill Vik; Andersen, Tom; Eikrem, Wenche; Bittner, Lucie; Larsen, Aud; Sandaa, Ruth-Anne; Edvardsen, Bente

    2015-06-01

    Microalgae in the division Haptophyta play key roles in the marine ecosystem and in global biogeochemical processes. Despite their ecological importance, knowledge on seasonal dynamics, community composition and abundance at the species level is limited due to their small cell size and few morphological features visible under the light microscope. Here, we present unique data on haptophyte seasonal diversity and dynamics from two annual cycles, with the taxonomic resolution and sampling depth obtained with high-throughput sequencing. From outer Oslofjorden, S Norway, nano- and picoplanktonic samples were collected monthly for 2 years, and the haptophytes targeted by amplification of RNA/cDNA with Haptophyta-specific 18S rDNA V4 primers. We obtained 156 operational taxonomic units (OTUs), from c. 400,000 reads of 454 pyrosequencing, after rigorous bioinformatic filtering and clustering at 99.5%. Most OTUs represented uncultured and/or not yet 18S rDNA-sequenced species. Haptophyte OTU richness and community composition exhibited high temporal variation and significant yearly periodicity. Richness was highest in September-October (autumn) and lowest in April-May (spring). Some taxa were detected all year, such as Chrysochromulina simplex, Emiliania huxleyi and Phaeocystis cordata, whereas most calcifying coccolithophores only appeared from summer to early winter. We also revealed the seasonal dynamics of OTUs representing putative novel classes (clades HAP-3-5) or orders (clades D, E, F). Season, light and temperature accounted for 29% of the variation in OTU composition. Residual variation may be related to biotic factors, such as competition and viral infection. This study provides new, in-depth knowledge on seasonal diversity and dynamics of haptophytes in North Atlantic coastal waters. © 2015 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  16. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus.
Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  17. ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    Directory of Open Access Journals (Sweden)

    Kim Taeho

    2010-09-01

    Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computation efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions: ClustalXeed can now compute a large volume of biological sequence data sets, which were not tractable in any other parallel or single MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities; 2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and 3) support for both single PC and distributed cluster systems.

  18. Temporal dynamics of soil microbial communities under different moisture regimes: high-throughput sequencing and bioinformatics analysis

    Science.gov (United States)

    Semenov, Mikhail; Zhuravleva, Anna; Semenov, Vyacheslav; Yevdokimov, Ilya; Larionova, Alla

    2017-04-01

    indicator than Shannon index. Chao1 had similar values for OW and IW communities, but alpha-diversity of microbial communities has sharply decreased under PW treatment. There was no visible difference in beta-diversity depending on sampling date and wetting regime, however, it could be possible to distinguish microbial communities in soils with maize and without plants. The presence of maize was acting as scattering agent, making microbial communities more distinguished. In all studied samples, the most dominant phyla were Proteobacteria, Firmicutes, Verrucomicrobia, Actinobacteria, and Acidobacteria. Chthoniobacter, Bacillus, Alicyclobacillus, Rhodoplanes, Cohnella, Kaistobacter, and Solibacter were the most abundant genera. Moreover, these genera were found as the most reactive and variable taxa in microbial community. Thus, DNA high-throughput sequencing revealed no dramatic shifts in bacterial community structure in soils under different moisture regimes. However, this technique allowed us to determine the effect of wetting regime and the presence of plants on soil microbial community which were adaptable to insufficient wetting, but lost diversity under periodic wetting. Furthermore, we detected the indicative taxa which dominate in microbial communities and at the same time strongly react to environmental changes.
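
    The record above compares the Chao1 richness estimator with the Shannon index as alpha-diversity indicators. As a rough illustration of what the two numbers measure, the sketch below computes both from a vector of per-taxon read counts. It uses the natural-log Shannon index H = -Σ p_i ln p_i and the bias-corrected Chao1 form S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa seen exactly once and twice; which variant the study used is not stated, and the example counts are invented.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                        # taxa actually seen
    f1 = sum(1 for c in observed if c == 1)      # singletons
    f2 = sum(1 for c in observed if c == 2)      # doubletons
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


if __name__ == "__main__":
    # Invented read counts per taxon for one sample.
    sample = [10, 5, 2, 2, 1, 1, 1]
    print(chao1(sample))           # 7 observed taxa + 3*2/(2*3) → 8.0
    print(shannon_index([5, 5]))   # two equally abundant taxa give ln(2)
```

    Chao1 depends on the rare taxa (singletons and doubletons), whereas Shannon weights taxa by relative abundance, so the two can respond differently to the same treatment.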

  19. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
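
    The abstract does not describe how the SSR_pipeline search module works internally, so the following is not its algorithm. It is a minimal sketch, under our own assumptions, of one common way to find perfect simple repeats in a DNA string: a regular expression with a backreference. The function names and the default thresholds (motif length 2 to 6, at least 4 copies) are hypothetical.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA', 'ACAC')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_microsatellites(sequence, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, copies) for perfect tandem repeats in the sequence.

    The lazy quantifier makes the regex try the shortest motif first; the
    backreference \\1 then requires at least min_repeats - 1 further copies.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if not is_primitive(motif):
            continue  # skip runs such as AAAAAAAA reported as motif 'AA'
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Synthetic read: (AC)x5 and (GATA)x4 separated by flanking bases.
    read = "TTG" + "AC" * 5 + "GG" + "GATA" * 4 + "CC"
    print(find_microsatellites(read))  # → [(3, 'AC', 5), (15, 'GATA', 4)]
```

    This sketch reports only perfect, non-overlapping repeats; compound microsatellites and user-specified parameter sets of the kind the abstract mentions would need additional logic.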

  1. Usefulness of Genetic Study by Next-generation Sequencing in High-risk Arrhythmogenic Cardiomyopathy.

    Science.gov (United States)

    Ruiz Salas, Amalio; Peña Hernández, José; Medina Palomo, Carmen; Barrera Cordero, Alberto; Cabrera Bueno, Fernando; García Pinilla, José Manuel; Guijarro, Ana; Morcillo-Hidalgo, Luis; Jiménez Navarro, Manuel; Gómez Doblas, Juan José; de Teresa, Eduardo; Alzueta, Javier

    2018-03-29

    Arrhythmogenic right ventricular cardiomyopathy (ARVC) is an inherited cardiomyopathy characterized by progressive fibrofatty replacement of predominantly right ventricular myocardium. This cardiomyopathy is a frequent cause of sudden cardiac death in young people and athletes. The aim of our study was to determine the incidence of pathological or likely pathological desmosomal mutations in patients with high-risk definite ARVC. This was an observational, retrospective cohort study, which included 36 patients diagnosed with high-risk ARVC in our hospital between January 1998 and January 2015. Genetic analysis was performed using next-generation sequencing. Most patients were male (28 patients, 78%) with a mean age at diagnosis of 45 ± 18 years. A pathogenic or probably pathogenic desmosomal mutation was detected in 26 of the 35 index cases (74%): 5 nonsense, 14 frameshift, 1 splice, and 6 missense. Novel mutations were found in 15 patients (71%). The presence or absence of desmosomal mutations causing the disease and the type of mutation were not associated with specific electrocardiographic, clinical, arrhythmic, anatomic, or prognostic characteristics. The incidence of pathological or likely pathological desmosomal mutations in ARVC is very high, with most mutations causing truncation. The presence of desmosomal mutations was not associated with prognosis. Copyright © 2017 Sociedad Española de Cardiología. Published by Elsevier España, S.L.U. All rights reserved.

  2. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  3. Whole-exome sequencing and high throughput genotyping identified KCNJ11 as the thirteenth MODY gene.

    Science.gov (United States)

    Bonnefond, Amélie; Philippe, Julien; Durand, Emmanuelle; Dechaume, Aurélie; Huyvaert, Marlène; Montagne, Louise; Marre, Michel; Balkau, Beverley; Fajardy, Isabelle; Vambergue, Anne; Vatin, Vincent; Delplanque, Jérôme; Le Guilcher, David; De Graeve, Franck; Lecoeur, Cécile; Sand, Olivier; Vaxillaire, Martine; Froguel, Philippe

    2012-01-01

    Maturity-onset diabetes of the young (MODY) is a clinically heterogeneous form of diabetes characterized by an autosomal-dominant mode of inheritance, an onset before the age of 25 years, and a primary defect in the pancreatic beta-cell function. Approximately 30% of MODY families remain genetically unexplained (MODY-X). Here, we aimed to use whole-exome sequencing (WES) in a four-generation MODY-X family to identify a new susceptibility gene for MODY. WES (Agilent-SureSelect capture/Illumina-GAIIx sequencing) was performed in three affected and one non-affected relatives in the MODY-X family. We then performed a high-throughput multiplex genotyping (Illumina-GoldenGate assay) of the putative causal mutations in the whole family and in 406 controls. A linkage analysis was also carried out. By focusing on variants of interest (i.e. gains of stop codon, frameshift, non-synonymous and splice-site variants not reported in dbSNP130) present in the three affected relatives and not present in the control, we found 69 mutations. However, as WES was not uniform between samples, a total of 324 mutations had to be assessed in the whole family and in controls. Only one mutation (p.Glu227Lys in KCNJ11) co-segregated with diabetes in the family (with a LOD-score of 3.68). No KCNJ11 mutation was found in 25 other MODY-X unrelated subjects. Beyond neonatal diabetes mellitus (NDM), KCNJ11 is also a MODY gene ('MODY13'), confirming the wide spectrum of diabetes related phenotypes due to mutations in NDM genes (i.e. KCNJ11, ABCC8 and INS). Therefore, the molecular diagnosis of MODY should include KCNJ11 as affected carriers can be ideally treated with oral sulfonylureas.

  4. Whole-exome sequencing and high throughput genotyping identified KCNJ11 as the thirteenth MODY gene.

    Directory of Open Access Journals (Sweden)

    Amélie Bonnefond

    BACKGROUND: Maturity-onset diabetes of the young (MODY) is a clinically heterogeneous form of diabetes characterized by an autosomal-dominant mode of inheritance, an onset before the age of 25 years, and a primary defect in the pancreatic beta-cell function. Approximately 30% of MODY families remain genetically unexplained (MODY-X). Here, we aimed to use whole-exome sequencing (WES) in a four-generation MODY-X family to identify a new susceptibility gene for MODY. METHODOLOGY: WES (Agilent-SureSelect capture/Illumina-GAIIx sequencing) was performed in three affected and one non-affected relatives in the MODY-X family. We then performed a high-throughput multiplex genotyping (Illumina-GoldenGate assay) of the putative causal mutations in the whole family and in 406 controls. A linkage analysis was also carried out. PRINCIPAL FINDINGS: By focusing on variants of interest (i.e. gains of stop codon, frameshift, non-synonymous and splice-site variants not reported in dbSNP130) present in the three affected relatives and not present in the control, we found 69 mutations. However, as WES was not uniform between samples, a total of 324 mutations had to be assessed in the whole family and in controls. Only one mutation (p.Glu227Lys in KCNJ11) co-segregated with diabetes in the family (with a LOD-score of 3.68). No KCNJ11 mutation was found in 25 other MODY-X unrelated subjects. CONCLUSIONS/SIGNIFICANCE: Beyond neonatal diabetes mellitus (NDM), KCNJ11 is also a MODY gene ('MODY13'), confirming the wide spectrum of diabetes related phenotypes due to mutations in NDM genes (i.e. KCNJ11, ABCC8 and INS). Therefore, the molecular diagnosis of MODY should include KCNJ11 as affected carriers can be ideally treated with oral sulfonylureas.

  5. Robust DNA Isolation and High-throughput Sequencing Library Construction for Herbarium Specimens.

    Science.gov (United States)

    Saeidi, Saman; McKain, Michael R; Kellogg, Elizabeth A

    2018-03-08

    Herbaria are an invaluable source of plant material that can be used in a variety of biological studies. The use of herbarium specimens is associated with a number of challenges including sample preservation quality, degraded DNA, and destructive sampling of rare specimens. In order to more effectively use herbarium material in large sequencing projects, a dependable and scalable method of DNA isolation and library preparation is needed. This paper demonstrates a robust, beginning-to-end protocol for DNA isolation and high-throughput library construction from herbarium specimens that does not require modification for individual samples. This protocol is tailored for low quality dried plant material and takes advantage of existing methods by optimizing tissue grinding, modifying library size selection, and introducing an optional reamplification step for low yield libraries. Reamplification of low yield DNA libraries can rescue samples derived from irreplaceable and potentially valuable herbarium specimens, negating the need for additional destructive sampling and without introducing discernible sequencing bias for common phylogenetic applications. The protocol has been tested on hundreds of grass species, but is expected to be adaptable for use in other plant lineages after verification. This protocol can be limited by extremely degraded DNA, where fragments do not exist in the desired size range, and by secondary metabolites present in some plant material that inhibit clean DNA isolation. Overall, this protocol introduces a fast and comprehensive method that allows for DNA isolation and library preparation of 24 samples in less than 13 h, with only 8 h of active hands-on time with minimal modifications.

  6. High-throughput sequencing of RNA silencing-associated small RNAs in olive (Olea europaea L.).

    Directory of Open Access Journals (Sweden)

    Livia Donaire

    Small RNAs (sRNAs) of 20 to 25 nucleotides (nt) in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Although RNA silencing has been primarily studied in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.). sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we use expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA) regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive.

  7. Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering

    DEFF Research Database (Denmark)

    Busk, Peter Kamp

    2017-01-01

    Large collections of protein sequences with divergent sequences are tedious to analyze for understanding their phylogenetic or structure-function relation. Peptide Pattern Recognition is an algorithm that was developed to facilitate this task but the previous version allows only a limited...... number of sequences as input. I implemented Peptide Pattern Recognition as multithreaded software designed to handle large numbers of sequences and perform analysis in a reasonable time frame. Benchmarking showed that the new implementation of Peptide Pattern Recognition is twenty times faster than...... the previous implementation on a small protein collection with 673 MAP kinase sequences. In addition, the new implementation could analyze a large protein collection with 48,570 Glycosyl Transferase family 20 sequences without reaching its upper limit on a desktop computer. Peptide Pattern Recognition...

  8. High-throughput sequencing of nematode communities from total soil DNA extractions

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    nematodes without the need for enrichment was developed. Using this strategy on DNA templates from a set of 22 agricultural soils, we obtained 64.4% sequences of nematode origin in total, whereas the remaining sequences were almost entirely from other metazoans. The nematode sequences were derived from...... in previous sequence-based studies are not nematode specific but also amplify other groups of organisms such as fungi and plantae, and thus require a nematode enrichment step that may introduce biases. Results: In this study an amplification strategy which selectively amplifies a fragment of the SSU from...... a broad taxonomic range and most sequences were from nematode taxa that have previously been found to be abundant in soil such as Tylenchida, Rhabditida, Dorylaimida, Triplonchida and Araeolaimida. Conclusions: Our amplification and sequencing strategy for assessing nematode diversity was able to collect...

  9. High similarity of phylogenetic profiles of rate-limiting enzymes with inhibitory relation in Human, Mouse, Rat, budding Yeast and E. coli.

    Science.gov (United States)

    Zhao, Min; Qu, Hong

    2011-11-30

    The phylogenetic profile is widely used to characterize functional linkage and conservation between proteins without amino acid sequence similarity. To survey the conservative regulatory properties of rate-limiting enzymes (RLEs) in the metabolic inhibitory network across different species, we define an enzyme inhibiting pair as a pair in which the first enzyme is the inhibitor provider and the second is the target of the inhibitor. Phylogenetic profiles of enzymes in the inhibiting pairs are further generated to measure the functional linkage of these enzymes during evolutionary history. We find that the RLEs generate, on average, over half of all in vivo inhibitors in each surveyed model organism. These inhibitors inhibit, on average, over 85% of targets in the metabolic inhibitory network and cover the majority of targets of cross-pathway inhibiting relations. Furthermore, we demonstrate that the phylogenetic profiles of the enzymes in inhibiting pairs in which at least one enzyme is rate-limiting often show higher similarities than those in common inhibiting enzyme pairs. In addition, RLEs, compared to common metabolic enzymes, often tend to produce ADP instead of AMP in conservative inhibitory networks. Combined with the conservative roles of RLEs in their efficiency in sensing metabolic signals and transmitting regulatory signals to the rest of the metabolic system, the RLEs may be important molecules in balancing energy homeostasis via maintaining the ratio of ATP to ADP in living cells. Furthermore, our results indicate that similarities of phylogenetic profiles of enzymes in the inhibiting enzyme pairs are not only correlated with enzyme topological importance, but also related to the roles of the enzymes in the metabolic inhibitory network.
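
    As an illustration of comparing phylogenetic profiles, the sketch below encodes each enzyme as a presence/absence vector across genomes and scores a pair by Jaccard similarity. The abstract does not state which similarity measure the authors used, so Jaccard is an assumption, and the enzyme names and profiles are hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity of two equal-length presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles: 1 = an ortholog is present in that genome, 0 = absent.
profiles = {
    "inhibitor_provider": [1, 1, 1, 1, 0, 0],
    "inhibitor_target": [1, 1, 1, 0, 0, 0],
    "unrelated_enzyme": [0, 0, 0, 1, 1, 1],
}

pair_score = jaccard_similarity(
    profiles["inhibitor_provider"], profiles["inhibitor_target"]
)
background = jaccard_similarity(
    profiles["inhibitor_provider"], profiles["unrelated_enzyme"]
)
print(pair_score, background)
```

    A high score means the two enzymes tend to be gained and lost together across genomes, which is the sense in which profile similarity is read as functional linkage.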

  10. Integrated analysis of RNA-binding protein complexes using in vitro selection and high-throughput sequencing and sequence specificity landscapes (SEQRS).

    Science.gov (United States)

    Lou, Tzu-Fang; Weidmann, Chase A; Killingsworth, Jordan; Tanaka Hall, Traci M; Goldstrohm, Aaron C; Campbell, Zachary T

    2017-04-15

    RNA-binding proteins (RBPs) collaborate to control virtually every aspect of RNA function. Tremendous progress has been made in the area of global assessment of RBP specificity using next-generation sequencing approaches both in vivo and in vitro. Understanding how protein-protein interactions enable precise combinatorial regulation of RNA remains a significant problem. Addressing this challenge requires tools that can quantitatively determine the specificities of both individual proteins and multimeric complexes in an unbiased and comprehensive way. One approach utilizes in vitro selection, high-throughput sequencing, and sequence-specificity landscapes (SEQRS). We outline a SEQRS experiment focused on obtaining the specificity of a multi-protein complex between Drosophila RBPs Pumilio (Pum) and Nanos (Nos). We discuss the necessary controls in this type of experiment and examine how the resulting data can be complemented with structural and cell-based reporter assays. Additionally, SEQRS data can be integrated with functional genomics data to uncover biological function. Finally, we propose extensions of the technique that will enhance our understanding of multi-protein regulatory complexes assembled onto RNA. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Woods with physical, mechanical and acoustic properties similar to those of Caesalpinia echinata have high potential as alternative woods for bow makers

    Directory of Open Access Journals (Sweden)

    Eduardo Luiz Longui

    2014-09-01

    For nearly two hundred years, Caesalpinia echinata wood has been the standard for modern bows. However, the threat of extinction and the enforcement of trade bans have required bow makers to seek alternative woods. The hypothesis tested was that woods with physical, mechanical and acoustic properties similar to those of C. echinata would have high potential as alternative woods for bows. Accordingly, we investigated Handroanthus spp., Mezilaurus itauba, Hymenaea spp., Dipteryx spp., Diplotropis spp. and Astronium lecointei. Handroanthus and Diplotropis have the greatest number of similarities with C. echinata, but only Handroanthus spp. showed significant results in actual bow manufacture, suggesting the importance of such key properties as specific gravity, speed of sound propagation and modulus of elasticity. In practice, Handroanthus and Dipteryx produced bows of quality similar to that of C. echinata.

  12. The determination of high-resolution spatio-temporal glacier motion fields from time-lapse sequences

    Science.gov (United States)

    Schwalbe, Ellen; Maas, Hans-Gerd

    2017-12-01

    This paper presents a comprehensive method for the determination of glacier surface motion vector fields at high spatial and temporal resolution. These vector fields can be derived from monocular terrestrial camera image sequences and are a valuable data source for glaciological analysis of the motion behaviour of glaciers. The measurement concepts for the acquisition of image sequences are presented, and an automated monoscopic image sequence processing chain is developed. Motion vector fields can be derived with high precision by applying automatic subpixel-accuracy image matching techniques on grey value patterns in the image sequences. Well-established matching techniques have been adapted to the special characteristics of the glacier data in order to achieve high reliability in automatic image sequence processing, including the handling of moving shadows as well as motion effects induced by small instabilities in the camera set-up. Suitable geo-referencing techniques were developed to transform image measurements into a reference coordinate system. The result of monoscopic image sequence analysis is a dense raster of glacier surface point trajectories for each image sequence. Each translation vector component in these trajectories can be determined with an accuracy of a few centimetres for points at a distance of several kilometres from the camera. Extensive practical validation experiments have shown that motion vector and trajectory fields derived from monocular image sequences can be used for the determination of high-resolution velocity fields of glaciers, including the analysis of tidal effects on glacier movement, the investigation of a glacier's motion behaviour during calving events, the determination of the position and migration of the grounding line and the detection of subglacial channels during glacier lake outburst floods.
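
    One common way to reach subpixel accuracy in image matching is to fit a parabola through the best integer-shift matching score and its two neighbours. The abstract does not say which matching technique the authors adapted, so the following one-dimensional sketch is a generic illustration, with a hypothetical function name and a synthetic score curve.

```python
def subpixel_peak(scores):
    """Locate the maximum of a sampled score curve with sub-sample precision.

    scores[i] is the matching score at integer shift i (hypothetical input).
    """
    i = max(range(len(scores)), key=scores.__getitem__)
    if i == 0 or i == len(scores) - 1:
        return float(i)  # peak on the border: no neighbours to interpolate with
    left, centre, right = scores[i - 1], scores[i], scores[i + 1]
    denom = left - 2.0 * centre + right
    if denom == 0:
        return float(i)  # flat neighbourhood: keep the integer position
    # Vertex of the parabola through the three samples around the peak.
    return i + 0.5 * (left - right) / denom


# Synthetic score curve whose true maximum lies at a shift of 2.3 pixels.
scores = [-(x - 2.3) ** 2 for x in range(6)]
shift = subpixel_peak(scores)
print(shift)
```

    For a two-dimensional grey value patch, the same fit would be applied separately along the row and column directions of the score surface.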

  13. Strategies for achieving high sequencing accuracy for low diversity samples and avoiding sample bleeding using Illumina platform.

    Science.gov (United States)

    Mitra, Abhishek; Skrzypczak, Magdalena; Ginalski, Krzysztof; Rowicka, Maga

    2015-01-01

    Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer's, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. 
Alternatively, we discuss how

  14. Strategies for achieving high sequencing accuracy for low diversity samples and avoiding sample bleeding using Illumina platform.

    Directory of Open Access Journals (Sweden)

    Abhishek Mitra

    Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer's, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol.
Alternatively

  15. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers

    Directory of Open Access Journals (Sweden)

    Weiguo Hou

    2018-05-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  16. Domain similarity based orthology detection.

    Science.gov (United States)

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich

    2015-05-13

    Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows quadratically. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. porthoDom is implemented in Python and C++ and is available under the GNU GPL licence, version 3, at http://www.bornberglab.org/pages/porthoda .
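
    A cosine score on domain content can be sketched by treating each protein as a bag of domain identifiers. This is a minimal illustration rather than porthoDom's implementation: weighting each domain by its raw count is an assumption, and the Pfam-style identifiers and protein architectures are used only as examples.

```python
import math
from collections import Counter


def cosine_similarity(domains_a, domains_b):
    """Cosine of the angle between two domain-count vectors."""
    counts_a, counts_b = Counter(domains_a), Counter(domains_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[d] * counts_b[d] for d in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein without annotated domains matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical domain architectures written as strings of identifiers.
protein_1 = ["PF00069", "PF00017", "PF00017"]
protein_2 = ["PF00069", "PF00017"]
protein_3 = ["PF00001"]

score_related = cosine_similarity(protein_1, protein_2)
score_unrelated = cosine_similarity(protein_1, protein_3)
print(round(score_related, 3), score_unrelated)
```

    Proteins that share no domain score exactly zero, so they can be placed in different groups without any amino acid alignment, which is where the reduction in search space comes from.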

  17. High-specificity detection of rare alleles with Paired-End Low Error Sequencing (PELE-Seq).

    Science.gov (United States)

    Preston, Jessica L; Royall, Ariel E; Randel, Melissa A; Sikkink, Kristin L; Phillips, Patrick C; Johnson, Eric A

    2016-06-14

    Polymorphic loci exist throughout the genomes of a population and provide the raw genetic material needed for a species to adapt to changes in the environment. The minor allele frequencies of rare Single Nucleotide Polymorphisms (SNPs) within a population have been difficult to track with Next-Generation Sequencing (NGS), due to the high error rate of standard methods such as Illumina sequencing. We have developed a wet-lab protocol and variant-calling method that identifies both sequencing and PCR errors, called Paired-End Low Error Sequencing (PELE-Seq). To test the specificity and sensitivity of the PELE-Seq method, we sequenced control E. coli DNA libraries containing known rare alleles present at frequencies ranging from 0.2-0.4 % of the total reads. PELE-Seq had higher specificity and sensitivity than standard libraries. We then used PELE-Seq to characterize rare alleles in a Caenorhabditis remanei nematode worm population before and after laboratory adaptation, and found that minor and rare alleles can undergo large changes in frequency during lab-adaptation. We have developed a method of rare allele detection that mitigates both sequencing and PCR errors, called PELE-Seq. PELE-Seq was evaluated using control E. coli populations and was then used to compare a wild C. remanei population to a lab-adapted population. The PELE-Seq method is ideal for investigating the dynamics of rare alleles in a broad range of reduced-representation sequencing methods, including targeted amplicon sequencing, RAD-Seq, ddRAD, and GBS. PELE-Seq is also well-suited for whole genome sequencing of mitochondria and viruses, and for high-throughput rare mutation screens.

  18. High-Throughput Sequencing of Microbial Community Diversity and Dynamics during Douchi Fermentation

    Science.gov (United States)

    Tu, Zong-cai; Wang, Xiao-lan

    2016-01-01

    Douchi is a type of Chinese traditional fermented food that is an important source of protein and is used in flavouring ingredients. The end product is affected by the microbial community present during fermentation, but exactly how microbes influence the fermentation process remains poorly understood. We used an Illumina MiSeq approach to investigate bacterial and fungal community diversity during both douchi-koji making and fermentation. A total of 181,443 high quality bacterial 16S rRNA sequences and 221,059 high quality fungal internal transcribed spacer reads were used for taxonomic classification, revealing eight bacterial and three fungal phyla. Firmicutes, Actinobacteria and Proteobacteria were the dominant bacterial phyla, while Ascomycota and Zygomycota were the dominant fungal phyla. At the genus level, Staphylococcus and Weissella were the dominant bacteria, while Aspergillus and Lichtheimia were the dominant fungi. Principal coordinate analysis showed structural separation between the composition of bacteria in koji making and fermentation. However, multivariate analysis of variance based on unweighted UniFrac distances did identify distinct differences (p fermentation. This is the first investigation to integrate douchi fermentation and koji making and fermentation processes through this technological approach. The results provide insight into the microbiome of the douchi fermentation process, and reveal a structural separation that may be stratified by the environment during the production of this traditional fermented food. PMID:27992473

  19. Sequence and expression analyses of ethylene response factors highly expressed in latex cells from Hevea brasiliensis.

    Directory of Open Access Journals (Sweden)

    Piyanuch Piyatrakul

    The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from group I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors.

  20. glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data.

    Science.gov (United States)

    Hutchins, Andrew Paul; Jauch, Ralf; Dyla, Mateusz; Miranda-Saavedra, Diego

    2014-01-01

    Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit that allows the combination and analysis of high-throughput data (especially next-generation sequencing and genome-wide data), and which has been instrumental in the analysis of complex data sets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.
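
    As a rough illustration of the kind of genomic interval operation the abstract mentions, the sketch below intersects two lists of intervals in plain Python. It does not use glbase's own API; the function name `intersect_intervals` and the `(chrom, start, end)` tuple layout with inclusive coordinates are assumptions made for this example only.

```python
from collections import defaultdict

def intersect_intervals(list_a, list_b):
    """Return the intervals of list_a that overlap at least one interval of list_b.

    Intervals are hypothetical (chrom, start, end) tuples with inclusive
    coordinates. This is an illustration, not glbase's implementation.
    """
    # Group the second list by chromosome so only same-chromosome pairs are compared.
    by_chrom = defaultdict(list)
    for chrom, start, end in list_b:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in list_a:
        # Two inclusive intervals overlap when each one starts before the other ends.
        if any(start <= b_end and b_start <= end for b_start, b_end in by_chrom[chrom]):
            hits.append((chrom, start, end))
    return hits

# Made-up example data: three peaks and two gene bodies.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
genes = [("chr1", 150, 400), ("chr2", 90, 120)]
print(intersect_intervals(peaks, genes))
```

    The pairwise scan is quadratic per chromosome, which is fine for a sketch; a production tool would more likely sort the intervals or use an interval tree.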

  1. glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data

    Directory of Open Access Journals (Sweden)

    Andrew Paul Hutchins

    2014-01-01

    Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit that allows the combination and analysis of high-throughput data (especially next-generation sequencing and genome-wide data), and which has been instrumental in the analysis of complex data sets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.

  2. Multilocus sequence analysis (MLSA) of Bradyrhizobium strains: revealing high diversity of tropical diazotrophic symbiotic bacteria.

    Science.gov (United States)

    Delamuta, Jakeline Renata Marçon; Ribeiro, Renan Augusto; Menna, Pâmela; Bangel, Eliane Villamil; Hungria, Mariangela

    2012-04-01

    Symbiotic association between several genera of bacteria, collectively called rhizobia, and plants belonging to the family Leguminosae (=Fabaceae) results in the process of biological nitrogen fixation, playing a key role in global N cycling and also making relevant contributions to agriculture. Bradyrhizobium is considered the ancestor of all nitrogen-fixing rhizobial species and probably originated in the tropics. The genus encompasses a variety of diverse bacteria, but the diversity captured in the analysis of the 16S rRNA is often low. In this study, we analyzed twelve Bradyrhizobium strains selected from previous studies performed by our group for showing high genetic diversity in relation to the described species. In addition to the 16S rRNA, five housekeeping genes (recA, atpD, glnII, gyrB and rpoB) were analyzed in the MLSA (multilocus sequence analysis) approach. Analysis of each gene and of the concatenated housekeeping genes captured a considerably higher level of genetic diversity, with indication of putative new species. The results highlight the high genetic variability associated with Bradyrhizobium microsymbionts of a variety of legumes. In addition, the MLSA approach has proved to be a rapid and reliable method for phylogenetic and taxonomic studies, speeding the identification of the still poorly known diversity of nitrogen-fixing rhizobia in the tropics.
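
    The concatenation step at the heart of MLSA can be sketched as follows. This is a minimal illustration, assuming each gene has already been aligned across strains; the helper names, the toy sequences and the use of a simple p-distance are assumptions for this example and are not taken from the study.

```python
# Housekeeping genes named in the abstract, in a fixed concatenation order.
GENES = ["recA", "atpD", "glnII", "gyrB", "rpoB"]

def concatenate_loci(alignments, genes=GENES):
    """Join per-gene aligned sequences into one sequence per strain.

    alignments: dict gene -> dict strain -> aligned sequence (hypothetical layout).
    Only strains present for every gene are kept.
    """
    strains = set.intersection(*(set(alignments[g]) for g in genes))
    return {s: "".join(alignments[g][s] for g in genes) for s in sorted(strains)}

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

# Toy alignments, invented for illustration only.
toy = {
    "recA":  {"s1": "ACGT", "s2": "ACGA"},
    "atpD":  {"s1": "GGCC", "s2": "GGCC"},
    "glnII": {"s1": "TTAA", "s2": "TTAT"},
    "gyrB":  {"s1": "CAGT", "s2": "CAGT"},
    "rpoB":  {"s1": "AGCT", "s2": "AGCT"},
}
concatenated = concatenate_loci(toy)
print(concatenated["s1"], concatenated["s2"])
print(p_distance(concatenated["s1"], concatenated["s2"]))
```

    Concatenating in a fixed gene order keeps sites comparable across strains, which is what allows a single distance matrix or tree to be built from all loci at once.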

  3. LncRNA Expression Profile of Human Thoracic Aortic Dissection by High-Throughput Sequencing.

    Science.gov (United States)

    Sun, Jie; Chen, Guojun; Jing, Yuanwen; He, Xiang; Dong, Jianting; Zheng, Junmeng; Zou, Meisheng; Li, Hairui; Wang, Shifei; Sun, Yili; Liao, Wangjun; Liao, Yulin; Feng, Li; Bin, Jianping

    2018-01-01

    In this study, the long non-coding RNA (lncRNA) expression profile in human thoracic aortic dissection (TAD), a highly lethal cardiovascular disease, was investigated. Human TAD (n=3) and normal aortic tissues (NA) (n=3) were examined by high-throughput sequencing. Bioinformatics analyses were performed to predict the roles of aberrantly expressed lncRNAs. Quantitative real-time polymerase chain reaction (qRT-PCR) was applied to validate the results. A total of 269 lncRNAs (159 up-regulated and 110 down-regulated) and 2,255 mRNAs (1,294 up-regulated and 961 down-regulated) were aberrantly expressed in human TAD (fold-change> 1.5, PTAD than in NA. The predicted binding motifs of three up-regulated lncRNAs (ENSG00000248508, ENSG00000226530, and EG00000259719) were correlated with up-regulated RUNX1 (R=0.982, PTAD. These findings suggest that lncRNAs are novel potential therapeutic targets for human TAD. © 2018 The Author(s). Published by S. Karger AG, Basel.

  4. Intergenic DNA sequences from the human X chromosome reveal high rates of global gene flow

    Directory of Open Access Journals (Sweden)

    Wall Jeffrey D

    2008-11-01

    Background: Despite intensive efforts devoted to collecting human polymorphism data, little is known about the role of gene flow in the ancestry of human populations. This is partly because most analyses have applied one of two simple models of population structure, the island model or the splitting model, which make unrealistic biological assumptions. Results: Here, we analyze 98 kb of DNA sequence from 20 independently evolving intergenic regions on the X chromosome in a sample of 90 humans from six globally diverse populations. We employ an isolation-with-migration (IM) model, which assumes that populations split and subsequently exchange migrants, to independently estimate effective population sizes and migration rates. While the maximum effective size of modern humans is estimated at ~10,000, individual populations vary substantially in size, with African populations tending to be larger (2,300–9,000) than non-African populations (300–3,300). We estimate mean rates of bidirectional gene flow at 4.8 × 10^-4/generation. Bidirectional migration rates are ~5-fold higher among non-African populations (1.5 × 10^-3) than among African populations (2.7 × 10^-4). Interestingly, because effective sizes and migration rates are inversely related in African and non-African populations, population migration rates are similar within Africa and Eurasia (e.g., global mean Nm = 2.4). Conclusion: We conclude that gene flow has played an important role in structuring global human populations and that migration rates should be incorporated as critical parameters in models of human demography.
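
    The inverse relationship between effective size and migration rate can be checked with back-of-envelope arithmetic. The sketch below assumes the population migration rate is the simple product Nm of effective size N and per-generation migration rate m, and plugs in the ranges quoted in the abstract; the paper's own Nm estimates come from fitting the IM model, not from this multiplication, so the numbers are only indicative.

```python
def population_migration_rate(effective_size, migration_rate):
    """Nm: effective population size times per-generation migration rate (assumed definition)."""
    return effective_size * migration_rate

# Ranges of N and mean m as quoted in the abstract.
african_nm = [population_migration_rate(n, 2.7e-4) for n in (2300, 9000)]
non_african_nm = [population_migration_rate(n, 1.5e-3) for n in (300, 3300)]

print("African Nm range:     %.2f to %.2f" % tuple(african_nm))
print("Non-African Nm range: %.2f to %.2f" % tuple(non_african_nm))
```

    Under these assumptions the African range is roughly 0.6 to 2.4 and the non-African range roughly 0.45 to 4.95, so the two overlap despite the ~5-fold difference in m, in line with the abstract's remark that population migration rates are similar.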

  5. Mining environmental high-throughput sequence data sets to identify divergent amplicon clusters for phylogenetic reconstruction and morphotype visualization.

    Science.gov (United States)

    Gimmler, Anna; Stoeck, Thorsten

    2015-08-01

    Environmental high-throughput sequencing (envHTS) is a very powerful tool, which in protistan ecology is predominantly used for the exploration of diversity and its geographic and local patterns. We here used a pyrosequenced V4-SSU rDNA data set from a solar saltern pond as a test case to exploit such massive protistan amplicon data sets beyond this descriptive purpose. Therefore, we combined a Swarm-based blastn network including 11 579 ciliate V4 amplicons to identify divergent amplicon clusters with targeted polymerase chain reaction (PCR) primer design for full-length small subunit of the ribosomal DNA retrieval and probe design for fluorescence in situ hybridization (FISH). This powerful strategy makes it possible to use envHTS data sets to (i) reveal the phylogenetic position of the taxon behind divergent amplicons; (ii) improve phylogenetic resolution and evolutionary history of specific taxon groups; (iii) solidly assess an amplicon's (species') degree of similarity to its closest described relative; (iv) visualize the morphotype behind a divergent amplicon cluster; (v) rapidly FISH screen many environmental samples for geographic/habitat distribution and abundances of the respective organism and (vi) monitor the success of enrichment strategies in live samples for cultivation and isolation of the respective organisms. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.

  6. Bacterial community compositions of coking wastewater treatment plants in steel industry revealed by Illumina high-throughput sequencing.

    Science.gov (United States)

    Ma, Qiao; Qu, Yuanyuan; Shen, Wenli; Zhang, Zhaojing; Wang, Jingwei; Liu, Ziyan; Li, Duanxing; Li, Huijie; Zhou, Jiti

    2015-03-01

    In this study, Illumina high-throughput sequencing was used to reveal the community structures of nine coking wastewater treatment plants (CWWTPs) in China for the first time. The sludge systems exhibited a similar community composition at each taxonomic level. Compared to previous studies, some of the core genera in municipal wastewater treatment plants such as Zoogloea, Prosthecobacter and Gp6 were detected as minor species. Thiobacillus (20.83%), Comamonas (6.58%), Thauera (4.02%), Azoarcus (7.78%) and Rhodoplanes (1.42%) were the dominant genera shared by at least six CWWTPs. The percentages of autotrophic ammonia-oxidizing bacteria and nitrite-oxidizing bacteria were unexpectedly low, which were verified by both real-time PCR and fluorescence in situ hybridization analyses. Hierarchical clustering and canonical correspondence analysis indicated that operation mode, flow rate and temperature might be the key factors in community formation. This study provides new insights into our understanding of microbial community compositions and structures of CWWTPs. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2013-08-01

    Purpose: to investigate the efficiency of a shift sequence-based approach when the problem consists of a high number of shifts. Research objectives: • Solve the health care workers rostering problem using a shift sequence-based method. • Measure its efficiency when the number of shifts increases. Design/methodology/approach: Rostering problems are usually highly constrained. Constraints are classified into soft and hard constraints. Soft and hard constraints of the problem are additionally classified into sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied when constructing the overall solution, i.e. combining all schedules. The shift sequence-based approach consists of two stages: • Shift sequences construction, • The construction of schedules. In the shift sequences construction stage, the shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shift sequences are ranked by their penalties for easier retrieval in the later stage. In the schedules construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1. The shift sequence-based method is an adaptive iterative method where health care workers who received the highest schedule penalties in the last iteration are scheduled first at the current iteration. During the roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial) roster, as long as the swaps satisfy hard constraints and decrease the roster penalty. Findings: Using shift sequence method for solving health care workers rostering problem

  8. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2011-08-01

    Purpose: to investigate the efficiency of a shift sequence-based approach when the problem consists of a high number of shifts. Research objectives: • Solve the health care workers rostering problem using a shift sequence-based method. • Measure its efficiency when the number of shifts increases. Design/methodology/approach: Rostering problems are usually highly constrained. Constraints are classified into soft and hard constraints. Soft and hard constraints of the problem are additionally classified into sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied when constructing the overall solution, i.e. combining all schedules. The shift sequence-based approach consists of two stages: • Shift sequences construction, • The construction of schedules. In the shift sequences construction stage, the shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shift sequences are ranked by their penalties for easier retrieval in the later stage. In the schedules construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1. The shift sequence-based method is an adaptive iterative method where health care workers who received the highest schedule penalties in the last iteration are scheduled first at the current iteration. During the roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial) roster, as long as the swaps satisfy hard constraints and decrease the roster penalty. Findings: Using shift sequence method for solving health care workers rostering
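
    The improvement step described in the abstract (swap a pair of shifts between two workers whenever hard constraints still hold and the roster penalty drops) might look roughly like the sketch below. The roster layout, the `penalty` and `satisfies_hard_constraints` callables and the toy preferences are all assumptions introduced for illustration; the paper's actual constraints and penalty weights are not given in the abstract.

```python
from itertools import combinations

def greedy_swap_improvement(roster, penalty, satisfies_hard_constraints):
    """Swap same-day shifts between two workers for as long as the penalty keeps dropping.

    roster: dict worker -> list of shift labels, one per day (hypothetical layout;
            every worker is assumed to have the same number of days).
    penalty: callable(roster) -> number, lower is better (supplied by the caller).
    satisfies_hard_constraints: callable(roster) -> bool (supplied by the caller).
    The roster is modified in place; the final penalty is returned.
    """
    best = penalty(roster)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(list(roster), 2):
            for day in range(len(roster[a])):
                if roster[a][day] == roster[b][day]:
                    continue  # swapping identical shifts changes nothing
                # Tentatively swap the two workers' shifts on this day.
                roster[a][day], roster[b][day] = roster[b][day], roster[a][day]
                new = penalty(roster)
                if satisfies_hard_constraints(roster) and new < best:
                    best = new
                    improved = True
                else:
                    # Undo the swap: it was infeasible or did not help.
                    roster[a][day], roster[b][day] = roster[b][day], roster[a][day]
    return best

# Toy problem: Ann prefers day shifts ("D"), Bob prefers night shifts ("N").
roster = {"ann": ["N", "D", "N"], "bob": ["D", "N", "D"]}

def penalty(r):
    # Soft constraint: one penalty point per shift that goes against a preference.
    return r["ann"].count("N") + r["bob"].count("D")

def covered(r):
    # Hard constraint: each day needs exactly one day shift and one night shift.
    return all(sorted(day) == ["D", "N"] for day in zip(*r.values()))

final_penalty = greedy_swap_improvement(roster, penalty, covered)
print(final_penalty, roster)
```

    Because a swap is accepted only when the penalty strictly decreases, the loop cannot cycle and stops at a local optimum, which is the usual trade-off of a greedy search.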

  9. Determination of 5'-leader sequences from radically disparate strains of porcine reproductive and respiratory syndrome virus reveals the presence of highly conserved sequence motifs

    DEFF Research Database (Denmark)

    Oleksiewicz, M.B.; Bøtner, Anette; Nielsen, Jens

    1999-01-01

    We determined the untranslated 5'-leader sequence for three different isolates of porcine reproductive and respiratory syndrome virus (PRRSV): pathogenic European- and American-types, as well as an American-type vaccine strain. 5'-leader from European- and American-type PRRSV differed in length...... (220 and 190 nt, respectively), and exhibited only approximately 50% nucleotide homology. Nevertheless, highly conserved areas were identified in the leader of all 3 PRRSV isolates, which constitute candidate motifs for binding of protein(s) involved in viral replication. These comparative data provide...

  10. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Science.gov (United States)

    Greub, Gilbert; Kebbi-Beghdadi, Carole; Bertelli, Claire; Collyn, François; Riederer, Beat M; Yersin, Camille; Croxatto, Antony; Raoult, Didier

    2009-12-23

    With the availability of new generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when containing highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, lack of release of genome data during gap closure stage is clearly medically counterproductive. We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to elaborate the first steps of an ELISA. This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful to develop in the future specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful to develop DNA based diagnostic tests. All these diagnostic tools will allow further evaluations of the pathogenic potential of this obligate intracellular bacterium.

  11. Similarity transformed equation of motion coupled-cluster theory based on an unrestricted Hartree-Fock reference for applications to high-spin open-shell systems.

    Science.gov (United States)

    Huntington, Lee M J; Krupička, Martin; Neese, Frank; Izsák, Róbert

    2017-11-07

    The similarity transformed equation of motion coupled-cluster approach is extended for applications to high-spin open-shell systems, within the unrestricted Hartree-Fock (UHF) formalism. An automatic active space selection scheme has also been implemented such that calculations can be performed in a black-box fashion. It is observed that both the canonical and automatic active space selecting similarity transformed equation of motion (STEOM) approaches perform about as well as the more expensive equation of motion coupled-cluster singles doubles (EOM-CCSD) method for the calculation of the excitation energies of doublet radicals. The automatic active space selecting UHF STEOM approach can therefore be employed as a viable, lower scaling alternative to UHF EOM-CCSD for the calculation of excited states in high-spin open-shell systems.

  12. Similarity transformed equation of motion coupled-cluster theory based on an unrestricted Hartree-Fock reference for applications to high-spin open-shell systems

    Science.gov (United States)

    Huntington, Lee M. J.; Krupička, Martin; Neese, Frank; Izsák, Róbert

    2017-11-01

    The similarity transformed equation of motion coupled-cluster approach is extended for applications to high-spin open-shell systems, within the unrestricted Hartree-Fock (UHF) formalism. An automatic active space selection scheme has also been implemented such that calculations can be performed in a black-box fashion. It is observed that both the canonical and automatic active space selecting similarity transformed equation of motion (STEOM) approaches perform about as well as the more expensive equation of motion coupled-cluster singles doubles (EOM-CCSD) method for the calculation of the excitation energies of doublet radicals. The automatic active space selecting UHF STEOM approach can therefore be employed as a viable, lower scaling alternative to UHF EOM-CCSD for the calculation of excited states in high-spin open-shell systems.

  13. High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling.

    Science.gov (United States)

    Irigoyen, Nerea; Firth, Andrew E; Jones, Joshua D; Chung, Betty Y-W; Siddell, Stuart G; Brierley, Ian

    2016-02-01

    Members of the family Coronaviridae have the largest genomes of all RNA viruses, typically in the region of 30 kilobases. Several coronaviruses, such as Severe acute respiratory syndrome-related coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV), are of medical importance, with high mortality rates and, in the case of SARS-CoV, significant pandemic potential. Other coronaviruses, such as Porcine epidemic diarrhea virus and Avian coronavirus, are important livestock pathogens. Ribosome profiling is a technique which exploits the capacity of the translating ribosome to protect around 30 nucleotides of mRNA from ribonuclease digestion. Ribosome-protected mRNA fragments are purified, subjected to deep sequencing and mapped back to the transcriptome to give a global "snap-shot" of translation. Parallel RNA sequencing allows normalization by transcript abundance. Here we apply ribosome profiling to cells infected with Murine coronavirus, mouse hepatitis virus, strain A59 (MHV-A59), a model coronavirus in the same genus as SARS-CoV and MERS-CoV. The data obtained allowed us to study the kinetics of virus transcription and translation with exquisite precision. We studied the timecourse of positive and negative-sense genomic and subgenomic viral RNA production and the relative translation efficiencies of the different virus ORFs. Virus mRNAs were not found to be translated more efficiently than host mRNAs; rather, virus translation dominates host translation at later time points due to high levels of virus transcripts. Triplet phasing of the profiling data allowed precise determination of translated reading frames and revealed several translated short open reading frames upstream of, or embedded within, known virus protein-coding regions. Ribosome pause sites were identified in the virus replicase polyprotein pp1a ORF and investigated experimentally. Contrary to expectations, ribosomes were not found to pause at the ribosomal
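
    Two quantities mentioned in the abstract, triplet phasing and translation efficiency, can be illustrated with a short sketch. It assumes read positions have already been assigned to a single reference nucleotide per ribosome-protected fragment, and that densities are already normalized; the function names and the toy numbers are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter

def frame_fractions(read_positions, orf_start):
    """Fraction of reads falling in each of the three reading frames of an ORF.

    read_positions: one reference coordinate per ribosome-protected fragment
    (how that coordinate is chosen is left to the caller in this sketch).
    """
    counts = Counter((pos - orf_start) % 3 for pos in read_positions)
    total = sum(counts.values())
    return [counts[frame] / total for frame in range(3)]

def translation_efficiency(rpf_density, rna_density):
    """Ribosome footprint density divided by RNA-seq density for the same ORF."""
    return rpf_density / rna_density

# Made-up positions: five of six reads sit in frame 0 of an ORF starting at 100.
positions = [100, 103, 106, 109, 101, 112]
fractions = frame_fractions(positions, orf_start=100)
print(fractions)
print(translation_efficiency(rpf_density=40.0, rna_density=80.0))
```

    A strong excess of reads in one frame is what identifies the translated frame; dividing footprint density by transcript density separates how efficiently an mRNA is translated from how abundant it is.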

  14. High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling.

    Directory of Open Access Journals (Sweden)

    Nerea Irigoyen

    2016-02-01

    Members of the family Coronaviridae have the largest genomes of all RNA viruses, typically in the region of 30 kilobases. Several coronaviruses, such as Severe acute respiratory syndrome-related coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV), are of medical importance, with high mortality rates and, in the case of SARS-CoV, significant pandemic potential. Other coronaviruses, such as Porcine epidemic diarrhea virus and Avian coronavirus, are important livestock pathogens. Ribosome profiling is a technique which exploits the capacity of the translating ribosome to protect around 30 nucleotides of mRNA from ribonuclease digestion. Ribosome-protected mRNA fragments are purified, subjected to deep sequencing and mapped back to the transcriptome to give a global "snap-shot" of translation. Parallel RNA sequencing allows normalization by transcript abundance. Here we apply ribosome profiling to cells infected with Murine coronavirus, mouse hepatitis virus, strain A59 (MHV-A59), a model coronavirus in the same genus as SARS-CoV and MERS-CoV. The data obtained allowed us to study the kinetics of virus transcription and translation with exquisite precision. We studied the timecourse of positive and negative-sense genomic and subgenomic viral RNA production and the relative translation efficiencies of the different virus ORFs. Virus mRNAs were not found to be translated more efficiently than host mRNAs; rather, virus translation dominates host translation at later time points due to high levels of virus transcripts. Triplet phasing of the profiling data allowed precise determination of translated reading frames and revealed several translated short open reading frames upstream of, or embedded within, known virus protein-coding regions. Ribosome pause sites were identified in the virus replicase polyprotein pp1a ORF and investigated experimentally. Contrary to expectations, ribosomes were not found to pause at the

  15. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at a lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions. © 2014 by the authors; licensee MDPI, Basel, Switzerland.

  16. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution

    NARCIS (Netherlands)

    Falconer, Ester; Hills, Mark; Naumann, Ulrike; Poon, Steven S. S.; Chavez, Elizabeth A.; Sanders, Ashley D.; Zhao, Yongjun; Hirst, Martin; Lansdorp, Peter M.

    DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it

  17. Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Ali Idris

    2014-03-01

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at a lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions.

  18. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes

    NARCIS (Netherlands)

    Dutilh, Bas E; Cassman, Noriko; McNair, Katelyn; Sanchez, Savannah E; Silva, Genivaldo G Z; Boling, Lance; Barr, Jeremy J; Speth, Daan R; Seguritan, Victor; Aziz, Ramy K; Felts, Ben; Dinsdale, Elizabeth A; Mokili, John L; Edwards, Robert A

    2014-01-01

    Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the

  19. Self-similar analysis of the spherical implosion process

    International Nuclear Information System (INIS)

    Ishiguro, Yukio; Katsuragi, Satoru.

    1976-07-01

    The implosion process caused by laser-heating ablation has been studied by self-similarity analysis. Attention is paid to the possible existence of a self-similar solution which reproduces the implosion process of high compression. Details of the self-similar analysis are reproduced and conclusions are drawn quantitatively on the gas compression by a single shock. The compression process by a sequence of shocks is discussed in terms of self-similarity. The gas motion followed by a homogeneous isentropic compression is represented by a self-similar motion. (auth.)

  20. Comparative analysis of transcriptomes in aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing

    Directory of Open Access Journals (Sweden)

    Taketo Okada

    2016-12-01

    Ephedra plants are taxonomically classified as gymnosperms, and are medicinally important as the botanical origin of crude drugs and as bioresources that contain pharmacologically active chemicals. Here we show a comparative analysis of the transcriptomes of aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing by RNA-Seq. De novo assembly of short cDNA sequence reads generated 23,358, 13,373, and 28,579 contigs longer than 200 bases from aerial stems, roots, or both aerial stems and roots, respectively. The presumed functions encoded by these contig sequences were annotated by BLAST (blastx). Subsequently, these contigs were classified based on gene ontology slims, Enzyme Commission numbers, and the InterPro database. Furthermore, comparative gene expression analysis was performed between aerial stems and roots. These transcriptome analyses revealed differences and similarities between the transcriptomes of aerial stems and roots in E. sinica. Deep transcriptome sequencing of Ephedra should open the door to molecular biological studies based on the entire transcriptome, tissue- or organ-specific transcriptomes, or targeted genes of interest.

  1. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  2. Taxonomy of anaerobic digestion microbiome reveals biases associated with the applied high throughput sequencing strategies

    DEFF Research Database (Denmark)

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis

    2018-01-01

    In the past few years, many studies investigated the anaerobic digestion microbiome by means of 16S rRNA amplicon sequencing. Results obtained from these studies were compared to each other without taking into consideration the procedure followed for amplicon preparation and data analysis...... specifically, the microbial compositions of three laboratory scale biogas reactors were analyzed before and after addition of sodium oleate by sequencing the microbiome with three different approaches: 16S rRNA amplicon sequencing, shotgun DNA and shotgun RNA. This comparative analysis revealed that......, in amplicon sequencing, abundance of some taxa (Euryarchaeota and Spirochaetes) was biased by the inefficiency of universal primers to hybridize all the templates. Reliability of the results obtained was also influenced by the number of hypervariable regions under investigation. Finally, amplicon sequencing...

  3. Nitrate removal from high strength nitrate-bearing wastes in granular sludge sequencing batch reactors.

    Science.gov (United States)

    Krishna Mohan, Tulasi Venkata; Renu, Kadali; Nancharaiah, Yarlagadda Venkata; Satya Sai, Pedapati Murali; Venugopalan, Vayalam Purath

    2016-02-01

    A 6-L sequencing batch reactor (SBR) was operated for development of granular sludge capable of denitrification of high strength nitrates. Complete and stable denitrification of up to 5420 mg L⁻¹ nitrate-N (2710 mg L⁻¹ nitrate-N in reactor) was achieved by feeding simulated nitrate waste at a C/N ratio of 3. Compact and dense denitrifying granular sludge with relatively stable microbial community was developed during reactor operation. Accumulation of large amounts of nitrite due to incomplete denitrification occurred when the SBR was fed with 5420 mg L⁻¹ NO3-N at a C/N ratio of 2. Complete denitrification could not be achieved at this C/N ratio, even after one week of reactor operation as the nitrite levels continued to accumulate. In order to improve denitrification performance, the reactor was fed with nitrate concentrations of 1354 mg L⁻¹, while keeping C/N ratio at 2. Subsequently, nitrate concentration in the feed was increased in a step-wise manner to establish complete denitrification of 5420 mg L⁻¹ NO3-N at a C/N ratio of 2. The results show that substrate concentration plays an important role in denitrification of high strength nitrate by influencing nitrite accumulation. Complete denitrification of high strength nitrates can be achieved at lower substrate concentrations, by an appropriate acclimatization strategy. Copyright © 2015 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
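    The two concentrations quoted above (5420 in the feed, 2710 in the reactor) are consistent with a 50% volumetric exchange per cycle. The sketch below reproduces that arithmetic and the carbon dose implied by a given C/N ratio. It assumes that the liquid retained in the reactor holds no nitrate at the start of a cycle and that C/N is a mass ratio; neither is stated in the abstract, and the function names are illustrative only.

```python
def in_reactor_concentration(feed_mg_per_l: float, exchange_ratio: float) -> float:
    """Nitrate-N right after filling, assuming the retained liquid holds no nitrate."""
    return feed_mg_per_l * exchange_ratio


def carbon_dose(nitrate_n_mg_per_l: float, c_to_n: float) -> float:
    """Carbon needed per litre of feed for a given C/N ratio (mass basis assumed)."""
    return nitrate_n_mg_per_l * c_to_n


feed = 5420.0              # mg/L nitrate-N in the feed (from the abstract)
exchange = 2710.0 / feed   # implied volumetric exchange ratio: 0.5

print(in_reactor_concentration(feed, exchange))  # nitrate-N in reactor, mg/L
print(carbon_dose(feed, 3))                      # carbon per litre of feed at C/N = 3
print(carbon_dose(feed, 2))                      # carbon per litre of feed at C/N = 2
```

    Under these assumptions, lowering C/N from 3 to 2 cuts the carbon supplied per litre of feed by a third, which fits the abstract's observation that nitrite accumulated at the lower ratio until the sludge was acclimatized.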

  4. In-situ high resolution particle sampling by large time sequence inertial spectrometry

    International Nuclear Information System (INIS)

    Prodi, V.; Belosi, F.

    1990-09-01

    In situ sampling is always preferred, when possible, because of the artifacts that can arise when the aerosol has to flow through long sampling lines. On the other hand, the amount of possible losses can be calculated with some confidence only when the size distribution can be measured with a sufficient precision and the losses are not too large. This makes it desirable to sample directly in the vicinity of the aerosol source or containment. High temperature sampling devices with a detailed aerodynamic separation are extremely useful for this purpose. Several measurements are possible with the inertial spectrometer (INSPEC), but not with cascade impactors or cyclones. INSPEC - INertial SPECtrometer - has been conceived to measure the size distribution of aerosols by separating the particles while airborne according to their size and collecting them on a filter. It consists of a channel of rectangular cross-section with a 90 degree bend. Clean air is drawn through the channel, with a thin aerosol sheath injected close to the inner wall. Due to the bend, the particles are separated according to their size, leaving the original streamline by a distance which is a function of particle inertia and resistance, i.e. of aerodynamic diameter. The filter collects all the particles of the same aerodynamic size at the same distance from the inlet, in a continuous distribution. INSPEC particle separation at high temperature (up to 800 °C) has been tested with Zirconia particles as calibration aerosols. The feasibility study has been concerned with resolution and time sequence sampling capabilities under high temperature (700 °C).

  5. Deciphering the Resistome of the Widespread Pseudomonas aeruginosa Sequence Type 175 International High-Risk Clone through Whole-Genome Sequencing.

    Science.gov (United States)

    Cabot, Gabriel; López-Causapé, Carla; Ocampo-Sosa, Alain A; Sommer, Lea M; Domínguez, María Ángeles; Zamorano, Laura; Juan, Carlos; Tubau, Fe; Rodríguez, Cristina; Moyà, Bartolomé; Peña, Carmen; Martínez-Martínez, Luis; Plesiat, Patrick; Oliver, Antonio

    2016-12-01

    Whole-genome sequencing (WGS) was used for the characterization of the frequently extensively drug resistant (XDR) Pseudomonas aeruginosa sequence type 175 (ST175) high-risk clone. A total of 18 ST175 isolates recovered from 8 different Spanish hospitals were analyzed; 4 isolates from 4 different French hospitals were included for comparison. The typical resistance profile of ST175 included penicillins, cephalosporins, monobactams, carbapenems, aminoglycosides, and fluoroquinolones. In the phylogenetic analysis, the four French isolates clustered together with two isolates from one of the Spanish regions. Sequence variation was analyzed for 146 chromosomal genes related to antimicrobial resistance, and horizontally acquired genes were explored using online databases. The resistome of ST175 was determined mainly by mutational events; resistance traits common to all or nearly all of the strains included specific ampR mutations leading to ampC overexpression, specific mutations in oprD conferring carbapenem resistance, or a mexZ mutation leading to MexXY overexpression. All isolates additionally harbored an aadB gene conferring gentamicin and tobramycin resistance. Several other resistance traits were specific to certain geographic areas, such as a streptomycin resistance gene, aadA13, detected in all four isolates from France and in the two isolates from the Cantabria region and a glpT mutation conferring fosfomycin resistance, detected in all but these six isolates. Finally, several unique resistance mutations were detected in single isolates; particularly interesting were those in genes encoding penicillin-binding proteins (PBP1A, PBP3, and PBP4). Thus, these results provide information valuable for understanding the genetic basis of resistance and the dynamics of the dissemination and evolution of high-risk clones. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  6. Characterization of Intestinal Microbiomes of Hirschsprung's Disease Patients with or without Enterocolitis Using Illumina-MiSeq High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Yuqing Li

    Hirschsprung-associated enterocolitis (HAEC) is a life-threatening complication of Hirschsprung's disease (HD). Although the pathological mechanisms are still unclear, studies have shown that HAEC has a close relationship with the disturbance of intestinal microbiota. This study aimed to investigate the characteristics of the intestinal microbiome of HD patients with or without enterocolitis. During routine or emergency surgery, we collected 35 intestinal content samples from five patients with HAEC and eight HD patients, including three HD patients with a history of enterocolitis who were in a HAEC remission (HAEC-R) phase. Using Illumina-MiSeq high-throughput sequencing, we sequenced the V4 region of bacterial 16S rRNA, and operational taxonomic units (OTUs) were defined by 97% sequence similarity. Principal coordinate analysis (PCoA) of weighted UniFrac distances was performed to evaluate the diversity of each intestinal microbiome sample. The microbiota differed significantly between the HD patients (characterized by the prevalence of Bacteroidetes) and HAEC patients (characterized by the prevalence of Proteobacteria), while the microbiota of the HAEC-R patients was more similar to that of the HAEC patients. We also observed that the specimens from different intestinal sites of each HD patient differed significantly, while the specimens from different intestinal sites of each HAEC and HAEC-R patient were more similar. In conclusion, the microbiome pattern of the HAEC-R patients was more similar to that of the HAEC patients than to that of the HD patients. The HD patients had a relatively distinct, more stable community than the HAEC and HAEC-R patients, suggesting that enterocolitis may either be caused by or result in a disruption of the patient's uniquely adapted intestinal flora. The intestinal microbiota associated with enterocolitis may persist following symptom resolution and can be implicated in the symptom recurrence.
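    To make "OTUs defined by 97% sequence similarity" concrete, the following is a minimal sketch of greedy centroid clustering. It is not the pipeline used in the study, which the abstract does not name. It assumes reads are already aligned to equal length and measures identity as the fraction of matching positions; the function names and the first-read-as-centroid rule are hypothetical simplifications.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(reads: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Assign each read to the first OTU whose centroid it matches at >= threshold.

    The first read placed in an OTU serves as its centroid (an assumption of this sketch).
    """
    otus: List[List[str]] = []
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:
            # No existing centroid is similar enough: the read seeds a new OTU.
            otus.append([read])
    return otus


# Toy example with 100-base reads.
base = "A" * 100
near = "A" * 98 + "CC"       # 98% identical to base, joins its OTU
far = "A" * 90 + "C" * 10    # 90% identical to base, seeds a new OTU
clusters = greedy_otus([base, near, far])
print(len(clusters))
```

    Because each read is compared only against centroids, the result depends on input order; production tools typically sort reads by abundance or length before clustering for that reason.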

  7. Constraining controls on carbonate sequences with high-resolution chronostratigraphy: Upper Miocene, Cabo de Gata region, SE Spain

    Science.gov (United States)

    Montgomery, P.; Farr, M.R.; Franseen, E.K.; Goldstein, R.H.

    2001-01-01

    A high-resolution chronostratigraphy has been developed for Miocene shallow-water carbonate strata in the Cabo de Gata region of SE Spain for evaluation of local, regional and global factors that controlled platform architecture prior to and during the Messinian salinity crisis. Paleomagnetic data were collected from strata at three localities. Mean natural remanent magnetization (NRM) ranges between 1.53 × 10⁻⁸ and 5.2 × 10⁻³ Am²/kg. Incremental thermal and alternating field demagnetization isolated the characteristic remanent magnetization (ChRM). Rock magnetic studies show that the dominant magnetic mineral is magnetite, but mixtures of magnetite and hematite occur. A composite chronostratigraphy was derived from five stratigraphic sections. Regional stratigraphic data, biostratigraphic data, and an ⁴⁰Ar/³⁹Ar date of 8.5 ± 0.1 Ma, for an interbedded volcanic flow, place the strata in geomagnetic polarity Chrons C4r to C3r. Sequence-stratigraphic and diagenetic evidence indicate a major unconformity at the base of depositional sequence (DS)3 that contains a prograding reef complex, suggesting that approximately 250 000 yr of record (Subchrons C3Br.2r to 3Br.1r) are missing near the Messinian-Tortonian boundary. Correlation to the GPTS shows that the studied strata represent five third- to fourth-order DSs. Basal units are temperate to subtropical ramps (DS1A, DS1B, DS2); these are overlain by subtropical to tropical reefal platforms (DS3), which are capped by subtropical to tropical cyclic carbonates (Terminal Carbonate Complex, TCC). Correlation of the Cabo de Gata record to the Melilla area of Morocco and the Sorbas basin of Spain indicates that early - Late Tortonian ramp strata from these areas are partially time-equivalent. Similar strata are extensively developed in the Western Mediterranean and likely were influenced by a cool climate or influx of nutrients during an overall rise in global sea-level. After ramp deposition, a sequence boundary (SB3) in

  8. Identification of QTLs for 14 Agronomically Important Traits in Setaria italica Based on SNPs Generated from High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Kai Zhang

    2017-05-01

    Foxtail millet (Setaria italica) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed a high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggested that there may be segregation distortion on chromosome 6 that is significantly distorted toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome, and for molecular marker-assisted breeding.

  9. Identification of QTLs for 14 Agronomically Important Traits in Setaria italica Based on SNPs Generated from High-Throughput Sequencing.

    Science.gov (United States)

    Zhang, Kai; Fan, Guangyu; Zhang, Xinxin; Zhao, Fang; Wei, Wei; Du, Guohua; Feng, Xiaolei; Wang, Xiaoming; Wang, Feng; Song, Guoliang; Zou, Hongfeng; Zhang, Xiaolei; Li, Shuangdong; Ni, Xuemei; Zhang, Gengyun; Zhao, Zhihai

    2017-05-05

    Foxtail millet (Setaria italica) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed a high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggested that there may be segregation distortion on chromosome 6 that is significantly distorted toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome, and for molecular marker-assisted breeding. Copyright © 2017 Zhang et al.

  10. Association Study of Gut Flora in Coronary Heart Disease through High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Li Cui

    2017-01-01

    Objectives. We aimed to explore the impact of gut microbiota in coronary heart disease (CHD) patients through high-throughput sequencing. Methods. A total of 29 CHD in-hospital patients and 35 healthy volunteers as controls were included. Nucleic acids were extracted from fecal samples, followed by α diversity and principal coordinate analysis (PCoA). Based on unweighted UniFrac distance matrices, unweighted-pair group method with arithmetic mean (UPGMA) trees were created. Results. After data optimization, an average of 121312±19293 reads in CHD patients and 234372±108725 reads in controls was obtained. Reads corresponding to 38 phyla, 90 classes, and 584 genera were detected in CHD patients, whereas 40 phyla, 99 classes, and 775 genera were detected in controls. The proportion of phylum Bacteroidetes (56.12%) was lower and that of phylum Firmicutes (37.06%) was higher in CHD patients than in the controls (60.92% and 32.06%, P<0.05). PCoA and UPGMA tree analysis showed that there were significant differences in gut microbial composition between the two groups. Conclusion. The diversity and composition of gut flora differed between CHD patients and healthy controls. The incidence of CHD might be associated with the alteration of gut microbiota.
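    The phylum proportions reported above can be condensed into a Firmicutes/Bacteroidetes ratio for each group. The abstract does not report this ratio itself; the short sketch below only derives it from the four quoted percentages, and the function name is illustrative.

```python
def fb_ratio(firmicutes_pct: float, bacteroidetes_pct: float) -> float:
    """Firmicutes/Bacteroidetes ratio from relative abundances given in percent."""
    if bacteroidetes_pct <= 0:
        raise ValueError("Bacteroidetes abundance must be positive")
    return firmicutes_pct / bacteroidetes_pct


# Percentages quoted in the abstract.
chd = fb_ratio(37.06, 56.12)
controls = fb_ratio(32.06, 60.92)

print(f"CHD: {chd:.2f}, controls: {controls:.2f}")
```

    With these inputs the ratio is higher in the CHD group than in the controls, which is simply another way of stating the shift in the two phyla that the abstract describes.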

  11. Evaluation of the microbial diversity in amyotrophic lateral sclerosis using high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Xin Fang

    2016-09-01

    Increasing evidence indicates that diseases of the central nervous system (CNS) are strongly affected by faecal microbes. However, little work has been done to explore the interaction between amyotrophic lateral sclerosis (ALS) and faecal microbes. In the present study, a high-throughput sequencing method was used to compare the intestinal microbial diversity of healthy people and ALS patients. The principal coordinate analysis (PCoA), Venn and unweighted pair-group method using arithmetic averages (UPGMA) showed obvious microbial changes between healthy people (group H) and ALS patients (group A), and the average ratios of Bacteroides, Faecalibacterium, Anaerostipes, Prevotella, Escherichia and Lachnospira at genus level between ALS patients and healthy people were 0.78, 2.18, 3.41, 0.35, 0.79 and 13.07. Furthermore, the decreased Firmicutes/Bacteroidetes ratio at phylum level using LEfSe (LDA >4.0), together with the significantly increased genus Dorea (harmful microorganisms) and significantly reduced genera Oscillibacter, Anaerostipes, Lachnospiraceae (beneficial microorganisms) in ALS patients, indicated that the imbalance in intestinal microflora constitution had a strong association with the pathogenesis of ALS.

  12. Evaluation of the Microbial Diversity in Amyotrophic Lateral Sclerosis Using High-Throughput Sequencing.

    Science.gov (United States)

    Fang, Xin; Wang, Xin; Yang, Shaoguo; Meng, Fanjing; Wang, Xiaolei; Wei, Hua; Chen, Tingtao

    2016-01-01

    Increasing evidence indicates that diseases of the central nervous system are strongly affected by fecal microbes. However, little work has been done to explore the interaction between amyotrophic lateral sclerosis (ALS) and fecal microbes. In the present study, a high-throughput sequencing method was used to compare the intestinal microbial diversity of healthy people and ALS patients. The principal coordinate analysis, Venn and unweighted pair-group method using arithmetic averages (UPGMA) showed obvious microbial changes between healthy people (group H) and ALS patients (group A), and the average ratios of Bacteroides, Faecalibacterium, Anaerostipes, Prevotella, Escherichia, and Lachnospira at genus level between ALS patients and healthy people were 0.78, 2.18, 3.41, 0.35, 0.79, and 13.07. Furthermore, the decreased Firmicutes/Bacteroidetes ratio at phylum level using LEfSe (LDA > 4.0), together with the significantly increased genus Dorea (harmful microorganisms) and significantly reduced genera Oscillibacter, Anaerostipes, Lachnospiraceae (beneficial microorganisms) in ALS patients, indicated that the imbalance in intestinal microflora constitution had a strong association with the pathogenesis of ALS.

  13. High-throughput sequencing of plasma microRNA in chronic fatigue syndrome/myalgic encephalomyelitis.

    Directory of Open Access Journals (Sweden)

    Ekua W Brenu

    BACKGROUND: MicroRNAs (miRNAs) are known to regulate many biological processes and their dysregulation has been associated with a variety of diseases including Chronic Fatigue Syndrome/Myalgic Encephalomyelitis (CFS/ME). The recent discovery of stable and reproducible miRNA in plasma has raised the possibility that circulating miRNAs may serve as novel diagnostic markers. The objective of this study was to determine the role of plasma miRNA in CFS/ME. RESULTS: Using Illumina high-throughput sequencing we identified 19 miRNAs that were differentially expressed in the plasma of CFS/ME patients in comparison to non-fatigued controls. Following RT-qPCR analysis, we were able to confirm the significant up-regulation of three miRNAs (hsa-miR-127-3p, hsa-miR-142-5p and hsa-miR-143-3p) in the CFS/ME patients. CONCLUSION: Our study is the first to identify circulating miRNAs from CFS/ME patients and also to confirm three differentially expressed circulating miRNAs in CFS/ME patients, providing a basis for further study to find useful CFS/ME biomarkers.

  14. High-Throughput Sequencing of Plasma MicroRNA in Chronic Fatigue Syndrome/Myalgic Encephalomyelitis

    Science.gov (United States)

    Brenu, Ekua W.; Ashton, Kevin J.; Batovska, Jana; Staines, Donald R.; Marshall-Gradisnik, Sonya M.

    2014-01-01

    Background: MicroRNAs (miRNAs) are known to regulate many biological processes and their dysregulation has been associated with a variety of diseases including Chronic Fatigue Syndrome/Myalgic Encephalomyelitis (CFS/ME). The recent discovery of stable and reproducible miRNA in plasma has raised the possibility that circulating miRNAs may serve as novel diagnostic markers. The objective of this study was to determine the role of plasma miRNA in CFS/ME. Results: Using Illumina high-throughput sequencing we identified 19 miRNAs that were differentially expressed in the plasma of CFS/ME patients in comparison to non-fatigued controls. Following RT-qPCR analysis, we were able to confirm the significant up-regulation of three miRNAs (hsa-miR-127-3p, hsa-miR-142-5p and hsa-miR-143-3p) in the CFS/ME patients. Conclusion: Our study is the first to identify circulating miRNAs from CFS/ME patients and also to confirm three differentially expressed circulating miRNAs in CFS/ME patients, providing a basis for further study to find useful CFS/ME biomarkers. PMID:25238588

  15. Identification of microRNAs and their targets in Finger millet by high throughput sequencing.

    Science.gov (United States)

    Usha, S; Jyothi, M N; Sharadamma, N; Dixit, Rekha; Devaraj, V R; Nagesh Babu, R

    2015-12-15

    MicroRNAs are short non-coding RNAs which play an important role in regulating gene expression by mRNA cleavage or by translational repression. The majority of identified miRNAs are evolutionarily conserved; however, others are expressed in a species-specific manner. Finger millet is an important cereal crop; nonetheless, no practical information is available on its microRNAs to date. In this study, we have identified 95 conserved microRNAs belonging to 39 families and 3 novel microRNAs by high throughput sequencing. For the identified conserved and novel miRNAs a total of 507 targets were predicted. Eleven miRNAs were validated and their tissue specificity was determined by stem-loop RT-qPCR and Northern blot. GO analyses revealed that the targets of the miRNAs were involved in a wide range of regulatory functions. This study indicates that a large number of known and novel miRNAs are present in Finger millet and may play an important role in growth and development. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

    Directory of Open Access Journals (Sweden)

    Khaled Benkrid

    2012-01-01

    This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-