WorldWideScience

Sample records for high confidence sequence

  1. High Confidence Software and Systems Research Needs

    Data.gov (United States)

    Networking and Information Technology Research and Development, Executive Office of the President — This White Paper presents a survey of high confidence software and systems research needs. It has been prepared by the High Confidence Software and Systems...

  2. High-Confidence Quantum Gate Tomography

    Science.gov (United States)

    Johnson, Blake; da Silva, Marcus; Ryan, Colm; Kimmel, Shelby; Donovan, Brian; Ohki, Thomas

    2014-03-01

    Debugging and verification of high-fidelity quantum gates requires the development of new tools and protocols to unwrap the performance of the gate from the rest of the sequence. Randomized benchmarking tomography[2] allows one to extract full information of the unital portion of the gate with high confidence. We report experimental confirmation of the technique's applicability to quantum gate tomography. We show that the method is robust to common experimental imperfections such as imperfect single-shot readout and state preparation. We also demonstrate the ability to characterize non-Clifford gates. To assist in the experimental implementation we introduce two techniques. "Atomic Cliffords" use phase ramping and frame tracking to allow single-pulse implementation of the full group of single-qubit Clifford gates. Domain-specific pulse sequencers allow rapid implementation of the many thousands of sequences needed. This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Office contract no. W911NF-10-1-0324.

  3. Hypercorrection of High Confidence Errors in Children

    Science.gov (United States)

    Metcalfe, Janet; Finn, Bridgid

    2012-01-01

    Three experiments investigated whether the hypercorrection effect--the finding that errors committed with high confidence are easier, rather than more difficult, to correct than are errors committed with low confidence--occurs in grade school children as it does in young adults. All three experiments showed that Grade 3-6 children hypercorrected…

  5. Assessment of cartilage-dedicated sequences at ultra-high-field MRI: comparison of imaging performance and diagnostic confidence between 3.0 and 7.0 T with respect to osteoarthritis-induced changes at the knee joint

    Energy Technology Data Exchange (ETDEWEB)

    Stahl, Robert [University of California, Musculoskeletal and Quantitative Imaging Group, Department of Radiology, San Francisco, CA (United States); University Hospitals - Campus Grosshadern, Ludwig Maximilians University of Munich, Department of Clinical Radiology, Munich (Germany)]; Krug, Roland; Zuo, Jin; Majumdar, Sharmila; Link, Thomas M. [University of California, Musculoskeletal and Quantitative Imaging Group, Department of Radiology, San Francisco, CA (United States)]; Kelley, Douglas A.C. [General Electrics Healthcare Technologies, San Francisco, CA (United States)]; Ma, C.B. [University of California, Department of Orthopedic Surgery, San Francisco, CA (United States)]

    2009-08-15

    The objectives of the study were to optimize three cartilage-dedicated sequences for in vivo knee imaging at 7.0 T ultra-high-field (UHF) magnetic resonance imaging (MRI) and to compare imaging performance and diagnostic confidence concerning osteoarthritis (OA)-induced changes at 7.0 and 3.0 T MRI. Optimized MRI sequences for cartilage imaging at 3.0 T were tailored for 7.0 T: an intermediate-weighted fast spin-echo (IM-w FSE), a fast imaging employing steady-state acquisition (FIESTA) and a T1-weighted 3D high-spatial-resolution volumetric fat-suppressed spoiled gradient-echo (SPGR) sequence. Three healthy subjects and seven patients with mild OA were examined. Signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), diagnostic confidence in assessing cartilage abnormalities, and image quality were determined. Abnormalities were assessed with the whole organ magnetic resonance imaging score (WORMS). Focal cartilage lesions and bone marrow edema pattern (BMEP) were also quantified. At 7.0 T, SNR was increased (p<0.05) for all sequences. For the IM-w FSE sequence, limitations with the specific absorption rate (SAR) required modifications of the scan parameters yielding an incomplete coverage of the knee joint, extensive artifacts, and a less effective fat saturation. CNR and image quality were increased (p<0.05) for SPGR and FIESTA and decreased for IM-w FSE. Diagnostic confidence for cartilage lesions was highest (p<0.05) for FIESTA at 7.0 T. Evaluation of BMEP was decreased (p < 0.05) at 7.0 T due to limited performance of IM-w FSE. Gradient echo-based pulse sequences like SPGR and FIESTA are well suited for imaging at UHF which may improve early detection of cartilage lesions. However, UHF IM-w FSE sequences are less feasible for clinical use. (orig.)

  6. Asymptotically Honest Confidence Regions for High Dimensional

    DEFF Research Database (Denmark)

    Caner, Mehmet; Kock, Anders Bredahl

    While variable selection and oracle inequalities for the estimation and prediction error have received considerable attention in the literature on high-dimensional models, very little work has been done in the area of testing and construction of confidence bands in high-dimensional models. However...... develop an oracle inequality for the conservative Lasso only assuming the existence of a certain number of moments. This is done by means of the Marcinkiewicz-Zygmund inequality which in our context provides sharper bounds than Nemirovski's inequality. As opposed to van de Geer et al. (2014) we allow...

  7. Autism Spectrum Disorder and High Confidence Gene Factors

    OpenAIRE

    Mai, MOCHIZUKI

    2017-01-01

    Autism spectrum disorder (ASD) is a neurological developmental disorder whose mechanism is yet unclear. However, recent ASD studies, which employ exome- and genome-wide sequencing, have identified some high-confidence ASD genes. Those ASD studies have revealed that CHD8 is likely associated with ASD. In this article, we highlight that CHD8 may regulate other candidate ASD risk genes. Current research indicates that there exist some thousand autism susceptibility candidate genes. Moreover, we sugge...

  8. High-Confidence Predictions under Adversarial Uncertainty

    CERN Document Server

    Drucker, Andrew

    2011-01-01

    We study the setting in which the bits of an unknown infinite binary sequence x are revealed sequentially to an observer. We show that very limited assumptions about x allow one to make successful predictions about unseen bits of x. First, we study the problem of successfully predicting a single 0 from among the bits of x. In our model we have only one chance to make a prediction, but may do so at a time of our choosing. We describe and motivate this as the problem of a frog who wants to cross a road safely. Letting N_t denote the number of 1s among the first t bits of x, we say that x is "eps-weakly sparse" if lim inf (N_t/t) <= eps. ... For any eps > 0, we give a randomized forecasting algorithm S_eps that, given sequential access to a binary sequence x, makes a prediction of the form: "A p fraction of the next N bits will be 1s." (The algorithm gets to choose p, N, and the time of the prediction.) For any fixed sequence x, the forecast fraction p is accurate to within +-eps with probability 1 - eps.

  9. Business confidence still high in Zimbabwe.

    Science.gov (United States)

    Amanor-Wilks, D

    1995-12-01

    Business confidence has not been affected in Zimbabwe despite the AIDS epidemic in that country. An Australian mining company has recruited people to work at its platinum mine in Zimbabwe and also instituted an AIDS awareness program. The National Chamber of Commerce disclosed that semiskilled and unskilled workers who are the "easiest to replace" have been most affected by the epidemic. The impact of AIDS has not been as bad as had been predicted several years ago. By the end of the 1990s, however, there might be a skills shortage. The first AIDS case was detected in 1985 in Zimbabwe. By the end of 1995 a cumulative total of 38,500 cases had been reported, but the National AIDS Control Program believes that the true figure is over 100,000. The estimated number of HIV-infected people is about 1 million. The most economically productive age group (30-50) has the highest rates of infection. Transport is affected most, followed by mining and commercial farming. Infection rates among miners are estimated to be 20-30% and the rates are the highest at the mines on the major transport routes. The mining industry has not had any problems in recruiting labor, but, increasingly, deaths are AIDS-related. The growing sex industry at the mines has accelerated the spread of HIV. In addition, small mines do not have AIDS awareness programs in place. The National Employment Council runs a project for the transport industry, which seeks to intensify AIDS campaigns at truck stops. This also entails talks to drivers about AIDS; courses for police, nurses, and sex workers; and the distribution of condoms. In commercial farming, two-thirds of workers are unskilled casual laborers who live in squalid conditions that foster the spread of AIDS. At these farms there is also a growing number of orphans, whose number is estimated to rise to 60,000 by the late 1990s.

  10. Response confidence for emotion perception in schizophrenia using a Continuous Facial Sequence Task.

    Science.gov (United States)

    Moritz, Steffen; Woznica, Aneta; Andreou, Christina; Köther, Ulf

    2012-12-30

    Deficits in emotion perception and overconfidence in errors are well-documented in schizophrenia but have not been examined concurrently. The present study aimed to fill this gap. Twenty-three schizophrenia patients and twenty-nine healthy subjects underwent a Continuous Facial Sequence Task (CFST). The CFST comprised two blocks: a female (1st block) and a male protagonist (2nd block) displayed the six basic emotions postulated by Ekman as well as two more complex mental states and a neutral expression. Participants were first asked to identify the affect displayed by the performer and then to judge their response confidence. No group differences emerged regarding overall emotion perception. Follow-up analyses showed that patients were less correct in detecting some negative emotions but performed better for neutral or positive emotions. Regarding confidence, incorrect decisions in patients were associated with higher confidence than in controls (statistical trend level, moderate effect size). Patients displayed significant overconfidence in errors for negative emotions. In addition, patients were more prone to high-confident errors for emotions that were displayed in weak emotional intensity. While the study supports the view that the examination of confidence adds unique information to our understanding of social cognition, several methodological limitations render its findings preliminary. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  11. Distinguishing highly confident accurate and inaccurate memory: insights about relevant and irrelevant influences on memory confidence

    Science.gov (United States)

    Chua, Elizabeth F.; Hannula, Deborah E.; Ranganath, Charan

    2012-01-01

    It is generally believed that accuracy and confidence in one’s memory are related, but there are many instances when they diverge. Accordingly, it is important to disentangle the factors which contribute to memory accuracy and confidence, especially those factors that contribute to confidence, but not accuracy. We used eye movements to separately measure fluent cue processing, the target recognition experience, and relative evidence assessment on recognition confidence and accuracy. Eye movements were monitored during a face-scene associative recognition task, in which participants first saw a scene cue, followed by a forced-choice recognition test for the associated face, with confidence ratings. Eye movement indices of the target recognition experience were largely indicative of accuracy, and showed a relationship to confidence for accurate decisions. In contrast, eye movements during the scene cue raised the possibility that more fluent cue processing was related to higher confidence for both accurate and inaccurate recognition decisions. In a second experiment, we manipulated cue familiarity, and therefore cue fluency. Participants showed higher confidence for cue-target associations for when the cue was more familiar, especially for incorrect responses. These results suggest that over-reliance on cue familiarity and under-reliance on the target recognition experience may lead to erroneous confidence. PMID:22171810

  12. Distinguishing highly confident accurate and inaccurate memory: insights about relevant and irrelevant influences on memory confidence.

    Science.gov (United States)

    Chua, Elizabeth F; Hannula, Deborah E; Ranganath, Charan

    2012-01-01

    It is generally believed that accuracy and confidence in one's memory are related, but there are many instances when they diverge. Accordingly it is important to disentangle the factors that contribute to memory accuracy and confidence, especially those factors that contribute to confidence, but not accuracy. We used eye movements to separately measure fluent cue processing, the target recognition experience, and relative evidence assessment on recognition confidence and accuracy. Eye movements were monitored during a face-scene associative recognition task, in which participants first saw a scene cue, followed by a forced-choice recognition test for the associated face, with confidence ratings. Eye movement indices of the target recognition experience were largely indicative of accuracy, and showed a relationship to confidence for accurate decisions. In contrast, eye movements during the scene cue raised the possibility that more fluent cue processing was related to higher confidence for both accurate and inaccurate recognition decisions. In a second experiment we manipulated cue familiarity, and therefore cue fluency. Participants showed higher confidence for cue-target associations for when the cue was more familiar, especially for incorrect responses. These results suggest that over-reliance on cue familiarity and under-reliance on the target recognition experience may lead to erroneous confidence.

  13. Inferring high-confidence human protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Yu Xueping

    2012-05-01

    Abstract. Background: As numerous experimental factors drive the acquisition, identification, and interpretation of protein-protein interactions (PPIs), aggregated assemblies of human PPI data invariably contain experiment-dependent noise. Ascertaining the reliability of PPIs collected from these diverse studies and scoring them to infer high-confidence networks is a non-trivial task. Moreover, a large number of PPIs share the same number of reported occurrences, making it impossible to distinguish the reliability of these PPIs and rank-order them. For example, for the data analyzed here, we found that the majority (>83%) of currently available human PPIs have been reported only once. Results: In this work, we proposed an unsupervised statistical approach to score a set of diverse, experimentally identified PPIs from nine primary databases to create subsets of high-confidence human PPI networks. We evaluated this ranking method by comparing it with other methods and assessing their ability to retrieve protein associations from a number of diverse and independent reference sets. These reference sets contain known biological data that are either directly or indirectly linked to interactions between proteins. We quantified the average effect of using ranked protein interaction data to retrieve this information and showed that, when compared to randomly ranked interaction data sets, the proposed method created a larger enrichment (~134%) than either ranking based on the hypergeometric test (~109%) or occurrence ranking (~46%). Conclusions: From our evaluations, it was clear that ranked interactions were always of value because higher-ranked PPIs had a higher likelihood of retrieving high-confidence experimental data. Reducing the noise inherent in aggregated experimental PPIs via our ranking scheme further increased the accuracy and enrichment of PPIs derived from a number of biologically relevant data sets. These results suggest that using our high-confidence
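
    The abstract above names ranking by the hypergeometric test as one of the baselines but does not say how that test was parameterized. As a rough illustration only, the sketch below computes a hypergeometric upper-tail probability with the Python standard library. The function name `hypergeom_sf` and the interpretation of its arguments (N sources in total, K reporting one protein, n reporting the other, k reporting both) are assumptions made for this example, not details taken from the paper.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    Hypothetical reading for PPI scoring (an assumption, not the paper's
    definition): N sources in total, K mention protein A, n mention
    protein B, and k mention both. A small value means the observed
    co-occurrence is unlikely to arise by chance.
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms drop out of the sum automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Rank hypothetical protein pairs so the smallest tail probability comes first.
pairs = {("P1", "P2"): (3, 10, 3, 4), ("P3", "P4"): (1, 10, 3, 4)}
ranked = sorted(pairs, key=lambda pair: hypergeom_sf(*pairs[pair]))
```

    Sorting by ascending tail probability puts the pairs whose co-occurrence is hardest to explain by chance at the top of the ranking.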

  14. Augmenting Chinese hamster genome assembly by identifying regions of high confidence.

    Science.gov (United States)

    Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou

    2016-09-01

    Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78% of the assembled genome. Further, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines.

  15. Enabling high confidence detections of gravitational-wave bursts

    CERN Document Server

    Littenberg, Tyson B; Cornish, Neil J; Millhouse, Margaret

    2015-01-01

    With the advanced LIGO and Virgo detectors taking observations, the detection of gravitational waves is expected within the next few years. Extracting astrophysical information from gravitational wave detections is a well-posed problem and thoroughly studied when detailed models for the waveforms are available. However, one motivation for the field of gravitational wave astronomy is the potential for new discoveries. Recognizing and characterizing unanticipated signals requires data analysis techniques which do not depend on theoretical predictions for the gravitational waveform. Past searches for short-duration un-modeled gravitational wave signals have been hampered by transient noise artifacts, or "glitches," in the detectors. In some cases, even high signal-to-noise simulated astrophysical signals have proven difficult to distinguish from glitches, so that essentially any plausible signal could be detected with at most 2-3 $\sigma$ level confidence. We have put forth the BayesWave algorithm to differentiat...

  16. Technical Report: Algorithm and Implementation for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    Energy Technology Data Exchange (ETDEWEB)

    McLoughlin, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-01-11

    This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive feature of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.
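
    The Chinese restaurant process mentioned in this abstract can be illustrated with a generic textbook sampler. The sketch below is not the report's implementation: the function name `sample_crp`, the concentration parameter `alpha`, and the reading of items as reads and clusters as genomes are all assumptions made for illustration. It shows why the process favors sparse solutions: a new item joins an existing cluster with probability proportional to that cluster's current size, so a few clusters tend to absorb most items.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Assign n_items to clusters with a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha). `alpha` is an assumed concentration parameter.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items currently in cluster k
    assignments = []   # assignments[i] = cluster index chosen for item i
    for i in range(n_items):
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


assignments, counts = sample_crp(1000, alpha=1.0, seed=1)
```

    With a small `alpha`, the number of clusters grows only slowly with the number of items, which is the rich-get-richer behavior the abstract relies on when it says the CRP selects a best representative from nearly equivalent genomes.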

  17. High-confidence coding and noncoding transcriptome maps

    Science.gov (United States)

    2017-01-01

    The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519

  18. Preparation for high-acuity clinical placement: confidence levels of final-year nursing students

    Directory of Open Access Journals (Sweden)

    Porter J

    2013-04-01

    Joanne Porter, Julia Morphet, Karen Missen, Anita Raymond. School of Nursing and Midwifery, Monash University, Churchill, VIC, Australia. Aim: To measure final-year nursing students’ preparation for high-acuity placement with emphasis on clinical skill performance confidence. Background: Self-confidence has been reported as being a key component for effective clinical performance, and confident students are more likely to be more effective nurses. Clinical skill performance is reported to be the most influential source of self-confidence. Student preparation and skill acquisition are therefore important aspects in ensuring students have successful clinical placements, especially in areas of high acuity. Curriculum development should aim to assist students with their theoretical and clinical preparedness for the clinical environment. Method: A modified pretest/posttest survey design was used to measure the confidence of third-year undergraduate nursing students (n = 318) for placement into a high-acuity clinical setting. The survey comprised four questions related to clinical placement and prospect of participating in a cardiac arrest scenario, and confidence rating levels of skills related to practice in a high-acuity setting. Content and face validity were established by an expert panel (α = 0.90) and reliability was established by the pilot study in 2009. Comparisons were made between confidence levels at the beginning and end of semester. Results: Student confidence to perform individual clinical skills increased over the semester; however, their feelings of preparedness for high-acuity clinical placement decreased over the same time period. Reported confidence levels improved with further exposure to clinical placement. Conclusion: There may be many external factors that influence students’ perceptions of confidence and preparedness for practice. Further research is recommended to identify causes of poor self-confidence in final-year nursing

  19. People’s Hypercorrection of High Confidence Errors: Did They Know it All Along?

    OpenAIRE

    Metcalfe, Janet; Finn, Bridgid

    2011-01-01

    This study investigated the ‘knew it all along’ explanation of the hypercorrection effect. The hypercorrection effect refers to the finding that when given corrective feedback, errors that are committed with high confidence are easier to correct than low confidence errors. Experiment 1 showed that people were more likely to claim that they ‘knew it all along,’ when they were given the answers to high confidence errors as compared to low confidence errors. Experiments 2 and 3 investigated whet...

  20. Technical Report on Modeling for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    Energy Technology Data Exchange (ETDEWEB)

    McLoughlin, K. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-01-11

    The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from its nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.
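
    The generative stages listed in this abstract can be pictured with a deliberately simplified simulation. The sketch below is an illustration under stated assumptions, not MetaQuant's model: it draws single reads rather than read pairs, skips the evolution stage entirely, uses a uniform fragment start and a flat per-base substitution rate, and the names `simulate_read`, `genomes`, `abundances`, and `error_rate` are all hypothetical.

```python
import random


def simulate_read(genomes, abundances, read_len, error_rate, rng):
    """Draw one simulated read (a simplified, assumed model).

    1. Pick a source genome with probability proportional to its abundance.
    2. Pick a fragment start uniformly along that genome.
    3. Copy read_len bases, substituting each with probability error_rate.
    """
    names = list(genomes)
    weights = [abundances[name] for name in names]
    source = rng.choices(names, weights=weights, k=1)[0]
    seq = genomes[source]
    start = rng.randrange(len(seq) - read_len + 1)
    fragment = seq[start:start + read_len]
    read = "".join(
        rng.choice([b for b in "ACGT" if b != base])
        if rng.random() < error_rate
        else base
        for base in fragment
    )
    return source, start, read


# Toy inputs, invented for the example.
genomes = {"g1": "ACGT" * 10, "g2": "TTGGCCAA" * 5}
abundances = {"g1": 0.8, "g2": 0.2}
rng = random.Random(0)
reads = [simulate_read(genomes, abundances, 12, 0.01, rng) for _ in range(100)]
```

    Inference in a model of this kind runs the process in reverse: given only the reads, estimate which abundances most plausibly produced them.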

  1. People's Hypercorrection of High-Confidence Errors: Did They Know It All Along?

    Science.gov (United States)

    Metcalfe, Janet; Finn, Bridgid

    2011-01-01

    This study investigated the "knew it all along" explanation of the hypercorrection effect. The hypercorrection effect refers to the finding that when people are given corrective feedback, errors that are committed with high confidence are easier to correct than low-confidence errors. Experiment 1 showed that people were more likely to…

  2. Technical Report: Benchmarking for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    Energy Technology Data Exchange (ETDEWEB)

    McLoughlin, K. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-01-22

    The software application “MetaQuant” was developed by our group at Lawrence Livermore National Laboratory (LLNL). It is designed to profile microbial populations in a sample using data from whole-genome shotgun (WGS) metagenomic DNA sequencing. Several other metagenomic profiling applications have been described in the literature. We ran a series of benchmark tests to compare the performance of MetaQuant against that of a few existing profiling tools, using real and simulated sequence datasets. This report describes our benchmarking procedure and results.

  3. Common and specific brain regions in high- versus low-confidence recognition memory

    Science.gov (United States)

    Kim, Hongkeun; Cabeza, Roberto

    2009-01-01

    The goal of the present functional magnetic resonance imaging (fMRI) study was to investigate whether and to what extent brain regions involved in high-confidence recognition (HCR) versus low-confidence recognition (LCR) overlap or separate from each other. To this end, we performed conjunction analyses involving activations elicited during high-confidence hit, low-confidence hit, and high-confidence correct-rejection responses. The analyses yielded 3 main findings. First, sensory/perceptual and associated posterior regions were common to HCR and LCR, indicating contribution of these regions to both HCR and LCR activity. This finding may help explain why these regions are among the most common in functional neuroimaging studies of episodic retrieval. Second, medial temporal lobe (MTL) and associated midline regions were associated with HCR, possibly reflecting recollection-related processes, whereas specific prefrontal cortex (PFC) regions were associated with LCR, possibly reflecting executive control processes. This finding is consistent with the notion that the MTL and PFC networks play complementary roles during episodic retrieval. Finally, within posterior parietal cortex, a dorsal region was associated with LCR, possibly reflecting top-down attentional processes, whereas a ventral region was associated with HCR, possibly reflecting bottom-up attentional processes. This finding may help explain why functional neuroimaging studies have found diverse parietal effects during episodic retrieval. Taken together, our findings provide strong evidence that HCR versus LCR, and by implication, recollection versus familiarity processes, are represented in common as well as specific brain regions. PMID:19501072

  4. Understanding Parental Confidence in an Inclusive High School: A Pilot Survey

    Science.gov (United States)

    Morewood, Gareth D.; Bond, Caroline

    2012-01-01

    A questionnaire was developed and trialled in an inclusive high school with the aim of understanding factors that contribute to parental confidence in school provision for students with special educational needs and disabilities (SEND). Parents of all students at School Action, School Action Plus and those with Statements of special educational…

  5. Refutations in science texts lead to hypercorrection of misconceptions held with high confidence

    NARCIS (Netherlands)

    Van Loon, Mariëtte H.; Dunlosky, John; Van Gog, Tamara; Van Merriënboer, Jeroen J.G.; De Bruin, Anique B.H.

    2015-01-01

    Misconceptions about science are often not corrected during study when they are held with high confidence. However, when corrective feedback co-activates a misconception together with the correct conception, this feedback may surprise the learner and draw attention, especially when the misconception

  6. High-fidelity nursing simulation: impact on student self-confidence and clinical competence.

    Science.gov (United States)

    Blum, Cynthia A; Borglund, Susan; Parcells, Dax

    2010-01-01

    Development of safe nursing practice in entry-level nursing students requires special consideration from nurse educators. The paucity of data supporting high-fidelity patient simulation effectiveness in this population informed the development of a quasi-experimental, quantitative study of the relationship between simulation and student self-confidence and clinical competence. Moreover, the study reports a novel approach to measuring self-confidence and competence of entry-level nursing students. Fifty-three baccalaureate students, enrolled in either a traditional or simulation-enhanced laboratory, participated during their first clinical rotation. Student self-confidence and faculty perception of student clinical competence were measured using selected scale items of the Lasater Clinical Judgment Rubric. The results indicated an overall improvement in self-confidence and competence across the semester; however, simulation did not significantly enhance these caring attributes. The study highlights the need for further examination of teaching strategies developed to promote the transfer of self-confidence and competence from the laboratory to the clinical setting.

  7. A computational framework for boosting confidence in high-throughput protein-protein interaction datasets.

    Science.gov (United States)

    Hosur, Raghavendra; Peng, Jian; Vinayagam, Arunachalam; Stelzl, Ulrich; Xu, Jinbo; Perrimon, Norbert; Bienkowska, Jadwiga; Berger, Bonnie

    2012-08-31

    Improving the quality and coverage of the protein interactome is of paramount importance for biomedical research, particularly given the various sources of uncertainty in high-throughput techniques. We introduce a structure-based framework, Coev2Net, for computing a single confidence score that addresses both false-positive and false-negative rates. Coev2Net is easily applied to thousands of binary protein interactions and has superior predictive performance over existing methods. We experimentally validate selected high-confidence predictions in the human MAPK network and show that predicted interfaces are enriched for cancer-related or damaging SNPs. Coev2Net can be downloaded at http://struct2net.csail.mit.edu.

  8. Decision-making patterns and self-confidence in high school adolescents

    Directory of Open Access Journals (Sweden)

    Alejandro César Antonio Luna Bernal

    2014-07-01

    The present study aimed to analyse the factor structure of the Melbourne Decision Making Questionnaire (DMQ-II), and to examine the relationships between the factors identified and Self-confidence, in order to conceptualize the decision-making process in adolescents under the Conflict Model of Decision Making. Participants were 992 Mexican high-school students, aged between 15 and 19 years. Three factors were identified as decision-making patterns in adolescents: (a) Vigilance, (b) Hypervigilance/Procrastination and (c) Buck-passing. Self-confidence showed a positive effect on Vigilance, and a negative effect on the other two patterns. Results are discussed considering the literature on decision making in adolescence.

  9. Data on electrical energy conservation using high efficiency motors for the confidence bounds using statistical techniques.

    Science.gov (United States)

    Shaikh, Muhammad Mujtaba; Memon, Abdul Jabbar; Hussain, Manzoor

    2016-09-01

    In this article, we describe details of the data used in the research paper "Confidence bounds for energy conservation in electric motors: An economical solution using statistical techniques" [1]. The data presented in this paper is intended to show benefits of high efficiency electric motors over the standard efficiency motors of similar rating in the industrial sector of Pakistan. We explain how the data was collected and then processed by means of formulas to show cost effectiveness of energy efficient motors in terms of three important parameters: annual energy saving, cost saving and payback periods. This data can be further used to construct confidence bounds for the parameters using statistical techniques as described in [1].
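
    The three parameters named in this record (annual energy saving, cost saving and payback period) can be illustrated with a minimal sketch. It assumes the common textbook relation in which input power equals shaft output divided by efficiency; the function name, parameter names and all numeric inputs are hypothetical and are not taken from the dataset or from [1].

```python
def motor_savings(rated_kw, load_factor, hours_per_year,
                  eff_standard, eff_high, tariff_per_kwh, price_premium):
    """Return (annual kWh saved, annual cost saved, simple payback in years).

    Assumes eff_high > eff_standard, so the saving is strictly positive.
    """
    # Shaft output actually delivered by the motor, in kW
    shaft_kw = rated_kw * load_factor
    # Input power = shaft output / efficiency; the saving is the
    # difference in input energy between the two motors over a year
    kwh_saved = shaft_kw * hours_per_year * (1.0 / eff_standard - 1.0 / eff_high)
    # Convert the energy saving to money at a flat tariff
    cost_saved = kwh_saved * tariff_per_kwh
    # Simple payback: extra purchase cost divided by yearly cost saving
    payback_years = price_premium / cost_saved
    return kwh_saved, cost_saved, payback_years


# Hypothetical inputs for illustration only
kwh, cost, payback = motor_savings(
    rated_kw=10.0, load_factor=1.0, hours_per_year=4000.0,
    eff_standard=0.80, eff_high=0.90,
    tariff_per_kwh=0.10, price_premium=1000.0,
)
```

    A simple payback of this kind ignores discounting and maintenance costs; it is only one plausible way to compute the quantities the record mentions.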

  10. A high confidence, manually validated human blood plasma protein reference set

    DEFF Research Database (Denmark)

    Schenk, Susann; Schoenhals, Gary J; de Souza, Gustavo

    2008-01-01

    BACKGROUND: The immense diagnostic potential of human plasma has prompted great interest and effort in cataloging its contents, exemplified by the Human Proteome Organization (HUPO) Plasma Proteome Project (PPP) pilot project. Due to challenges in obtaining a reliable blood plasma protein list... sources, including the HUPO PPP dataset. CONCLUSION: Superior instrumentation combined with rigorous validation criteria gave rise to a set of 697 plasma proteins in which we have very high confidence, demonstrated by an exceptionally low false peptide identification rate of 0.29%.

  11. Are Confidence Ratings Test- or Trait-Driven? Individual Differences among High, Average, and Low Comprehenders in Fourth Grade

    Science.gov (United States)

    Kasperski, Ronen; Katzir, Tami

    2013-01-01

    The aim of this study was to examine whether low, average, and high comprehenders (LC, AC, and HC, respectively) differed in their reading self-confidence and bias ratings, and whether confidence ratings were driven by reading ability or distributed evenly among diverse readers. Seventy fourth-graders with good decoding abilities were administered…

  13. A High-confidence Cyber-Physical Alarm System: Design and Implementation

    CERN Document Server

    Ma, Longhua; Xia, Feng; Xu, Ming; Yao, Jun; Shao, Meng

    2010-01-01

    Most traditional alarm systems cannot address security threats in a satisfactory manner. To alleviate this problem, we developed a high-confidence cyber-physical alarm system (CPAS), a new kind of alarm system. This system establishes the connection of the Internet (i.e. TCP/IP) through GPRS/CDMA/3G. It achieves mutual communication control among terminal equipment, human-machine interfaces and users by using the existing mobile communication network. The CPAS will enable the transformation in alarm mode from traditional one-way alarm to two-way alarm. The system has been successfully applied in practice. The results show that the CPAS could avoid false alarms and satisfy residents' security needs.

  14. Decreased memory confidence in obsessive-compulsive disorder for scenarios high and low on responsibility: is low still too high?

    Science.gov (United States)

    Moritz, Steffen; Jaeger, Anne

    2017-03-28

    Previous research suggests that patients with obsessive-compulsive disorder (OCD), particularly checkers, display an inflated sense of responsibility. For the present study, we tested whether memory confidence in OCD is reduced under conditions of heightened responsibility and/or reflects poor memory vividness. A computerized task designed to modulate perceived responsibility was administered to 26 OCD patients (12 checkers) and 21 healthy controls. In the experimental condition (high responsibility), participants had to allocate daily medications to ten fictive patients in a hospital emergency ward, whereas in the control condition (low responsibility) participants had to allocate bath essences for ten hotel guests. Participants' response time and accuracy were recorded as well as their memory confidence, memory vividness, and subjective success. Irrespective of the condition, OCD patients performed as accurately as healthy controls, but appraised their performance as worse than that of controls. Memory confidence was decreased in patients, particularly checkers. No group differences emerged on vividness, and none of the effects were moderated by the condition (high versus low responsibility). The relationship between responsibility and OCD behavior is complex. Results suggest metamemory problems in OCD checkers, even when induced responsibility is low. The findings speak against "cold" memory deficits in OCD, as patients did not differ from controls on accuracy, reaction time, or vividness. Future research should focus on idiosyncratic beliefs and scenarios that inflate responsibility and elicit cognitive biases.

  15. Gene expression correlation analysis predicts involvement of high- and low-confidence risk genes in different stages of prostate carcinogenesis.

    Science.gov (United States)

    Yano, Kojiro

    2010-12-01

    Whole genome association studies have identified many loci associated with the risk of prostate cancer (PC). However, very few of the genes associated with these loci have been related to specific processes of prostate carcinogenesis. Therefore I inferred biological functions associated with these risk genes using gene expression correlation analysis. PC risk genes reported in the literature were classified as having high (P … low (P … high-confidence genes and other genes in the microarray dataset, whereas the correlation between low-confidence genes and other genes in PC showed a smaller decrease. Genes involved in developmental processes were significantly correlated with all risk gene categories. Ectoderm development genes, which may be related to squamous metaplasia, and genes enriched in fetal prostate stem cells (PSCs) showed strong association with the high-confidence genes. The association between the PSC genes and the low-confidence genes was weak, but genes related to neural system genes showed strong association with low-confidence genes. The high-confidence risk genes may be associated with an early stage of prostate carcinogenesis, possibly involving PSCs and squamous metaplasia. The low-confidence genes may be involved in a later stage of carcinogenesis. © 2010 Wiley-Liss, Inc.

  16. Quest-V: A Virtualized Multikernel for High-Confidence Systems

    CERN Document Server

    Li, Ye; West, Richard

    2011-01-01

    This paper outlines the design of 'Quest-V', which is implemented as a collection of separate kernels operating together as a distributed system on a chip. Quest-V uses virtualization techniques to isolate kernels and prevent local faults from affecting remote kernels. This leads to a high-confidence multikernel approach, where failures of system subcomponents do not render the entire system inoperable. A virtual machine monitor for each kernel keeps track of shadow page table mappings that control immutable memory access capabilities. This ensures a level of security and fault tolerance in situations where a service in one kernel fails, or is corrupted by a malicious attack. Communication is supported between kernels using shared memory regions for message passing. Similarly, device driver data structures are shareable between kernels to avoid the need for complex I/O virtualization, or communication with a dedicated kernel responsible for I/O. In Quest-V, device interrupts are delivered directly to a kernel...

  17. Discovery of a high confidence soft lag from an X-ray flare of Markarian 421

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    We present the X-ray variability properties of the X-ray and TeV bright blazar Mrk 421 with a ~60 ks long XMM-Newton observation performed on November 9-10, 2005. The source experienced a pronounced flare, of which the inter-band time lags were determined with a very high confidence level. The soft (0.6-0.8 keV) X-ray variations lagged the hard (4-10 keV) ones by 1.09 (+0.11/-0.12) ks, and the soft lag increases with increasing difference in the photon energy. The energy-dependent soft lags can be well fitted with the difference of the energy-dependent cooling timescales of the relativistic electron distribution responsible for the observed X-ray emission, which constrains the magnetic field strength and Doppler factor of the emitting region to be Bδ^(1/3) ~ 1.78 Gauss.

  18. A Near-Term, High-Confidence Heavy Lift Launch Vehicle

    Science.gov (United States)

    Rothschild, William J.; Talay, Theodore A.

    2009-01-01

    The use of well understood, legacy elements of the Space Shuttle system could yield a near-term, high-confidence Heavy Lift Launch Vehicle that offers significant performance, reliability, schedule, risk, cost, and work force transition benefits. A side-mount Shuttle-Derived Vehicle (SDV) concept has been defined that has major improvements over previous Shuttle-C concepts. This SDV is shown to carry crew plus large logistics payloads to the ISS, support an operationally efficient and cost effective program of lunar exploration, and offer the potential to support commercial launch operations. This paper provides the latest data and estimates on the configurations, performance, concept of operations, reliability and safety, development schedule, risks, costs, and work force transition opportunities for this optimized side-mount SDV concept. The results presented in this paper have been based on established models and fully validated analysis tools used by the Space Shuttle Program, and are consistent with similar analysis tools commonly used throughout the aerospace industry. While these results serve as a factual basis for comparisons with other launch system architectures, no such comparisons are presented in this paper. The authors welcome comparisons between this optimized SDV and other Heavy Lift Launch Vehicle concepts.

  19. Visual Confidence.

    Science.gov (United States)

    Mamassian, Pascal

    2016-10-14

    Visual confidence refers to an observer's ability to judge the accuracy of her perceptual decisions. Even though confidence judgments have been recorded since the early days of psychophysics, only recently have they been recognized as essential for a deeper understanding of visual perception. The reluctance to study visual confidence may have come in part from obtaining convincing experimental evidence in favor of metacognitive abilities rather than just perceptual sensitivity. Some effort has thus been dedicated to offer different experimental paradigms to study visual confidence in humans and nonhuman animals. To understand the origins of confidence judgments, investigators have developed two competing frameworks. The approach based on signal decision theory is popular but fails to account for response times. In contrast, the approach based on accumulation of evidence models naturally includes the dynamics of perceptual decisions. These models can explain a range of results, including the apparently paradoxical dissociation between performance and confidence that is sometimes observed.

  20. Confidant Relations in Italy

    Directory of Open Access Journals (Sweden)

    Jenny Isaacs

    2015-02-01

    Confidants are often described as the individuals with whom we choose to disclose personal, intimate matters. The presence of a confidant is associated with both mental and physical health benefits. In this study, 135 Italian adults responded to a structured questionnaire that asked if they had a confidant, and if so, to describe various features of the relationship. The vast majority of participants (91%) reported the presence of a confidant and regarded this relationship as personally important, high in mutuality and trust, and involving minimal lying. Confidants were significantly more likely to be of the opposite sex. Participants overall were significantly more likely to choose a spouse or other family member as their confidant, rather than someone outside of the family network. Familial confidants were generally seen as closer, and of greater value, than non-familial confidants. These findings are discussed within the context of Italian culture.

  1. Factors Related to Family Therapists' Breaking Confidence When Clients Disclose High-Risks-to-HIV/AIDS Sexual Behaviors.

    Science.gov (United States)

    Pais, Shobha; Piercy, Fred; Miller, JoAnn

    1998-01-01

    Through a national survey of marriage and family therapists, this study examines what therapists do when their HIV-positive clients disclose that they are engaging in high-risk sexual behavior. Participants (N=309) were more likely to break confidence when their clients were male, young, gay, or African American. Describes characteristic of…

  3. HAMSA: Highly Accelerated Multiple Sequence Aligner

    Directory of Open Access Journals (Sweden)

    Naglaa M. Reda

    2016-06-01

    For biologists, the existence of an efficient tool for multiple sequence alignment is essential. This work presents a new parallel aligner called HAMSA. HAMSA is a bioinformatics application designed for highly accelerated alignment of multiple sequences of proteins and DNA/RNA on a multi-core cluster system. The design of HAMSA is based on a combination of our recently proposed optimized algorithms for vectorization, partitioning, and scheduling. It mainly operates on a distance vector instead of a distance matrix. It accomplishes similarity computations and generates the guide tree in a highly accelerated and accurate manner. HAMSA outperforms MSAProbs with a 21.9-fold speedup, and ClustalW-MPI with an 11-fold speedup. It can be considered an essential tool for structure prediction, protein classification, motif finding and drug design studies.

  4. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...... splicing events and coding potential of isoforms from full isoform deconvolution software, such as Cufflinks (article II), is presented. Finally, a study using 5’-end RNA-seq for alternative promoter detection between healthy patients and patients with acute promyelocytic leukemia is presented (article III...

  5. Confidence improvement of disposal safety by development of a safety case for high-level radioactive waste disposal

    Energy Technology Data Exchange (ETDEWEB)

    Baik, Min Hoon; Ko, Nak Youl; Jeong, Jong Tae; Kim, Kyung Su [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2016-12-15

    Many countries have developed a safety case suited to their own circumstances in order to improve confidence in the safety of deep geological disposal of high-level radioactive waste, as well as to develop a disposal program and obtain its license. This study introduces and summarizes the meaning, necessity, and development process of the safety case for radioactive waste disposal. The disposal safety is also discussed in various aspects of the safety case. In addition, the status of safety case development in foreign countries is briefly introduced for Switzerland, Japan, the United States of America, Sweden, and Finland. The strategy for the safety case development that is being developed by KAERI is also briefly introduced. Based on the safety case, we analyze the efforts necessary to improve confidence in disposal safety for high-level radioactive waste. Considering domestic situations, we propose and discuss some implementing methods for the improvement of disposal safety, such as construction of a reliable information database, understanding of processes related to safety, reduction of uncertainties in safety assessment, communication with stakeholders, and ensuring justice and transparency. This study will contribute to the understanding of the safety case for deep geological disposal and to improving confidence in disposal safety through the development of the safety case in Korea for the disposal of high-level radioactive waste.

  6. High-confidence software for safety-critical process-control systems

    Energy Technology Data Exchange (ETDEWEB)

    Bastani, F.B. [Univ. of Houston, TX (United States)

    1997-12-01

    Software for safety-critical systems, such as nuclear power plant control systems; avionic systems; and medical, defense, and manufacturing systems, must be highly reliable because failures can have catastrophic consequences. While existing methods, such as formal techniques, testing, and fault-tolerant software, can significantly enhance software reliability, they have some limitations in achieving ultrahigh reliability requirements. Formal methods are not able to cope with specification faults, testing is not cost-effective for high-assurance systems, and fault-tolerant software based on diverse designs is susceptible to common-mode failures.

  7. High School Students' Proficiency and Confidence Levels in Displaying Their Understanding of Basic Electrolysis Concepts

    Science.gov (United States)

    Sia, Ding Teng; Treagust, David F.; Chandrasegaran, A. L.

    2012-01-01

    This study was conducted with 330 Form 4 (grade 10) students (aged 15-16 years) who were involved in a course of instruction on electrolysis concepts. The main purposes of this study were (1) to assess high school chemistry students' understanding of 19 major principles of electrolysis using a recently developed 2-tier multiple-choice diagnostic…

  9. Assessment of risk to wildlife from ionising radiation: can initial screening tiers be used with a high level of confidence?

    Energy Technology Data Exchange (ETDEWEB)

    Beresford, N A; Barnett, C L [Centre for Ecology and Hydrology Lancaster, Lancaster Environment Centre, Library Avenue, Bailrigg, Lancaster LA1 4AP (United Kingdom); Hosseini, A; Brown, J E [Norwegian Radiation Protection Authority, Department of Emergency Preparedness and Environmental Radioactivity, Grini naeringspark 13 Postbox 55, NO-1332 Oesteraas (Norway); Cailes, C; Copplestone, D [Environment Agency, PO Box 12, Richard Fairclough House, Knutsford Road, Warrington WA4 1HG (United Kingdom); Beaugelin-Seiller, K, E-mail: nab@ceh.ac.u [Institut de Radioprotection et de Surete Nucleaire DEI/SECRE, CE Cadarache-Batiment 159, BP 3, 13115 Saint-Paul-lez-Durance (France)

    2010-06-15

    A number of models are being used to assess the potential environmental impact of releases of radioactivity. These often use a tiered assessment structure whose first tier is designed to be highly conservative and simple to use. An aim of using this initial tier is to identify sites of negligible concern and to remove them from further consideration with a high degree of confidence. In this paper we compare the screening assessment outputs of three freely available models. The outputs of these models varied considerably in terms of estimated risk quotient (RQ) and the radionuclide-organism combinations identified as being the most limiting. A number of factors are identified as contributing to this variability: values of transfer parameters (concentration ratios and K_d) used; organisms considered; different input options and how these are utilised in the assessment; assumptions as regards secular equilibrium; geometries and exposure scenarios. This large variation in RQ values between models means that the level of confidence required by users is not achieved. We recommend that the factors contributing to the variation in screening assessments be subjected to further investigation so that they can be more fully understood and assessors (and those reviewing assessment outputs) can better justify and evaluate the results obtained.
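
    The first-tier screening idea described in this record can be sketched as follows. The sketch assumes that a risk quotient is the ratio of an estimated exposure to a screening benchmark and that sites with RQ below 1 are treated as being of negligible concern; the function name, site names, dose rates and benchmark value are hypothetical and do not come from any of the three models compared.

```python
def screen_sites(dose_rates, benchmark):
    """Split sites by risk quotient, RQ = estimated dose rate / benchmark.

    dose_rates: mapping of site name -> estimated dose rate
    benchmark:  screening dose rate in the same units (must be positive)
    """
    if benchmark <= 0:
        raise ValueError("benchmark must be positive")
    # One RQ per site
    rq = {site: rate / benchmark for site, rate in dose_rates.items()}
    # RQ < 1: screened out as negligible concern at this tier
    negligible = sorted(site for site, q in rq.items() if q < 1.0)
    # RQ >= 1: passed on to a more detailed assessment tier
    further = sorted(site for site, q in rq.items() if q >= 1.0)
    return rq, negligible, further


# Hypothetical dose rates and benchmark, for illustration only
rq, negligible, further = screen_sites(
    {"site_A": 2.0, "site_B": 25.0}, benchmark=10.0
)
```

    Because the result depends directly on the estimated dose rate, two models that use different transfer parameters for the same site can place it on opposite sides of the RQ = 1 threshold, which is the kind of variation the record reports.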

  10. Logging Data High-Resolution Sequence Stratigraphy

    Institute of Scientific and Technical Information of China (English)

    Li Hongqi; Xie Yinfu; Sun Zhongchun; Luo Xingping

    2006-01-01

    The recognition and contrast of bed sets in parasequence is difficult in terrestrial basin high-resolution sequence stratigraphy. This study puts forward new methods for the boundary identification and contrast of bed sets on the basis of manifold logging data. The formation of calcareous interbeds, shale resistivity differences and the relation of reservoir resistivity to altitude are considered on the basis of log curve morphological characteristics, core observation, cast thin section, X-ray diffraction and scanning electron microscopy. The results show that the thickness of calcareous interbeds is between 0.5 m and 2 m, increasing on weathering crusts and faults. Calcareous interbeds occur at the bottom of … Reservoir resistivity increases with altitude. Calcareous interbeds may be a symbol of recognition for the boundary of bed sets and isochronous contrast bed sets, and shale resistivity differences may confirm the stack relation and connectivity of bed sets. Based on this, a high-resolution chronostratigraphic framework of Xi-1 segment in Shinan area, Junggar basin is presented, and the connectivity of bed sets and oil-water contact is confirmed. In this chronostratigraphic framework, the growth order, stack mode and space shape of bed sets are qualitatively and quantitatively described.

  11. The Effects of Game-Based Learning on Mathematical Confidence and Performance: High Ability vs. Low Ability

    Science.gov (United States)

    Ku, Oskar; Chen, Sherry Y.; Wu, Denise H.; Lao, Andrew C. C.; Chan, Tak-Wai

    2014-01-01

    Many students possess low confidence toward learning mathematics, which, in turn, may lead them to give up pursuing more mathematics knowledge. Recently, game-based learning (GBL) is regarded as a potential means in improving students' confidence. Thus, this study tried to promote students' confidence toward mathematics by using GBL. In addition,…

  13. Targeted high-throughput sequencing of tagged nucleic acid samples

    OpenAIRE

    Meyer, M.; Stenzel, U.; Myles, S.; Prüfer, K.; Hofreiter, M.

    2007-01-01

    High-throughput 454 DNA sequencing technology allows much faster and more cost-effective sequencing than traditional Sanger sequencing. However, the technology imposes inherent limitations on the number of samples that can be processed in parallel. Here we introduce parallel tagged sequencing (PTS), a simple, inexpensive and flexible barcoding technique that can be used for parallel sequencing any number and type of double-stranded nucleic acid samples. We demonstrate that PTS is particularly...

  14. Statistics with confidence: confidence intervals and statistical guidelines

    CERN Document Server

    Altman, Douglas; Bryant, Trevor; Gardner, Stephen

    2013-01-01

    This highly popular introduction to confidence intervals has been thoroughly updated and expanded. It includes methods for using confidence intervals, with illustrative worked examples and extensive guidelines and checklists to help the novice.

  15. Sequence characteristics of T4-like bacteriophage IME08 genome termini revealed by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    An Xiaoping

    2011-04-01

    Background: T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, and are generated by a DNA terminase in a sequence-independent manner. Methods: Genomic DNA of T4-like bacteriophage IME08 was subjected to high throughput sequencing, and the read sequences with extraordinarily high occurrences were analyzed. Results: We demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA|G around the breakpoint of the high frequency read sequences suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence, and the other end generated randomly. The sequence-preferred cleavage may produce sticky-ends, but with each end being packaged with different efficiencies. Conclusions: This study illustrates how high throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.

  16. Generating barcoded libraries for multiplex high-throughput sequencing.

    Science.gov (United States)

    Knapp, Michael; Stiller, Mathias; Meyer, Matthias

    2012-01-01

    Molecular barcoding is an essential tool to use the high throughput of next generation sequencing platforms optimally in studies involving more than one sample. Various barcoding strategies allow for the incorporation of short recognition sequences (barcodes) into sequencing libraries, either by ligation or polymerase chain reaction (PCR). Here, we present two approaches optimized for generating barcoded sequencing libraries from low copy number extracts and amplification products typical of ancient DNA studies.
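
    Once barcoded libraries from several samples have been sequenced together, reads must be assigned back to their samples. A minimal sketch of that step follows, assuming each read begins with its barcode, that barcodes are matched exactly, and that no barcode is a prefix of another; the function name, barcode sequences and reads are hypothetical and are not the protocol described in this record.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by exact match of a leading barcode.

    Returns a dict mapping sample name -> list of reads with the
    barcode trimmed off. Reads matching no barcode are collected
    under the key "unassigned", untrimmed.
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                # Strip the barcode so only the insert sequence remains
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTTTGACC", "TGCAGGATCA", "GGGGAAAACC", "ACGTCCCC"]
binned = demultiplex(reads, barcodes)
```

    Exact matching is the simplest choice; real pipelines often tolerate a mismatch in the barcode, which this sketch deliberately does not attempt.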

  17. Global high resolution versus Limited Area Model climate change projections over Europe: quantifying confidence level from PRUDENCE results

    Energy Technology Data Exchange (ETDEWEB)

    Deque, M. [Centre National de Recherches Meteorologiques, Meteo-France, Toulouse Cedex 01 (France); Jones, R.G.; Hassell, D.C. [Hadley Centre for Climate Prediction and Research, Met Office, Devon (United Kingdom); Wild, M.; Vidale, P.L. [Swiss Federal Institute of Technology, Institute for Atmospheric and Climate Science, ETH, Zurich (Switzerland); Giorgi, F.; Kucharski, F. [Abdus Salam International Centre for Theoretical Physics, Trieste (Italy); Christensen, J.H. [Danish Meteorological Institute, Copenhagen (Denmark); Rockel, B. [Institute of Coastal Research, GKSS Forschungszentrum Geesthacht GmbH, Geesthacht (Germany); Jacob, D. [Max-Planck-Institut fuer Meteorologie, Hamburg (Germany); Kjellstroem, E. [Swedish Meteorological and Hydrological Institute, Norrkoeping (Sweden); Castro, M. de. [Universidad de Castilla La Mancha, Dept. de Ciencias Ambientales, Toledo (Spain); Hurk, B. van den [KNMI, Postbus 201, AE De Bilt (Netherlands)

    2005-11-01

    Four high resolution atmospheric general circulation models (GCMs) have been integrated with the standard forcings of the PRUDENCE experiment: IPCC-SRES A2 radiative forcing and Hadley Centre sea surface temperature and sea-ice extent. The response over Europe, calculated as the difference between the 2071-2100 and the 1961-1990 means is compared with the same diagnostic obtained with nine Regional Climate Models (RCM) all driven by the Hadley Centre atmospheric GCM. The seasonal mean response for 2m temperature and precipitation is investigated. For temperature, GCMs and RCMs behave similarly, except that GCMs exhibit a larger spread. However, during summer, the spread of the RCMs - in particular in terms of precipitation - is larger than that of the GCMs. This indicates that the European summer climate is strongly controlled by parameterized physics and/or high-resolution processes. The temperature response is larger than the systematic error. The situation is different for precipitation. The model bias is twice as large as the climate response. The confidence in PRUDENCE results comes from the fact that the models have a similar response to the IPCC-SRES A2 forcing, whereas their systematic errors are more spread. In addition, GCM precipitation response is slightly but significantly different from that of the RCMs. (orig.)

  18. Monitoring Genomic Sequences during SELEX Using High-Throughput Sequencing: Neutral SELEX

    Science.gov (United States)

    Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-01-01

    Background SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. Methodology/Principal Findings To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Conclusions/Significance Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers. PMID:20161784

  19. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  20. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  1. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri eMichaeli

    2012-12-01

    Full Text Available High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion-Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  2. Confidence Estimation in Structured Prediction

    CERN Document Server

    Mejer, Avihai

    2011-01-01

    Structured classification tasks such as sequence labeling and dependency parsing have seen much interest from the Natural Language Processing and machine learning communities. Several online learning algorithms have been adapted for structured tasks, such as Perceptron, Passive-Aggressive and the recently introduced Confidence-Weighted learning. These online algorithms are easy to implement, fast to train and yield state-of-the-art performance. However, unlike probabilistic models such as Hidden Markov Models and Conditional Random Fields, these methods generate models that output merely a prediction with no additional information regarding confidence in the correctness of the output. In this work we fill the gap by proposing a few alternatives to compute the confidence in the output of non-probabilistic algorithms. We show how to compute confidence estimates in the prediction such that the confidence reflects the probability that the word is labeled correctly. We then show how to use our methods to detect mislabeled wor...

  3. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    Abstract High-throughput sequencing (HTS) technologies revolutionized the field of molecular biology by enabling large scale whole genome sequencing as well as a broad range of experiments for studying the cell's inner workings directly on DNA or RNA level. Given the dramatically increased rate...

  4. An improved high throughput sequencing method for studying oomycete communities

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    Culture-independent studies using next generation sequencing have revolutionized microbial ecology, however, oomycete ecology in soils is severely lagging behind. The aim of this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomyce...

  5. The sequence of learning cycle activities in high school chemistry

    Science.gov (United States)

    Abraham, Michael R.; Renner, John W.

    The sequence of the three phases of two high school learning cycles in chemistry was altered in order to: (1) give insights into the factors which account for the success of the learning cycle, (2) serve as an indirect test of the association between Piaget's theory and the learning cycle, and (3) compare the learning cycle with traditional instruction. Each of the six sequences (one normal and five altered) was studied with content and attitude measures. The outcomes of the study supported the contention that the normal learning cycle sequence is the optimum sequence for achievement of content knowledge.

  6. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Full Text Available Abstract Background Highly repetitive nucleotide sequences are commonly found in nature e.g. in telomeres, microsatellite DNA, polyadenine (poly(A)) tails of eukaryotic messenger RNA as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites resulting in unspecific products. Results For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example for their applicability in filter retardation assays. Conclusion Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  7. High efficiency spreading spectrum modulation using double orthogonal complex sequences

    Institute of Scientific and Technical Information of China (English)

    Shi Xiaohong

    2012-01-01

    This paper presents a novel scheme of high efficiency spreading spectrum modulation using double orthogonal complex sequences (DoCS). In this scheme, input data bit-stream is split into many groups with length M. Each group is then mapped into a word of width M and then utilized to select one sequence from 2u-2 DoCS sequences each with length L. After that, the selected sequence is modulated on carrier in quadrature phase shift keying (QPSK) mode. In addition, a new method named forward phase correction (FPC) is put forward for carrier recovery. Theoretical analysis and bit-error-ratio (BER) experiment results indicate that the proposed scheme has better performance than the conventional direct sequence spread spectrum (DSSS) scheme both in bandwidth efficiency and processing gain of the receiver.

  8. Library preparation for highly accurate population sequencing of RNA viruses

    Science.gov (United States)

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
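    The consensus step described above can be sketched as a per-position majority vote across the tandem copies in one read. This is a simplified illustration, not the CirSeq software: it assumes the repeat length is already known and that the copies start at the beginning of the read, and the function name `repeat_consensus` and the example read are hypothetical.

```python
from collections import Counter


def repeat_consensus(read, repeat_len):
    """Majority-vote consensus over tandem copies of length repeat_len.

    Assumes the read starts at a repeat boundary; any trailing partial
    copy is ignored.
    """
    copies = [
        read[i:i + repeat_len]
        for i in range(0, len(read) - repeat_len + 1, repeat_len)
    ]
    consensus = []
    for column in zip(*copies):
        # Most frequent base at this position across all copies.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three copies of a 6-nt template; the middle copy carries one error (A -> T).
read = "ACGTAC" + "ACGTTC" + "ACGTAC"
consensus = repeat_consensus(read, 6)
```

    Because an error must recur at the same position in most copies to survive the vote, isolated sequencing errors are filtered out; a production implementation would also have to detect the repeat boundaries and weigh base qualities.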

  9. High sequence conservation among cucumber mosaic virus isolates from lily.

    Science.gov (United States)

    Chen, Y K; Derks, A F; Langeveld, S; Goldbach, R; Prins, M

    2001-08-01

    For classification of Cucumber mosaic virus (CMV) isolates from ornamental crops of different geographical areas, these were characterized by comparing the nucleotide sequences of RNAs 4 and the encoded coat proteins. Within the ornamental-infecting CMV viruses both subgroups were represented. CMV isolates of Alstroemeria and crocus were classified as subgroup II isolates, whereas 8 other isolates, from lily, gladiolus, amaranthus, larkspur, and lisianthus, were identified as subgroup I members. In general, nucleotide sequence comparisons correlated well with geographic distribution, with one notable exception: the analyzed nucleotide sequences of 5 lily isolates showed remarkably high homology despite different origins.

  10. High-throughput DNA sequencing: a genomic data manufacturing process.

    Science.gov (United States)

    Huang, G M

    1999-01-01

    The progress trends in automated DNA sequencing operation are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in ever-increasing capacity of sequence data production. This progress leads to a higher demand on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  11. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute's (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus

  12. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing.

    Science.gov (United States)

    Li, Kelvin; Shrivastava, Susmita; Brownley, Anushka; Katzel, Dan; Bera, Jayati; Nguyen, Anh Thu; Thovarai, Vishal; Halpin, Rebecca; Stockwell, Timothy B

    2012-11-06

    In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute's (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. 
    The recommended methodology for the construction of the consensus sequence that encapsulates the allelic variation of the targeted
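    The use of IUPAC ambiguity codes to represent a degenerate primer can be illustrated with a short sketch. The code table is the standard IUPAC nucleotide alphabet; the functions `degeneracy` and `expand` and the example primer are hypothetical and are not part of the JCVI pipeline, whose screening criteria are not reproduced here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences a degenerate primer represents."""
    count = 1
    for code in primer:
        count *= len(IUPAC[code])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer))]


# Hypothetical primer with two ambiguous positions (R = A/G, Y = C/T).
primer = "ACRTYG"
variants = expand(primer)
```

    Degeneracy grows multiplicatively with each ambiguous position, which is one reason a design tool would limit the number of ambiguity codes per primer.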

  13. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  14. High-throughput sequencing in veterinary infection biology and diagnostics.

    Science.gov (United States)

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.

  15. Nullomers and High Order Nullomers in Genomic Sequences

    Science.gov (United States)

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, also taking into account CpG islands, seems to be not sufficient to explain the nullomer phenomenon.
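    The definitions above can be made concrete with a small sketch that enumerates the absent k-mers of a sequence. Two points are assumptions rather than statements from the abstract: "mutated sequences" is read here as all single-base substitutions, and only the given strand is scanned (no reverse complement). The function names and the toy sequence are hypothetical.

```python
from itertools import product


def nullomers(sequence, k):
    """Return the set of k-mers over ACGT that never occur in sequence."""
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return all_kmers - present


def is_high_order(word, absent):
    """True if every single-base substitution of word is also absent.

    Assumption: 'mutated' means exactly one substituted base.
    """
    for i, base in enumerate(word):
        for alt in "ACGT":
            if alt != base and word[:i] + alt + word[i + 1:] not in absent:
                return False
    return True


# Toy sequence, for illustration only.
absent = nullomers("ACGTACGGTCA", 2)
high_order = sorted(w for w in absent if is_high_order(w, absent))
```

    For genome-scale input the set of all k-mers grows as 4^k, so a practical implementation would use a more compact encoding than Python string sets.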

  16. Confidence-based somatic mutation evaluation and prioritization.

    Directory of Open Access Journals (Sweden)

    Martin Löwer

    Full Text Available Next generation sequencing (NGS) has enabled high throughput discovery of somatic mutations. Detection depends on experimental design, lab platforms, parameters and analysis algorithms. However, NGS-based somatic mutation detection is prone to erroneous calls, with reported validation rates near 54% and congruence between algorithms less than 50%. Here, we developed an algorithm to assign a single statistic, a false discovery rate (FDR), to each somatic mutation identified by NGS. This FDR confidence value accurately discriminates true mutations from erroneous calls. Using sequencing data generated from triplicate exome profiling of C57BL/6 mice and B16-F10 melanoma cells, we used the existing algorithms GATK, SAMtools and SomaticSNiPer to identify somatic mutations. For each identified mutation, our algorithm assigned an FDR. We selected 139 mutations for validation, including 50 somatic mutations assigned a low FDR (high confidence) and 44 mutations assigned a high FDR (low confidence). All of the high confidence somatic mutations validated (50 of 50), none of the 44 low confidence somatic mutations validated, and 15 of 45 mutations with an intermediate FDR validated. Furthermore, the assignment of a single FDR to individual mutations enables statistical comparisons of lab and computation methodologies, including ROC curves and AUC metrics. Using the HiSeq 2000, single end 50 nt reads from replicates generate the highest confidence somatic mutation call set.

  17. Scrutinizing virus genome termini by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Shasha Li

    Full Text Available Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS) have enabled it to be a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs) found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, and asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.

  18. High-resolution mapping of protein sequence-function relationships.

    Science.gov (United States)

    Fowler, Douglas M; Araya, Carlos L; Fleishman, Sarel J; Kellogg, Elizabeth H; Stephany, Jason J; Baker, David; Fields, Stanley

    2010-09-01

    We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
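    One common way to quantify the performance of each variant from sequencing counts is a log-ratio of its frequency after selection to its frequency in the input library. The sketch below illustrates that idea only; it is not claimed to be the authors' exact scoring, and the function name `enrichment`, the pseudocount value and the variant labels are hypothetical.

```python
import math


def enrichment(input_counts, selected_counts, pseudocount=0.5):
    """Log2 ratio of each variant's frequency after vs. before selection.

    A pseudocount (an assumption of this sketch) avoids taking the log of
    zero for variants that disappear after selection.
    """
    total_in = sum(input_counts.values())
    total_sel = sum(selected_counts.values())
    scores = {}
    for variant, n_in in input_counts.items():
        n_sel = selected_counts.get(variant, 0)
        freq_in = (n_in + pseudocount) / total_in
        freq_sel = (n_sel + pseudocount) / total_sel
        scores[variant] = math.log2(freq_sel / freq_in)
    return scores


# Hypothetical read counts per variant before and after selection.
input_counts = {"WT": 1000, "A10G": 1000, "L5P": 1000}
selected_counts = {"WT": 1000, "A10G": 2000, "L5P": 0}
scores = enrichment(input_counts, selected_counts)
```

    Positive scores indicate variants that became more frequent under selection and negative scores indicate depletion; scores are often reported relative to the wild-type score.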

  19. Exome sequencing identifies ZNF644 mutations in high myopia.

    Directory of Open Access Journals (Sweden)

    Yi Shi

    2011-06-01

    Full Text Available Myopia is the most common ocular disorder worldwide, and high myopia in particular is one of the leading causes of blindness. Genetic factors play a critical role in the development of myopia, especially high myopia. Recently, the exome sequencing approach has been successfully used for the disease gene identification of Mendelian disorders. Here we show a successful application of exome sequencing to identify a gene for an autosomal dominant disorder, and we have identified a gene potentially responsible for high myopia in a monogenic form. We captured exomes of two affected individuals from a Han Chinese family with high myopia and performed sequencing analysis by a second-generation sequencer with a mean coverage of 30× and sufficient depth to call variants at ∼97% of each targeted exome. The shared genetic variants of these two affected individuals in the family being studied were filtered against the 1000 Genomes Project and the dbSNP131 database. A mutation A672G in zinc finger protein 644 isoform 1 (ZNF644) was identified as being related to the phenotype of this family. After we performed sequencing analysis of the exons in the ZNF644 gene in 300 sporadic cases of high myopia, we identified an additional five mutations (I587V, R680G, C699Y, 3'UTR+12 C>G, and 3'UTR+592 G>A) in 11 different patients. All these mutations were absent in 600 normal controls. The ZNF644 gene was expressed in human retinal and retinal pigment epithelium (RPE). Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, mutation may cause the axial elongation of eyeball found in high myopia patients. Our results suggest that ZNF644 might be a causal gene for high myopia in a monogenic form.

  20. The Model Confidence Set

    DEFF Research Database (Denmark)

    Hansen, Peter Reinhard; Lunde, Asger; Nason, James M.

    The paper introduces the model confidence set (MCS) and applies it to the selection of models. A MCS is a set of models that is constructed such that it will contain the best model with a given level of confidence. The MCS is in this sense analogous to a confidence interval for a parameter. The M...

  1. Increasing Mobility Confidence

    Science.gov (United States)

    ... To increase your confidence moving, you have to move! ... It might seem counterintuitive, but to increase your confidence moving, you have to move! Build physical activity ...

  2. Binary interactions with high accretion rates onto main sequence stars

    Science.gov (United States)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10⁻² M⊙ yr⁻¹ for solar type stars, and up to ≈ 1 M⊙ yr⁻¹ for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.

  3. High nucleosome occupancy is encoded at human regulatory sequences.

    Directory of Open Access Journals (Sweden)

    Desiree Tillo

    Active eukaryotic regulatory sites are characterized by open chromatin, and yeast promoters and transcription factor binding sites (TFBSs) typically have low intrinsic nucleosome occupancy. Here, we show that in contrast to yeast, DNA at human promoters, enhancers, and TFBSs generally encodes high intrinsic nucleosome occupancy. In most cases we examined, these elements also have high experimentally measured nucleosome occupancy in vivo. These regions typically have high G+C content, which correlates positively with intrinsic nucleosome occupancy, and are depleted for nucleosome-excluding poly-A sequences. We propose that high nucleosome preference is directly encoded at regulatory sequences in the human genome to restrict access to regulatory information that will ultimately be utilized in only a subset of differentiated cells.

  4. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Athavale, Ajay [Monsanto]

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  5. Generating long sequences of high-intensity femtosecond pulses

    CERN Document Server

    Bitter, Martin

    2015-01-01

    We present an approach to create pulse sequences extending beyond 150 picoseconds in duration, comprised of 100 μJ femtosecond pulses. A quarter of the pulse train is produced by a high-resolution pulse shaper, which allows full controllability over the timing of each pulse. Two nested Michelson interferometers follow to quadruple the pulse number and the sequence duration. To boost the pulse energy, the long train is sent through a multi-pass Ti:Sapphire amplifier, followed by an external compressor. A periodic sequence of 84 pulses of 120 fs width and an average pulse energy of 107 μJ, separated by 2 ps, is demonstrated as a proof of principle.
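    The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that each of the two interferometers doubles the pulse count (which is how "quadruple" is read here) and that the train is strictly periodic. The function names are hypothetical and do not come from the paper.

```python
# Illustrative arithmetic only. Assumption: each interferometer doubles
# the number of pulses, so two nested stages quadruple it.
TOTAL_PULSES = 84      # from the abstract
SPACING_PS = 2.0       # pulse separation in picoseconds, from the abstract
INTERFEROMETERS = 2    # two nested Michelson interferometers


def shaper_pulses(total: int, stages: int) -> int:
    """Number of pulses the shaper must supply before `stages` doublings."""
    factor = 2 ** stages
    if total % factor:
        raise ValueError("total is not divisible by the multiplication factor")
    return total // factor


def train_span_ps(n_pulses: int, spacing_ps: float) -> float:
    """Time between the first and last pulse of a periodic train."""
    return (n_pulses - 1) * spacing_ps


initial = shaper_pulses(TOTAL_PULSES, INTERFEROMETERS)
span = train_span_ps(TOTAL_PULSES, SPACING_PS)
print(f"shaper pulses: {initial}, train span: {span} ps")
```

    Under these assumptions the shaper supplies 21 pulses (a quarter of 84), and the first-to-last span of 166 ps is consistent with the stated duration "beyond 150 picoseconds".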

  6. Next-generation sequencing: big data meets high performance computing.

    Science.gov (United States)

    Schmidt, Bertil; Hildebrandt, Andreas

    2017-02-02

    The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments in excess of a few terabytes of data in a single run. This leads to massive datasets used by a wide range of applications including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been dramatically decreasing. A low sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, to make effective use of the produced data, the design of big data algorithms and their efficient implementation on modern high performance computing systems is required.

  7. Compression of structured high-throughput sequencing data.

    Directory of Open Access Journals (Sweden)

    Fabien Campagne

    Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state of the art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.

  8. The Effect of High-Fidelity Cardiopulmonary Resuscitation (CPR) Simulation on Athletic Training Student Knowledge, Confidence, Emotions, and Experiences

    Science.gov (United States)

    Tivener, Kristin Ann; Gloe, Donna Sue

    2015-01-01

    Context: High-fidelity simulation is widely used in healthcare for the training and professional education of students though literature of its application to athletic training education remains sparse. Objective: This research attempts to address a wide-range of data. This includes athletic training student knowledge acquisition from…

  10. Reader Accuracy and Confidence in Diagnosing Diffuse Lung Disease on High-Resolution Computed Tomography of the Lungs: Impact of Sampling Frequency

    Energy Technology Data Exchange (ETDEWEB)

    Sundaram, B.; Gross, B.H.; Oh, E.; Mueller, N.; Myles, J.D.; Kazerooni, E.A. (Dept. of Radiology, Michigan Institute for Clinical Health Research, Univ. of Michigan Health System, Ann Arbor, Michigan (United States))

    2008-10-15

    Background: The accuracy of the number of high-resolution computed tomography (HRCT) images necessary to diagnose diffuse lung disease (DLD) is not well established. Purpose: To evaluate the impact of HRCT sampling frequency on reader confidence and accuracy for diagnosing DLD. Material and Methods: HRCT images of 100 consecutive patients with proven DLD were reviewed. They were: 48 usual interstitial pneumonia, 22 sarcoidosis, six hypersensitivity pneumonitis, five each of desquamative interstitial pneumonitis, eosinophilic granulomatosis, and lymphangioleiomyomatosis, and nine others. Inspiratory images at 1-cm increments throughout the lungs and three specified levels formed complete and limited examinations. In random order, three experts (readers 1, 2, and 3) ranked their top three diagnoses and rated confidence for their top diagnosis, independently and blinded to clinical information. Results: Using the complete versus limited examinations for correct first-choice diagnosis, accuracy for reader 1 (R1) was 81% versus 80%, respectively, for reader 2 (R2) 70% versus 70%, and for reader 3 (R3) 64% versus 59%. Reader accuracy within their top three choices for complete versus limited examinations was: R1 91% versus 91% of cases, respectively, R2 84% versus 83%, and R3 79% versus 72% of cases. No statistically significant differences were found between the diagnosis methods (P=0.28 for first diagnosis and P=0.17 for top three choices). The confidence intervals for individual raters showed considerable overlap, and the point estimates are almost identical. The mean interreader agreement for complete versus limited HRCT for both top and top three diagnoses were the same (moderate and fair, respectively). The mean intrareader agreement between complete and limited HRCT for top and top three diagnoses were substantial and moderate, respectively. 
    Conclusion: Overall reader accuracy and confidence in diagnosis did not significantly differ when fewer or more HRCT images

  11. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jonas Binladen

    BACKGROUND: The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. METHODOLOGY: We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. CONCLUSIONS: We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of
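    The tracing of reads back to their source specimens by 5' tag, as described in this abstract, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `demultiplex`, the tag-to-specimen table, and the example reads are all hypothetical, and only exact tag matches are accepted. The two-base tag length follows the abstract's mention of dinucleotide tags.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_specimen, tag_len=2):
    """Assign each read to a specimen by its leading 5' tag.

    Returns (assigned, unassigned): `assigned` maps specimen name to the
    list of reads with the tag stripped; `unassigned` holds reads whose
    tag is not in the table (for example, sequencing anomalies).
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_len].upper(), read[tag_len:]
        specimen = tag_to_specimen.get(tag)
        if specimen is None:
            unassigned.append(read)
        else:
            assigned[specimen].append(insert)
    return dict(assigned), unassigned


# Hypothetical tags and reads, for illustration only.
tags = {"AC": "specimen_1", "TG": "specimen_2"}
reads = ["ACGGTTAA", "TGGGTTAC", "ACGGTTAT", "NNGGTTAA"]
by_specimen, rejected = demultiplex(reads, tags)
print(by_specimen)
print(rejected)
```

    Counting reads per tag in `by_specimen` is also one simple way to look for the kind of tag-dependent representation bias the abstract reports.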

  12. Upsampling range camera depth maps using high-resolution vision camera and pixel-level confidence classification

    Science.gov (United States)

    Tian, Chao; Vaishampayan, Vinay; Zhang, Yifu

    2011-03-01

    We consider the problem of upsampling a low-resolution depth map generated by a range camera, by using information from one or more additional high-resolution vision cameras. The goal is to provide an accurate high resolution depth map from the viewpoint of one of the vision cameras. We propose an algorithm that first converts the low resolution depth map into a depth/disparity map through coordinate mappings into the coordinate frame of one vision camera, then classifies the pixels into regions according to whether the range camera depth map is trustworthy, and finally refines the depth values for the pixels in the untrustworthy regions. For the last refinement step, both a method based on graph cut optimization and one based on bilateral filtering are examined. Experimental results show that the proposed methods using classification are able to upsample the depth map by a factor of 10 x 10 with much improved depth details, and with significantly better accuracy compared to those without the classification. The improvements are visually perceptible on a 3D auto-stereoscopic display.

  13. Strengthening Public Confidence.

    Science.gov (United States)

    Herlihy, John J.

    Board members and administrators can build public confidence in their schools by taking every opportunity to communicate positive attitudes about the people in the schools. As leaders, they have the responsibility to use people power to promote the schools. If school employees feel good about their jobs, they will build confidence within the…

  14. High throughput 16S rRNA gene amplicon sequencing

    DEFF Research Database (Denmark)

    Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup

    16S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r...... to the presence of filamentous microorganisms was monitored weekly over 4 months. Microthrix was identified as a causative filament and suitable control measures were introduced. The level of Microthrix was reduced after 1-2 months but a number of other filamentous species were still present, with most of them...

  15. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    Science.gov (United States)

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfectly conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in the discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary histories, each having been frozen since a different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.

  16. Applications of High-Throughput Nucleotide Sequencing (PhD)

    DEFF Research Database (Denmark)

    Waage, Johannes

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come......-sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...... splicing events and coding potential of isoforms from full isoform deconvolution software, such as Cufflinks (article II), is presented. Finally, a study using 5’-end RNA-seq for alternative promoter detection between healthy patients and patients with acute promyelocytic leukemia is presented (article III...

  17. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    BACKGROUND: The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. METHODOLOGY: We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through...... be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.

  18. Communicating Low-Probability High-Consequence Risk, Uncertainty and Expert Confidence: Induced Seismicity of Deep Geothermal Energy and Shale Gas.

    Science.gov (United States)

    Knoblauch, Theresa A K; Stauffacher, Michael; Trutnevyte, Evelina

    2017-08-10

    Subsurface energy activities entail the risk of induced seismicity including low-probability high-consequence (LPHC) events. For designing respective risk communication, the scientific literature lacks empirical evidence of how the public reacts to different written risk communication formats about such LPHC events and to related uncertainty or expert confidence. This study presents findings from an online experiment (N = 590) that empirically tested the public's responses to risk communication about induced seismicity and to different technology frames, namely deep geothermal energy (DGE) and shale gas (between-subject design). Three incrementally different formats of written risk communication were tested: (i) qualitative, (ii) qualitative and quantitative, and (iii) qualitative and quantitative with risk comparison. Respondents found the latter two the easiest to understand, the most exact, and liked them the most. Adding uncertainty and expert confidence statements made the risk communication less clear, less easy to understand and increased concern. Above all, the technology for which risks are communicated and its acceptance mattered strongly: respondents in the shale gas condition found the identical risk communication less trustworthy and more concerning than in the DGE conditions. They also liked the risk communication overall less. For practitioners in DGE or shale gas projects, the study shows that the public would appreciate efforts in describing LPHC risks with numbers and optionally risk comparisons. However, there seems to be a trade-off between aiming for transparency by disclosing uncertainty and limited expert confidence, and thereby decreasing clarity and increasing concern in the view of the public. © 2017 Society for Risk Analysis.

  19. Confidence in Numerical Simulations

    Energy Technology Data Exchange (ETDEWEB)

    Hemez, Francois M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-02-23

    This PowerPoint presentation offers a high-level discussion of uncertainty, confidence and credibility in scientific Modeling and Simulation (M&S). It begins by briefly evoking M&S trends in computational physics and engineering. The first thrust of the discussion is to emphasize that the role of M&S in decision-making is either to support reasoning by similarity or to “forecast,” that is, make predictions about the future or extrapolate to settings or environments that cannot be tested experimentally. The second thrust is to explain that M&S-aided decision-making is an exercise in uncertainty management. The three broad classes of uncertainty in computational physics and engineering are variability and randomness, numerical uncertainty and model-form uncertainty. The last part of the discussion addresses how scientists “think.” This thought process parallels the scientific method, whereby a hypothesis is formulated, often accompanied by simplifying assumptions; then physical experiments and numerical simulations are performed to confirm or reject the hypothesis. “Confidence” derives not just from the levels of training and experience of analysts, but also from the rigor with which these assessments are performed, documented and peer-reviewed.

  20. Trust vs. Confidence

    Science.gov (United States)

    2016-06-07

    defined. Although there are many different definitions of trust, our definition (Adams and Webb, 2003) is as follows: Trust is a psychological state...Judgments: Experiments on the Time to Determine Confidence. Journal of Experimental Psychology: Human Perception and Performance, 24(3), 929-945. BARANSKI...PETRUSIC, W. (2001). Testing Architectures of the Decision-Confidence Relation. Canadian Journal of Experimental Psychology, 55(3): 195-206. PETRUSIC, W

  1. High order coherent control sequences of fat pulses

    CERN Document Server

    Pasini, S; Uhrig, G S

    2010-01-01

    We analyze the performance of sequences of fat pulses of various lengths and shapes for dynamic decoupling and we compare it with that of sequences of ideal, instantaneous pulses. The use of second order, shaped pulses represents a significant improvement. Non-equidistant sequences characterized by pulse durations scaled proportional to the duration T of the sequence strikingly outperform the sequences with pulses of constant length for small T. Interestingly, for longer durations sequences of pulses of substantial length are found to suppress dephasing better than sequences of ideal pulses.

  2. Benefit-of-doubt (BOD) scoring: a sequencing-based method for SNP candidate assessment from high to medium read number data sets.

    Science.gov (United States)

    Sedlazeck, Fritz Joachim; Talloji, Prabhavathi; von Haeseler, Arndt; Bachmair, Andreas

    2013-03-01

    Identification of single nucleotide polymorphisms (SNPs) is a key element in sequence-based genetic analysis. Next generation sequencing offers a cost-effective basis to generate the necessary, large sequence data sets, and bioinformatic methods are being developed to process sequencing machine readouts. We were interested in detection of SNPs in a 350 kb region of an EMS-mutagenized Arabidopsis chromosome 3. The region was selectively analyzed using PCR-generated, overlapping fragments for Solexa sequencing. The ensuing reads provided a high coverage and were processed bioinformatically. In order to assess the SNP candidates obtained with a frequently used alignment program and SNP caller, we developed an additional method that allows the identification of high confidence SNP loci. The method can easily be applied to complete genome sequence data of sufficient coverage.

  3. Determination of red blood cell fatty acid profiles: Rapid and high-confident analysis by chemical ionization-gas chromatography-tandem mass spectrometry.

    Science.gov (United States)

    Schober, Yvonne; Wahl, Hans Günther; Renz, Harald; Nockher, Wolfgang Andreas

    2017-01-01

    Cellular fatty acid (FA) profiles have been acknowledged as biomarkers in various human diseases. Nevertheless, common FA analysis by gas chromatography mass spectrometry (GC-MS) requires long analysis time. Hence, there is a need for feasible methods for high throughput analysis in clinical studies. FA was extracted from red blood cells (RBC) and derivatized to fatty acid methyl esters (FAME). A method using gas chromatography tandem mass spectrometry (GC-MS/MS) with ammonia-induced chemical ionization (CI) was developed for the analysis of FA profiles in human RBC. We compared this method with classical single GC-MS using electron impact ionization (EI). The FA profiles of 703 RBC samples were determined by GC-MS/MS. In contrast to EI, ammonia-induced CI resulted in adequate amounts of molecular ions for further fragmentation of FAME. Specific fragments for confident quantification and fragmentation were determined for 45 FA. The GC-MS/MS method has a total run time of 9 min compared to typical analysis times of up to 60 min in conventional GC-MS. Intra- and inter-assay variations were <10% for all FA analyzed. Analysis of RBC FA composition revealed an age-dependent increase of the omega-3 eicosapentaenoic and docosahexaenoic acid, and a decline of the omega-6 linoleic acid with a corresponding rise of the omega-3 index. The combination of ammonia-induced CI and tandem mass spectrometry after GC separation allows for high-throughput, robust and confident analysis of FA profiles in the clinical laboratory. Copyright © 2016. Published by Elsevier B.V.

  4. Confidence and Construal Framing: When Confidence Increases versus Decreases Information Processing

    OpenAIRE

    Echo Wen Wan; Derek D. Rucker

    2013-01-01

    A large literature demonstrates that people process information more carefully in states of low compared to high confidence. This article presents an alternative hypothesis that either high or low confidence can increase or decrease information processing on the basis of how information is construed. Five experiments demonstrate two sets of findings supporting this alternative formulation. First, low confidence leads people to focus on concrete construals, whereas high confidence leads people...

  5. Plasmodium falciparum antigenic variation. Mapping mosaic var gene sequences onto a network of shared, highly polymorphic sequence blocks.

    Science.gov (United States)

    Bull, Peter C; Buckee, Caroline O; Kyes, Sue; Kortok, Moses M; Thathy, Vandana; Guyah, Bernard; Stoute, José A; Newbold, Chris I; Marsh, Kevin

    2008-06-01

    Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) is a potentially important family of immune targets, encoded by an extremely diverse gene family called var. Understanding of the genetic organization of var genes is hampered by sequence mosaicism that results from a long history of non-homologous recombination. Here we have used software designed to analyse social networks to visualize the relationships between large collections of short var sequence tags sampled from clinical parasite isolates. In this approach, two sequences are connected if they share one or more highly polymorphic sequence blocks. The results show that the majority of analysed sequences, including several var-like sequences from the chimpanzee parasite Plasmodium reichenowi, can be either directly or indirectly linked together in a single unbroken network. However, the network is highly structured and contains putative subgroups of recombining sequences. The major subgroup contains the previously described group A var genes, previously proposed to be genetically distinct. Another subgroup contains sequences found to be associated with rosetting, a parasite virulence phenotype. The mosaic structure of the sequences and their division into subgroups may reflect the conflicting problems of maximizing antigenic diversity and minimizing epitope sharing between variants while maintaining their host cell binding functions.
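    The linking rule in this abstract (two sequences are connected if they share at least one polymorphic block) can be sketched as a small graph construction. This is a minimal illustration under stated assumptions, not the software the authors used: the sequence names, block labels, and both function names are hypothetical, and each sequence is assumed to be already reduced to a set of block identifiers.

```python
from itertools import combinations


def build_network(seq_blocks):
    """Connect two sequences if they share at least one block."""
    edges = {name: set() for name in seq_blocks}
    for a, b in combinations(seq_blocks, 2):
        if seq_blocks[a] & seq_blocks[b]:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def connected_components(edges):
    """Group sequences that are directly or indirectly linked."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical sequences and block labels, for illustration only.
seq_blocks = {
    "seq1": {"b1", "b2"},
    "seq2": {"b2", "b3"},
    "seq3": {"b3"},
    "seq4": {"b9"},
}
network = build_network(seq_blocks)
components = connected_components(network)
print(components)
```

    In this toy input, seq1 and seq3 share no block but end up in the same component through seq2, which mirrors the abstract's "directly or indirectly linked" wording.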

  6. High throughput sequencing reveals a novel fabavirus infecting sweet cherry.

    Science.gov (United States)

    Villamor, D E V; Pillai, S S; Eastwell, K C

    2017-03-01

    The genus Fabavirus currently consists of five species represented by viruses that infect a wide range of hosts but none reported from temperate climate fruit trees. A virus with genomic features resembling fabaviruses (tentatively named Prunus virus F, PrVF) was revealed by high throughput sequencing of extracts from a sweet cherry tree (Prunus avium). PrVF was subsequently shown to be graft transmissible and further identified in three other non-symptomatic Prunus spp. from different geographical locations. Two genetic variants of RNA1 and RNA2 coexisted in the same samples. RNA1 consisted of 6,165 and 6,163 nucleotides, and RNA2 consisted of 3,622 and 3,468 nucleotides.

  7. The Confidence Trick

    Directory of Open Access Journals (Sweden)

    Steve Keen

    2009-03-01

    This article reflects on the role that confidence plays in recovery from a financial crisis. The author reflects on lessons from the past – specifically The Great Crash of 1929 and on the work of economists Keynes and Fisher to apply to our current economic woes. The role of overconfidence in our current crisis is also examined.

  8. The Confidence Trick

    OpenAIRE

    2009-01-01

    This article reflects on the role that confidence plays in recovery from a financial crisis. The author reflects on lessons from the past – specifically The Great Crash of 1929 and on the work of economists Keynes and Fisher to apply to our current economic woes. The role of overconfidence in our current crisis is also examined.

  9. Confidence in Coastal Forecasts

    NARCIS (Netherlands)

    Baart, F.

    2013-01-01

    This thesis answers the question "How can we show and improve our confidence in coastal forecasts?", by providing four examples of common coastal forecasts. The first example shows how to improve the estimate of the one in ten thousand year storm-surge level. The three dimensional reconstruction,

  10. Adding Confidence to Knowledge

    Science.gov (United States)

    Goodson, Ludwika Aniela; Slater, Don; Zubovic, Yvonne

    2015-01-01

    A "knowledge survey" and a formative evaluation process led to major changes in an instructor's course and teaching methods over a 5-year period. Design of the survey incorporated several innovations, including: a) using "confidence survey" rather than "knowledge survey" as the title; b) completing an instructional…

  11. Raising Confident Kids

    Science.gov (United States)

    ... new skill and milestone, kids can develop increasing confidence. Parents can help by giving kids lots of opportunities to practice and master their skills, letting kids make mistakes and being there to boost their spirits so they keep trying. Respond with ...

  12. Resolving the Confidence Crisis

    Science.gov (United States)

    Apter, Terri

    2006-01-01

    As children approach adolescence, they often experience confusion and uncertainty as they attempt to appear more grown up than they really feel. Research on both girls and boys has documented that the buoyant self-confidence of younger children often gives way to self-consciousness as young adolescents become aware of the complexity and difficulty…

  13. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    Directory of Open Access Journals (Sweden)

    Elena Marmesat

    Full Text Available The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele's amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications.

  14. Hybridization Capture Using Short PCR Products Enriches Small Genomes by Capturing Flanking Sequences (CapFlank)

    DEFF Research Database (Denmark)

    Tsangaras, Kyriakos; Wales, Nathan; Sicheritz-Pontén, Thomas;

    2014-01-01

    Solution hybridization capture methods utilize biotinylated oligonucleotides as baits to enrich homologous sequences from next generation sequencing (NGS) libraries. Coupled with NGS, the method generates kilo to gigabases of high confidence consensus targeted sequence. However, in many experimen...

  15. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants...... insights is far from trivial. A key challenge is that these methods cannot read the input sequences in their entirety. Due to technological constraints, they instead provide the sequences of very many fragments of the input molecules. Furthermore, not all nucleotides in these fragments are measured...... correctly and the final output of a typical experiment thus consists of hundreds of millions of error-containing sequence fragments. This thesis concerns the development of methods for transforming such a raw sequencing signal into a simpler representation from which biological inferences can then be made...

  16. Business Confidence Survey 2000

    Institute of Scientific and Technical Information of China (English)

    Guo Yan

    2009-01-01

    In order to gain a better understanding of the depth and breadth of its effect on European companies' businesses, the new strategies they are adopting to cope with the crisis, and how their attitudes towards China, including long-term plans, have changed in its aftermath, the European Union Chamber of Commerce in China today launches its sixth annual European Chamber Business Confidence Survey, which is published in partnership with Roland Berger Strategy Consultants in Beijing on June 30, 2009. Drawing on the responses of more than 300 European companies active in China, the 2009 Survey highlights a European business community that remains bullish on China in most sectors and ready to back up that confidence with continued investment in the local economy, provided that the Chinese government is committed to creating a more free, fair and competitive market.

  17. Varieties of Confidence Intervals.

    Science.gov (United States)

    Cousineau, Denis

    2017-01-01

    Error bars are useful to understand data and their interrelations. Here, it is shown that confidence intervals of the mean (CIMs) can be adjusted based on whether the objective is to highlight differences between measures or not and based on the experimental design (within- or between-group designs). Confidence intervals (CIs) can also be adjusted to take into account the sampling mechanisms and the population size (if not infinite). Names are proposed to distinguish the various types of CIs and the assumptions underlying them, and how to assess their validity is explained. The various CIs presented here are easily obtained from a succession of multiplicative adjustments to the basic (unadjusted) CI width. All summary results should present a measure of precision, such as CIs, as this information is complementary to effect sizes.
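
    The idea of applying successive multiplicative adjustments to a basic CI width can be sketched as follows. This is a minimal illustration, not the author's procedure: the three factors used here (sqrt(2) for a difference-adjusted interval, sqrt(1 - r) for correlated within-group measures, and sqrt(1 - n/N) for a finite population) are commonly used corrections assumed for the example, the normal quantile stands in for the usual t quantile, and all function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def basic_ci_half_width(sd, n, level=0.95):
    """Unadjusted half-width of a CI of the mean (normal approximation, assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd / math.sqrt(n)


def adjusted_ci_half_width(sd, n, level=0.95, difference=False,
                           r=None, population_size=None):
    """Apply optional multiplicative adjustments to the basic half-width.

    difference      -- widen by sqrt(2) so that intervals can be compared pairwise
    r               -- correlation between repeated measures (within-group design)
    population_size -- finite population size N, if the population is not infinite
    """
    width = basic_ci_half_width(sd, n, level)
    if difference:
        width *= math.sqrt(2)
    if r is not None:
        if not -1.0 <= r <= 1.0:
            raise ValueError("r must lie in [-1, 1]")
        width *= math.sqrt(1 - r)
    if population_size is not None:
        if population_size < n:
            raise ValueError("population_size must be at least n")
        width *= math.sqrt(1 - n / population_size)
    return width
```

    Each adjustment is independent of the others, so a caller can combine them, for example `adjusted_ci_half_width(10, 100, difference=True, r=0.75)`.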

  18. We will be champions: Leaders' confidence in 'us' inspires team members' team confidence and performance.

    Science.gov (United States)

    Fransen, K; Steffens, N K; Haslam, S A; Vanbeselaere, N; Vande Broek, G; Boen, F

    2016-12-01

    The present research examines the impact of leaders' confidence in their team on the team confidence and performance of their teammates. In an experiment involving newly assembled soccer teams, we manipulated the team confidence expressed by the team leader (high vs neutral vs low) and assessed team members' responses and performance as they unfolded during a competition (i.e., in a first baseline session and a second test session). Our findings pointed to team confidence contagion such that when the leader had expressed high (rather than neutral or low) team confidence, team members perceived their team to be more efficacious and were more confident in the team's ability to win. Moreover, leaders' team confidence affected individual and team performance such that teams led by a highly confident leader performed better than those led by a less confident leader. Finally, the results supported a hypothesized mediational model in showing that the effect of leaders' confidence on team members' team confidence and performance was mediated by the leader's perceived identity leadership and members' team identification. In conclusion, the findings of this experiment suggest that leaders' team confidence can enhance members' team confidence and performance by fostering members' identification with the team. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  19. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    Science.gov (United States)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed through mapping four nucleotides to a unit circle with a cyclic order. DUC-Curve could directly detect nucleotide, di-nucleotide compositions and microsatellite structure from DNA sequences. Moreover, it also could be used for DNA sequence alignment. Taking geometric center vectors of DUC-Curves as sequence descriptor, we perform similarity analysis on the first exons of β-globin genes of 11 species, oncogene TP53 of 27 species and twenty-four Influenza A viruses, respectively. The obtained reasonable results illustrate that the proposed method is very effective in sequence comparison problems, and will at least play a complementary role in classification and clustering problems.
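
    A rough sketch of how a unit-circle graphical representation and a geometric-center descriptor could be computed is given below. It is not the DUC-Curve construction itself, whose details are not given in the abstract: the assignment of A, C, G and T to the angles 0, 90, 180 and 270 degrees, the cumulative-walk curve, and every function name are assumptions made purely for illustration.

```python
import math

# Hypothetical cyclic order on the unit circle; the published mapping may differ.
ANGLES = {"A": 0.0, "C": math.pi / 2, "G": math.pi, "T": 3 * math.pi / 2}


def curve_points(seq):
    """Walk one unit step per nucleotide in the direction assigned to it."""
    x = y = 0.0
    points = []
    for base in seq.upper():
        if base not in ANGLES:
            continue  # skip ambiguous symbols such as N
        x += math.cos(ANGLES[base])
        y += math.sin(ANGLES[base])
        points.append((x, y))
    return points


def geometric_center(points):
    """Mean of the curve points, used here as a two-dimensional descriptor."""
    if not points:
        raise ValueError("sequence contains no A, C, G or T")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def descriptor_distance(seq_a, seq_b):
    """Euclidean distance between the geometric centers of two sequences."""
    center_a = geometric_center(curve_points(seq_a))
    center_b = geometric_center(curve_points(seq_b))
    return math.dist(center_a, center_b)
```

    Under these assumptions, a smaller `descriptor_distance` would be read as greater similarity between two sequences.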

  20. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    Full Text Available High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
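
    The step of pulling SRA IDs out of article full text and pairing them with PMIDs might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the accession pattern (a prefix S, E or D, then R, then one of A, P, X, S or R, then six or more digits) is an assumed shape for archive accessions, and the function names and the input mapping are hypothetical.

```python
import re

# Assumed accession shape, e.g. SRP012345, ERX000123 or DRR1234567.
SRA_ID = re.compile(r"\b[SED]R[APXSR]\d{6,}\b")


def extract_sra_ids(text):
    """Return the distinct accession-like strings found in a block of text."""
    return sorted(set(SRA_ID.findall(text)))


def pair_with_pmid(articles):
    """Build (SRA ID, PMID) pairs from a mapping of PMID to article full text."""
    pairs = []
    for pmid, text in articles.items():
        for sra_id in extract_sra_ids(text):
            pairs.append((sra_id, pmid))
    return pairs
```

    A call such as `pair_with_pmid({"12345678": "Reads were deposited under SRP012345."})` would yield one pair linking the accession to the (made-up) PMID.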

  1. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    of data generation, new bioinformatics approaches have been developed to cope with the large amount of sequencing reads obtained in these experiments. In this chapter, we first introduce HTS technologies and their usage in molecular biology and discuss the problem of mapping sequencing reads...

  2. Recent research on the high-probability instructional sequence: A brief review.

    Science.gov (United States)

    Lipschultz, Joshua; Wilder, David A

    2017-04-01

    The high-probability (high-p) instructional sequence consists of the delivery of a series of high-probability instructions immediately before delivery of a low-probability or target instruction. It is commonly used to increase compliance in a variety of populations. Recent research has described variations of the high-p instructional sequence and examined the conditions under which the sequence is most effective. This manuscript reviews the most recent research on the sequence and identifies directions for future research. Recommendations for practitioners regarding the use of the high-p instructional sequence are also provided. © 2017 Society for the Experimental Analysis of Behavior.

  3. Consumer confidence or the business cycle

    DEFF Research Database (Denmark)

    Møller, Stig Vinther; Nørholm, Henrik; Rangvid, Jesper

    2014-01-01

    Answer: The business cycle. We show that consumer confidence and the output gap both predict excess returns on stocks in many European countries: When the output gap is positive (the economy is doing well), expected returns are low, and when consumer confidence is high, expected returns are also low....... Consumer confidence and the output gap are also highly positively correlated. In fact, we find that consumer confidence does not contain independent information (i.e. information over and above that contained by the output gap) about expected returns. Our use of European data allows us to examine both...... aggregate European and local-country data on consumer confidence and output gaps. We find that even local-country consumer confidence does not contain independent information about expected returns. Our findings have asset pricing implications: We show that the cross-country distribution of expected returns...

  4. Expenditure, Confidence, and Uncertainty: Identifying Shocks to Consumer Confidence Using Daily Data

    OpenAIRE

    Lachowska, Marta

    2013-01-01

    The importance of consumer confidence in stimulating economic activity is a disputed issue in macroeconomics. Do changes in confidence represent autonomous fluctuations in optimism, independent of information on economic fundamentals, or are they a reflection of economic news? I study this question by using high-frequency microdata on spending and consumer confidence, and I find that consumer confidence contains information relevant to predicting spending, independent from other indicators. T...

  5. High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach

    Directory of Open Access Journals (Sweden)

    Allard Marc W

    2012-01-01

    Full Text Available Abstract Background: Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult-to-resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shotgun sequenced, including diverse lineage representatives as well as numerous replicate clones, to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study. Results: Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data. Conclusions: Implementation of a validated pipeline for NGS data acquisition and

  6. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Directory of Open Access Journals (Sweden)

    Kathy N Lam

    Full Text Available High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  7. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    Science.gov (United States)

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.

  8. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    2014-01-01

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses; as a result, only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit…

  9. Alan Greenspan, the confidence strategy

    Directory of Open Access Journals (Sweden)

    Edwin Le Heron

    2006-12-01

    Full Text Available To evaluate the Greenspan era, we nevertheless need to address three questions: Is his success due to talent or just luck? Does he have a system of monetary policy or is he himself the system? What will be his legacy? Greenspan was certainly lucky, but he was also clairvoyant. Above all, he has developed a profoundly original monetary policy. His confidence strategy is clearly opposed to the credibility strategy developed in central banks and the academic milieu after 1980, but also inflation targeting, which today constitutes the mainstream monetary policy regime. The question of his legacy seems more nuanced. However, Greenspan will remain 'for a considerable period of time' a highly heterodox and original central banker. His political vision, his perception of an uncertain world, his pragmatism and his openness form the structure of a powerful alternative system, the confidence strategy, which will leave its mark on the history of monetary policy.

  10. Reclaim your creative confidence.

    Science.gov (United States)

    Kelley, Tom; Kelley, David

    2012-12-01

    Most people are born creative. But over time, a lot of us learn to stifle those impulses. We become warier of judgment, more cautious, more analytical. The world seems to divide into "creatives" and "noncreatives," and too many people resign themselves to the latter category. And yet we know that creativity is essential to success in any discipline or industry. The good news, according to authors Tom Kelley and David Kelley of IDEO, is that we all can rediscover our creative confidence. The trick is to overcome the four big fears that hold most of us back: fear of the messy unknown, fear of judgment, fear of the first step, and fear of losing control. The authors use an approach based on the work of psychologist Albert Bandura in helping patients get over their snake phobias: You break challenges down into small steps and then build confidence by succeeding on one after another. Creativity is something you practice, say the authors, not just a talent you are born with.

  11. Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

    Science.gov (United States)

    Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...

  12. Interrelation of economic confidence with other types of confidence

    OpenAIRE

    Бонецький, Орест Олегович

    2013-01-01

    The paper gives the object and the subject of the study, which are used as a criterion allowing to separate the economic confidence from other types of confidence. The terms describing the psychological and sociological confidence are proposed. It was found that the economic confidence is interrelated with psychological confidence by motivation and advertising, sociological – by the results of activity of public organizations, state regulation of the economy. On the example of information-com...

  13. Picking Funds with Confidence

    DEFF Research Database (Denmark)

    Grønborg, Niels Strange; Lunde, Asger; Timmermann, Allan

    We present a new approach to selecting active mutual funds that uses both holdings and return information to eliminate funds with predicted inferior performance through a sequence of pair-wise comparisons. Our methodology determines both the number of skilled funds and their identity, funds ident...

  14. Regional Competition for Confidence: Features of Formation

    Directory of Open Access Journals (Sweden)

    Irina Svyatoslavovna Vazhenina

    2016-09-01

    Full Text Available The increase in economic independence of the regions inevitably leads to an increase in the quality requirements of the regional economic policy. The key to successful regional policy, both during its development and implementation, is the understanding of the necessity of gaining confidence (at all levels, and the inevitable participation in the competition for confidence. The importance of confidence in the region is determined by its value as a competitive advantage in the struggle for partners, resources and tourists, and attracting investments. In today’s environment the focus of governments, regions and companies on long-term cooperation is clearly expressed, which is impossible without a high level of confidence between partners. Therefore, the most important competitive advantages of territories are intangible assets such as an attractive image and a good reputation, which builds up confidence of the population and partners. The higher the confidence in the region is, the broader is the range of potential partners, the larger is the planning horizon of long-term concerted action, the better are the chances of acquiring investment, the higher is the level of competitive immunity of the territories. The article defines competition for confidence as purposeful behavior of a market participant in economic environment, aimed at acquiring specific intangible competitive advantage – the confidence of the largest possible number of other market actors. The article also highlights the specifics of confidence as a competitive goal, presents factors contributing to the destruction of confidence, proposes a strategy to fight for confidence as a program of four steps, considers the factors which integrate regional confidence and offers several recommendations for the establishment of effective regional competition for confidence

  15. Prediction Models of Retention Indices for Increased Confidence in Structural Elucidation during Complex Matrix Analysis: Application to Gas Chromatography Coupled with High-Resolution Mass Spectrometry.

    Science.gov (United States)

    Dossin, Eric; Martin, Elyette; Diana, Pierrick; Castellon, Antonio; Monge, Aurelien; Pospisil, Pavel; Bentley, Mark; Guy, Philippe A

    2016-08-02

    Monitoring of volatile and semivolatile compounds was performed using gas chromatography (GC) coupled to high-resolution electron ionization mass spectrometry, using both headspace and liquid injection modes. A total of 560 reference compounds, including 8 odd n-alkanes, were analyzed and experimental linear retention indices (LRI) were determined. These reference compounds were randomly split into training (n = 401) and test (n = 151) sets. LRI for all 552 reference compounds were also calculated based upon computational Quantitative Structure-Property Relationship (QSPR) models, using two independent approaches RapidMiner (coupled to Dragon) and ACD/ChromGenius software. Correlation coefficients for experimental versus predicted LRI values calculated for both training and test set compounds were calculated at 0.966 and 0.949 for RapidMiner and at 0.977 and 0.976 for ACD/ChromGenius, respectively. In addition, the cross-validation correlation was calculated at 0.96 from RapidMiner and the residual standard error value obtained from ACD/ChromGenius was 53.635. These models were then used to predict LRI values for several thousand compounds reported present in tobacco and tobacco-related fractions, plus a range of specific flavor compounds. It was demonstrated that using the mean of the LRI values predicted by RapidMiner and ACD/ChromGenius, in combination with accurate mass data, could enhance the confidence level for compound identification from the analysis of complex matrixes, particularly when the two predicted LRI values for a compound were in close agreement. Application of this LRI modeling approach to matrixes with unknown composition has already enabled the confirmation of 23 postulated compounds, demonstrating its ability to facilitate compound identification in an analytical workflow. The goal is to reduce the list of putative candidates to a reasonable relevant number that can be obtained and measured for confirmation.
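
    For readers unfamiliar with linear retention indices, the sketch below shows one common way to compute an experimental LRI by linear interpolation between the two n-alkanes that bracket a compound, followed by the averaging of two predicted values described above. The abstract does not state which formula or agreement threshold was used, so the interpolation formula, the tolerance of 50 index units, the function names and the retention times in the usage note are all assumptions for illustration.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkanes):
    """Interpolate an LRI from bracketing n-alkanes.

    alkanes -- iterable of (carbon_number, retention_time) pairs; the ladder
               may skip carbon numbers (for example odd n-alkanes only).
    """
    ladder = sorted(alkanes, key=lambda pair: pair[1])
    if len(ladder) < 2:
        raise ValueError("at least two alkanes are needed")
    times = [t for _, t in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    i = min(bisect_right(times, rt), len(ladder) - 1)
    (n_lo, t_lo), (n_hi, t_hi) = ladder[i - 1], ladder[i]
    return 100.0 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))


def consensus_lri(pred_a, pred_b, tolerance=50.0):
    """Mean of two predicted LRIs and whether they agree within a tolerance."""
    return (pred_a + pred_b) / 2.0, abs(pred_a - pred_b) <= tolerance
```

    With a made-up ladder `[(9, 5.0), (11, 9.0), (13, 13.0)]`, a compound eluting at 7.0 min sits halfway between C9 and C11 and therefore receives an index of 1000.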

  16. Simulation integration with confidence

    Science.gov (United States)

    Strelich, Tom; Stalcup, Bruce W.

    1999-07-01

    Current financial, schedule and risk constraints mandate reuse of software components when building large-scale simulations. While integration of simulation components into larger systems is a well-understood process, it is extremely difficult to do while ensuring that the results are correct. Illgen Simulation Technologies Incorporated and Litton PRC have joined forces to provide tools to integrate simulations with confidence. Illgen Simulation Technologies has developed an extensible and scalable, n-tier, client-server, distributed software framework for integrating legacy simulations, models, tools, utilities, and databases. By utilizing the Internet, Java, and the Common Object Request Broker Architecture as the core implementation technologies, the framework provides built-in scalability and extensibility.

  17. High-throughput, high-fidelity HLA genotyping with deep sequencing.

    Science.gov (United States)

    Wang, Chunlin; Krishnakumar, Sujatha; Wilhelmy, Julie; Babrzadeh, Farbod; Stepanyan, Lilit; Su, Laura F; Levinson, Douglas; Fernandez-Viña, Marcelo A; Davis, Ronald W; Davis, Mark M; Mindrinos, Michael

    2012-05-29

    Human leukocyte antigen (HLA) genes are the most polymorphic in the human genome. They play a pivotal role in the immune response and have been implicated in numerous human pathologies, especially autoimmunity and infectious diseases. Despite their importance, however, they are rarely characterized comprehensively because of the prohibitive cost of standard technologies and the technical challenges of accurately discriminating between these highly related genes and their many alleles. Here we demonstrate a high-resolution and cost-effective methodology to type HLA genes by sequencing, which combines the advantage of long-range amplification, the power of high-throughput sequencing platforms, and a unique genotyping algorithm. We calibrated our method for HLA-A, -B, -C, and -DRB1 genes with both reference cell lines and clinical samples and identified several previously undescribed alleles with mismatches, insertions, and deletions. We have further demonstrated the utility of this method in a clinical setting by typing five clinical samples on an Illumina MiSeq instrument with a 5-d turnaround. Overall, this technology has the capacity to deliver low-cost, high-throughput, and accurate HLA typing by multiplexing thousands of samples in a single sequencing run, which will enable comprehensive disease-association studies with large cohorts. Furthermore, this approach can also be extended to include other polymorphic genes.

  18. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    Science.gov (United States)

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing.
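
    The idea of building a distance matrix from pairwise alignments alone can be sketched roughly as follows. This is an illustration only, not the DAMBE implementation: the scoring values (match, mismatch, gap) and the use of a simple p-distance are assumptions, since the abstract does not state which scoring scheme or distance measure PhyPA uses.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment (scoring values are assumed)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Proportion of mismatches over aligned, non-gap columns (illustrative)."""
    x, y = global_align(a, b)
    pairs = [(c, d) for c, d in zip(x, y) if c != "-" and d != "-"]
    if not pairs:
        raise ValueError("no aligned columns to compare")
    return sum(c != d for c, d in pairs) / len(pairs)


def distance_matrix(seqs):
    """Pairwise distances computed independently for every pair of sequences."""
    return [[0.0 if i == j else p_distance(s, t) for j, t in enumerate(seqs)]
            for i, s in enumerate(seqs)]
```

    Because each pair is aligned on its own, no multiple alignment is ever constructed; the resulting matrix could then be passed to any distance-based tree-building method.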

  19. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    correctly and the final output of a typical experiment thus consists of hundreds of millions of error-containing sequence fragments. This thesis concerns the development of methods for transforming such a raw sequencing signal into a simpler representation from which biological inferences can then be made....... Importantly, the fact that the fragments are short and contain errors implies that there may be significant uncertainty associated with the signal. By using probabilistic models, we are able to quantify this uncertainty and propagate it to downstream analyses. The first chapter describes a new method...

  20. Sequence-Specific Covalent Capture Coupled with High-Contrast Nanopore Detection of a Disease-Derived Nucleic Acid Sequence.

    Science.gov (United States)

    Nejad, Maryam Imani; Shi, Ruicheng; Zhang, Xinyue; Gu, Li-Qun; Gates, Kent S

    2017-07-18

    Hybridization-based methods for the detection of nucleic acid sequences are important in research and medicine. Short probes provide sequence specificity, but do not always provide a durable signal. Sequence-specific covalent crosslink formation can anchor probes to target DNA and might also provide an additional layer of target selectivity. Here, we developed a new crosslinking reaction for the covalent capture of specific nucleic acid sequences. This process involved reaction of an abasic (Ap) site in a probe strand with an adenine residue in the target strand and was used for the detection of a disease-relevant T→A mutation at position 1799 of the human BRAF kinase gene sequence. Ap-containing probes were easily prepared and displayed excellent specificity for the mutant sequence under isothermal assay conditions. It was further shown that nanopore technology provides a high-contrast (in essence, digital) signal that enables sensitive, single-molecule sensing of the cross-linked duplexes. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Automated and high confidence protein phosphorylation site localization using complementary collision-activated dissociation and electron transfer dissociation tandem mass spectrometry

    DEFF Research Database (Denmark)

    Hansen, Thomas A; Sylvester, Marc; Jensen, Ole N

    2012-01-01

    -activated dissociation and electron transfer dissociation, an approach termed the Cscore. The scoring algorithm used in the Cscore was adapted from the widely used Ascore method. The analytical benefit of integrating the product ion information of both ETD and CAD data are evident by increased confidence in phospho...

  2. High sequence conservation among cucumber mosaic virus isolates from Lily

    NARCIS (Netherlands)

    Chen, Y.K.; Derks, A.F.L.M.; Langeveld, S.; Goldbach, R.; Prins, M.

    2001-01-01

    For classification of Cucumber mosaic virus (CMV) isolates from ornamental crops of different geographical areas, these were characterized by comparing the nucleotide sequences of RNAs 4 and the encoded coat proteins. Within the ornamental-infecting CMV viruses both subgroups were represented. CMV i

  3. Putting Physics First: Three Case Studies of High School Science Department and Course Sequence Reorganization

    Science.gov (United States)

    Larkin, Douglas B.

    2016-01-01

    This article examines the process of shifting to a "Physics First" sequence in science course offerings in three school districts in the United States. This curricular sequence reverses the more common U.S. high school sequence of biology/chemistry/physics, and has gained substantial support in the physics education community over the…

  4. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies

    Directory of Open Access Journals (Sweden)

    Charbonnel Nathalie

    2010-05-01

    Background: High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. Results: DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. Conclusions: This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied.
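
    Attributing each read to its sample through a combination of forward and reverse tags can be sketched as below. This is a minimal illustration, not the authors' pipeline: the tag sequences, the tag length, and the assumption that the forward tag sits at the start of the read and the reverse tag at the end (already in read orientation) are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, sample_by_tags, tag_len):
    """Assign reads to samples by their (forward tag, reverse tag) pair.

    sample_by_tags maps (forward_tag, reverse_tag) -> sample name.
    Reads whose tag pair is unknown are returned separately.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        key = (read[:tag_len], read[-tag_len:])
        sample = sample_by_tags.get(key)
        if sample is None or len(read) <= 2 * tag_len:
            unassigned.append(read)
        else:
            # Keep only the amplicon, with both tags trimmed off.
            assigned[sample].append(read[tag_len:-tag_len])
    return dict(assigned), unassigned


def assignment_rate(n_assigned, n_total):
    """Fraction of reads that could be attributed to a sample."""
    return n_assigned / n_total
```

    With the read counts reported in the abstract, assignment_rate(153349, 221789) is about 0.69, i.e. roughly 69% of reads were attributed to a sample.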

  5. The computation of Buehler confidence limits

    Institute of Scientific and Technical Information of China (English)

    FANG; Xiangzhong; CHEN; Jiading

    2005-01-01

    In medicine and industry, small sample sizes often arise owing to the high cost of testing, which makes exact confidence inference important. The Buehler confidence limit is a kind of exact confidence limit for a function of the parameters in a model. It can always be defined if an order on the sample space is given. But the computation is often difficult, especially for cases with high-dimensional parameters or with incomplete data. This paper presents an algorithm to compute Buehler confidence limits by the EM algorithm. This is the first use of the EM algorithm for Buehler confidence limits, although the algorithm is often used for maximum likelihood estimation in the literature. Three computation examples are given to illustrate the method.

  6. Zero-field nuclear magnetic resonance in high field by modulated rf sequences.

    Science.gov (United States)

    Nishiyama, Yusuke; Yamazaki, Toshio

    2007-04-07

    The authors propose a novel approach to design and evaluate sequences for zero-field NMR spectra in high field (ZFHF) by using amplitude and phase modulated rf sequences. ZFHF provides sharp peaks for the dipolar interaction between two nuclear spins even if the orientation of the molecules is distributed. The internuclear distance r can be directly obtained from the peak position, which is proportional to r^-3. Numerous ZFHF sequences are obtained. A sequence is selected from them by the systematic evaluation of the sequences. The new ZFHF sequence is less affected by chemical shift anisotropy (CSA) than the previous sequences; the sequence can be used for systems with large CSA such as a dipolar coupled 13C-pair system under realistically high field. 13C ZFHF spectra of 13C2 diammonium succinate and 13C2 diammonium oxalate were observed under the 9.4 T field.
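
    The r^-3 dependence can be made concrete with the standard dipolar coupling constant for two spins. The sketch below is an assumption-laden illustration: it uses the magnitude of the unscaled coupling constant and ignores any sequence-dependent scaling factor between that constant and the observed peak position, which the abstract does not give. The function names are hypothetical.

```python
import math

MU0_OVER_4PI = 1e-7        # T*m/A (vacuum permeability divided by 4*pi)
HBAR = 1.054571817e-34     # J*s
GAMMA_13C = 6.728284e7     # rad s^-1 T^-1, gyromagnetic ratio of 13C


def dipolar_coupling_hz(r_m, gamma1=GAMMA_13C, gamma2=GAMMA_13C):
    """Magnitude of the dipolar coupling constant in Hz at distance r_m (metres)."""
    return MU0_OVER_4PI * gamma1 * gamma2 * HBAR / (2 * math.pi * r_m ** 3)


def distance_from_coupling(d_hz, gamma1=GAMMA_13C, gamma2=GAMMA_13C):
    """Invert the r^-3 relation: internuclear distance (metres) from coupling in Hz."""
    return (MU0_OVER_4PI * gamma1 * gamma2 * HBAR / (2 * math.pi * d_hz)) ** (1 / 3)
```

    For a 13C-13C pair at 1.54 angstroms this gives a coupling of roughly 2 kHz; because of the cube-root inversion, a small relative error in the measured peak position translates into a relative error in r that is about three times smaller.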

  7. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    Science.gov (United States)

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations.

  8. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental and computational protocol for detecting the reverse transcription termination sites (RTTS-Seq). This protocol was subsequently applied to hydroxyl radical footprinting of three dimensional RNA structures to give a probing signal that correlates well with the RNA backbone solvent accessibility. Moreover, we applied...

  9. Targeted high throughput sequencing in hereditary ataxia and spastic paraplegia

    Science.gov (United States)

    Koht, Jeanette; Pihlstrøm, Lasse; Rengmark, Aina H.; Henriksen, Sandra P.; Tallaksen, Chantal M. E.; Toft, Mathias

    2017-01-01

    Hereditary ataxia and spastic paraplegia are heterogeneous monogenic neurodegenerative disorders. To date, a large number of individuals with such disorders remain undiagnosed. Here, we have assessed molecular diagnosis by gene panel sequencing in 105 early and late-onset hereditary ataxia and spastic paraplegia probands, in whom extensive previous investigations had failed to identify the genetic cause of disease. Pathogenic and likely-pathogenic variants were identified in 20 probands (19%) and variants of uncertain significance in ten probands (10%). Together these accounted for 30 probands (29%) and involved 18 different genes. Among several interesting findings, dominantly inherited KIF1A variants, p.(Val8Met) and p.(Ile27Thr) segregated in two independent families, both presenting with a pure spastic paraplegia phenotype. Two homozygous missense variants, p.(Gly4230Ser) and p.(Leu4221Val) were found in SACS in one consanguineous family, presenting with spastic ataxia and isolated cerebellar atrophy. The average disease duration in probands with pathogenic and likely-pathogenic variants was 31 years, ranging from 4 to 51 years. In conclusion, this study confirmed and expanded the clinical phenotypes associated with known disease genes. The results demonstrate that gene panel sequencing and similar sequencing approaches can serve as efficient diagnostic tools for different heterogeneous disorders. Early use of such strategies may help to reduce both costs and time of the diagnostic process. PMID:28362824

  10. Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire

    Directory of Open Access Journals (Sweden)

    Cheng Cheng

    2011-02-01

    Background: Recent advances in massively parallel sequencing have increased the depth at which T cell receptor (TCR) repertoires can be probed by >3 log10, allowing for saturation sequencing of immune repertoires. The resolution of this sequencing is dependent on its accuracy, and direct assessments of the errors formed during high throughput repertoire analyses are limited. Results: We analyzed 3 monoclonal TCR from TCR transgenic, Rag-/- mice using Illumina® sequencing. A total of 27 sequencing reactions were performed for each TCR using a trifurcating design in which samples were divided into 3 at significant processing junctures. More than 20 million complementarity determining region (CDR) 3 sequences were analyzed. Filtering for lower quality sequences diminished but did not eliminate sequence errors, which occurred within 1-6% of sequences. Erroneous sequences were predominantly of correct length and contained single nucleotide substitutions. Rates of specific substitutions varied dramatically in a position-dependent manner. Four substitutions, all purine-pyrimidine transversions, predominated. Solid phase amplification and sequencing rather than liquid sample amplification and preparation appeared to be the primary sources of error. Analysis of polyclonal repertoires demonstrated the impact of error accumulation on data parameters. Conclusions: Caution is needed in interpreting repertoire data due to potential contamination with mis-sequenced reads. However, a high association of errors with phred score, high relatedness of erroneous sequences with the parental sequence, dominance of specific nt substitutions, and skewed ratio of forward to reverse reads among erroneous sequences indicate approaches to filter erroneous sequences from repertoire data sets.
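
    One of the filtering cues mentioned above, the close relatedness of erroneous sequences to the parental sequence, can be sketched as follows. This is a hypothetical illustration rather than the authors' method: it assumes a monoclonal sample in which the most abundant sequence is the parent, and the threshold of one mismatch is an assumed parameter.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def split_parent_and_errors(counts, max_mismatches=1):
    """Separate likely sequencing errors from other sequences.

    counts: Counter mapping sequence -> read count.
    A sequence is flagged as a likely error if it has the parent's length
    and differs from the parent by at most max_mismatches substitutions.
    """
    parent, _ = counts.most_common(1)[0]
    likely_errors, other = {}, {}
    for seq, n in counts.items():
        if seq == parent:
            continue
        if len(seq) == len(parent) and hamming(seq, parent) <= max_mismatches:
            likely_errors[seq] = n
        else:
            other[seq] = n
    return parent, likely_errors, other
```

    In a polyclonal repertoire the same idea would need an abundance ratio between candidate parent and candidate error, since two genuine clones can also differ by a single nucleotide.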

  11. Manufacturing of High-Strength and High-Ductility Pearlitic Steel Wires Using Noncircular Drawing Sequence

    Energy Technology Data Exchange (ETDEWEB)

    Baek, Hyun Moo; Joo, Ho Seon; Im, Yong-Taek [KAIST, Daejeon (Korea, Republic of); Hwang, Sun Kwang [KITECH, Cheonan (Korea, Republic of); Son, Il-Heon; Bae, Chul Min [POSCO, Pohang (Korea, Republic of)

    2014-07-15

    In this study, a noncircular drawing (NCD) sequence for manufacturing high-strength and high-ductility pearlitic steel wires was investigated. Multipass NCD was conducted up to the 12th pass at room temperature with two processing routes (defined as the NCDA and NCDB), and compared with the wire drawing (WD). During the torsion test, delamination fracture in the drawn wire was observed in the 10th pass of the WD whereas it was not observed until the 12th pass of the NCDB. From X-ray diffraction, the circular texture component that increases the likelihood of delamination fracture of the drawn wire was rarely observed in the NCDB. Thus, the improved ability of the multipass NCDB to manufacture high-strength pearlitic steel wires with high torsional ductility compared to the WD (by reducing the likelihood of delamination fracture) was demonstrated.

  12. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  13. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers

    Science.gov (United States)

    Bessaud, Maël; Sadeuh-Mba, Serge A.; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the three serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses. PMID:27617004

  14. Confidence and Cognitive Test Performance

    Science.gov (United States)

    Stankov, Lazar; Lee, Jihyun

    2008-01-01

    This article examines the nature of confidence in relation to abilities, personality, and metacognition. Confidence scores were collected during the administration of Reading and Listening sections of the Test of English as a Foreign Language Internet-Based Test (TOEFL iBT) to 824 native speakers of English. Those confidence scores were correlated…

  16. Explorations in Statistics: Confidence Intervals

    Science.gov (United States)

    Curran-Everett, Douglas

    2009-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This third installment of "Explorations in Statistics" investigates confidence intervals. A confidence interval is a range that we expect, with some level of confidence, to include the true value of a population parameter…
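
    As a small illustration of the definition above, a confidence interval for a population mean can be computed from a sample as sketched here. This is a minimal example assuming a normal approximation; for small samples a t critical value would be more appropriate, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    std_err = stdev(sample) / math.sqrt(n)           # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for level=0.95
    return centre - z * std_err, centre + z * std_err
```

    Raising the confidence level or shrinking the sample widens the interval; the interval narrows in proportion to the square root of the sample size.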

  17. Fostering English Learners' Confidence

    Science.gov (United States)

    Bondie, Rhonda; Gaughran, Laurie; Zusho, Akane

    2014-01-01

    A teacher is doing something right when his high school students--kids with limited English, no less--form groups and begin discussing a lesson on quadratic equations at the start of class, without any teacher direction. Bondie, Gaughran, and Zusho describe "discussion routines" that teachers at International Community High School in the…

  18. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Background: Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host's root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results: The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant's root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions: This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  19. Molecular characterization of a novel luteovirus from peach identified by high-throughput sequencing.

    Science.gov (United States)

    Wu, L-P; Liu, H-W; Bateman, M; Liu, Z; Li, R

    2017-05-26

    Contigs with sequence homologies to cherry-associated luteovirus were identified by high-throughput sequencing analysis in two peach accessions. Complete genomic sequences of the two isolates of this virus were determined to be 5,819 and 5,814 nucleotides long, respectively. The genome of the new virus is typical of luteoviruses, containing eight open reading frames in a very similar arrangement. Its genomic sequence is 58-74% identical to those of other members of the genus Luteovirus. These sequences thus belong to a new virus, which we have named "peach-associated luteovirus".

  20. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    Science.gov (United States)

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  1. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications.

    Science.gov (United States)

    Gu, W; Crawford, E D; O'Donovan, B D; Wilson, M R; Chow, E D; Retallack, H; DeRisi, J L

    2016-03-04

    Next-generation sequencing has generated a need for a broadly applicable method to remove unwanted high-abundance species prior to sequencing. We introduce DASH (Depletion of Abundant Sequences by Hybridization). Sequencing libraries are 'DASHed' with recombinant Cas9 protein complexed with a library of guide RNAs targeting unwanted species for cleavage, thus preventing them from consuming sequencing space. We demonstrate a more than 99% reduction of mitochondrial rRNA in HeLa cells, and enrichment of pathogen sequences in patient samples. We also demonstrate an application of DASH in cancer. This simple method can be adapted for any sample type and increases sequencing yield without additional cost.

  2. Very high resolution single pass HLA genotyping using amplicon sequencing on the 454 next generation DNA sequencers: Comparison with Sanger sequencing.

    Science.gov (United States)

    Yamamoto, F; Höglund, B; Fernandez-Vina, M; Tyan, D; Rastrou, M; Williams, T; Moonsamy, P; Goodridge, D; Anderson, M; Erlich, H A; Holcomb, C L

    2015-12-01

    Compared to Sanger sequencing, next-generation sequencing offers advantages for high resolution HLA genotyping including increased throughput, lower cost, and reduced genotype ambiguity. Here we describe an enhancement of the Roche 454 GS GType HLA genotyping assay to provide very high resolution (VHR) typing, by the addition of 8 primer pairs to the original 14, to genotype 11 HLA loci. These additional amplicons help resolve common and well-documented alleles and exclude commonly found null alleles in genotype ambiguity strings. Simplification of workflow to reduce the initial preparation effort using early pooling of amplicons or the Fluidigm Access Array™ is also described. Performance of the VHR assay was evaluated on 28 well characterized cell lines using Conexio Assign MPS software which uses genomic, rather than cDNA, reference sequence. Concordance was 98.4%; 1.6% had no genotype assignment. Of concordant calls, 53% were unambiguous. To further assess the assay, 59 clinical samples were genotyped and results compared to unambiguous allele assignments obtained by prior sequence-based typing supplemented with SSO and/or SSP. Concordance was 98.7% with 58.2% as unambiguous calls; 1.3% could not be assigned. Our results show that the amplicon-based VHR assay is robust and can replace current Sanger methodology. Together with software enhancements, it has the potential to provide even higher resolution HLA typing. Copyright © 2015. Published by Elsevier Inc.

  3. Explicit representation of confidence informs future value-based decisions

    DEFF Research Database (Denmark)

    Folke, Tomas; Jacobsen, Catrine; Fleming, Stephen M.

    2016-01-01

    Humans can reflect on decisions and report variable levels of confidence. But why maintain an explicit representation of confidence for choices that have already been made and therefore cannot be undone? Here we show that an explicit representation of confidence is harnessed for subsequent changes...... of mind. Specifically, when confidence is low, participants are more likely to change their minds when the same choice is presented again, an effect that is most pronounced in participants with greater fidelity in their confidence reports. Furthermore, we show that choices reported with high confidence...... of confidence has a positive impact on the quality of future value-based decisions....

  4. US outlook and German confidence : does the confidence channel work?

    OpenAIRE

    Horn, Gustav Adolf

    2003-01-01

    One channel of business cycle shock transmission which has attracted attention only recently is the confidence channel. The aim of the paper is to find out whether the confidence channel is actually working between the US and Germany. This is analysed using time series methods. In contrast to other studies, the direct informational content of leading US indicators for German producer confidence and the significance of asymmetric reactions are tested. The results show that there is a relationship bet...

  5. Confidence intervals for the MMPI-2.

    Science.gov (United States)

    Munley, P H

    1991-08-01

    The confidence intervals for the Minnesota Multiphasic Personality Inventory (MMPI-2) clinical scales were investigated. Based on the clinical scale reliabilities published in the MMPI-2 manual, estimated true scores, standard errors of measurement for estimated true scores, and 95% confidence intervals centered around estimated true scores were calculated at 5-point MMPI-2 T-score intervals. The relationships between obtained T-scores, estimated true T-scores, scale reliabilities, and confidence intervals are discussed. The possible role of error measurement in defining scale high point and code types is noted.
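
    The kind of calculation described above can be sketched with classical test theory. This is an illustration under stated assumptions, not the paper's procedure: it uses the regression estimate of the true score and the standard error of estimation, SD * sqrt(r * (1 - r)), which is one common textbook form; the abstract does not specify the exact formulas, and the reliability value in the usage note is made up. T-scores are taken to have mean 50 and standard deviation 10.

```python
import math
from statistics import NormalDist

T_MEAN = 50.0   # T-score scale mean
T_SD = 10.0     # T-score scale standard deviation


def estimated_true_score(observed_t, reliability):
    """Regress the observed T-score toward the scale mean."""
    return T_MEAN + reliability * (observed_t - T_MEAN)


def true_score_interval(observed_t, reliability, level=0.95):
    """Interval centred on the estimated true score (assumed formula)."""
    centre = estimated_true_score(observed_t, reliability)
    std_err = T_SD * math.sqrt(reliability * (1.0 - reliability))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * std_err, centre + z * std_err
```

    For a hypothetical scale with reliability 0.80 and an observed T-score of 70, the estimated true score is 66 and the 95% interval runs from about 58 to about 74, which shows how an elevated observed score can correspond to a range that reaches below common clinical cutoffs.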

  6. First Year K-12 Teachers as High Leverage Point to Implement GEMS Space Science Curriculum Sequence

    Science.gov (United States)

    Slater, Timothy F.; Mendez, B. J.; Schultz, G.; Wierman, T.

    2013-01-01

    The recurring challenge for curriculum developers is how to efficiently prepare K-12 classroom teachers to use new curricula. First-year teachers, numbering nearly 250,000 in the US each year, have the greatest potential to impact the largest number of students because they have the potential to be in the classroom for thirty years. At the same time, these novice teachers are often the most open minded about adopting curricular innovation because they are not yet deeply entrenched in existing practices. To take advantage of this high leverage point, a collaborative of space scientists and science educators at the University of California, Berkeley's Lawrence Hall of Science and Center for Science Education at the Space Sciences Laboratory with experts from the Astronomical Society of the Pacific, the University of Wyoming, and the CAPER Center for Astronomy & Physics Education experimented with a unique professional development model focused on helping master teachers work closely with pre-service teachers during their student teaching internship field experience. The Advancing Mentor and Novice Teachers in Space Science (AMANTISS) team first identified master teachers who supervise novice student teachers and trained these master teachers to use the GEMS Space Science Curriculum Sequence. Then, these master teachers were mentored in coaching the interning student teachers assigned to them in using GEMS materials. Evaluation showed that novice teachers mentored by the master teachers felt knowledgeable after teaching the GEMS units. However, they seemed relatively less confident about the solar system and objects beyond the solar system. Overall, mentees felt strongly at the end of the year that they had acquired good strategies for teaching the various topics, suggesting that the support they received while teaching and working with a mentor was of real benefit to them. Funding provided in part by NASA ROSES AMANTISS NNX09AD51G

  7. Weighting Mean and Variability during Confidence Judgments

    Science.gov (United States)

    de Gardelle, Vincent; Mamassian, Pascal

    2015-01-01

    Humans can not only perform some visual tasks with great precision, they can also judge how good they are in these tasks. However, it remains unclear how observers produce such metacognitive evaluations, and how these evaluations might be dissociated from the performance in the visual task. Here, we hypothesized that some stimulus variables could affect confidence judgments above and beyond their impact on performance. In a motion categorization task on moving dots, we manipulated the mean and the variance of the motion directions, to obtain a low-mean low-variance condition and a high-mean high-variance condition with matched performances. Critically, in terms of confidence, observers were not indifferent between these two conditions. Observers exhibited marked preferences, which were heterogeneous across individuals, but stable within each observer when assessed one week later. Thus, confidence and performance are dissociable and observers’ confidence judgments put different weights on the stimulus variables that limit performance. PMID:25793275

  8. Recent Progress Using High-throughput Sequencing Technologies in Plant Molecular Breeding

    Institute of Scientific and Technical Information of China (English)

    Qiang Gao; Guidong Yue; Wenqi Li; Junyi Wang; Jiaohui Xu; Ye Yin

    2012-01-01

    High-throughput sequencing is a revolutionary technological innovation in DNA sequencing. This technology has an ultra-low cost per base of sequencing and an overwhelmingly high data output. High-throughput sequencing has brought novel research methods and solutions to the research fields of genomics and post-genomics. Furthermore, this technology is leading to a new molecular breeding revolution that has landmark significance for scientific research and enables us to launch multi-level, multifaceted, and multi-extent studies in the fields of crop genetics, genomics, and crop breeding. In this paper, we review progress in the application of high-throughput sequencing technologies to plant molecular breeding studies.

  9. Characterization of a highly repeated DNA sequence family in five species of the genus Eulemur.

    Science.gov (United States)

    Ventura, M; Boniotto, M; Cardone, M F; Fulizio, L; Archidiacono, N; Rocchi, M; Crovella, S

    2001-09-19

    The karyotypes of Eulemur species exhibit a high degree of variation, as a consequence of the Robertsonian fusion and/or centromere fission. Centromeric and pericentromeric heterochromatin of eulemurs is constituted by highly repeated DNA sequences (including some telomeric TTAGGG repeats) which have so far been investigated and used for the study of the systematic relationships of the different species of the genus Eulemur. In our study, we have cloned a set of repetitive pericentromeric sequences of five Eulemur species: E. fulvus fulvus (EFU), E. mongoz (EMO), E. macaco (EMA), E. rubriventer (ERU), and E. coronatus (ECO). We have characterized these clones by sequence comparison and by comparative fluorescence in situ hybridization analysis in EMA and EFU. Our results showed a high degree of sequence similarity among Eulemur species, indicating a strong conservation, within the five species, of these pericentromeric highly repeated DNA sequences.

  10. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

    Science.gov (United States)

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-10-11

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

  11. A Multilocus Sequence Typing System (MLST) reveals a high level of diversity and a genetic component to Entamoeba histolytica virulence

    Science.gov (United States)

    2012-01-01

    Background The outcome of an Entamoeba histolytica infection is variable and can result in asymptomatic carriage or in immediate or latent disease (diarrhea/dysentery/amebic liver abscess). An E. histolytica multilocus genotyping system based on tRNA gene-linked arrays has shown that genetic differences exist among parasites isolated from patients with different symptoms; however, the tRNA gene-linked arrays cannot be located in the current assembly of the E. histolytica reference genome (strain HM-1:IMSS) and are highly variable. Results To probe the population structure of E. histolytica and identify genetic markers associated with clinical outcome, we identified selected single nucleotide polymorphisms (SNPs) in E. histolytica positive samples by multiplexed massive parallel sequencing. Profile SNPs were selected which, compared to the reference strain HM-1:IMSS sequence, changed an encoded amino acid at the SNP position, and were present in independent E. histolytica isolates from different geographical origins. The samples used in this study contained DNA isolated from either xenic strains of E. histolytica trophozoites established in culture or E. histolytica positive clinical specimens (stool and amebic liver abscess aspirates). A record of the SNPs present at 16 loci out of the original 21 candidate targets was obtained for 63 of the initial 84 samples (63% of asymptomatically colonized stool samples, 80% of diarrheal stool, 73% of xenic cultures and 84% of amebic liver aspirates). The sequences in all 63 samples passed sequence quality control metrics and also had the required greater than 8X sequence coverage for all 16 SNPs in order to confidently identify variants. Conclusions Our work is in agreement with previous findings of extensive diversity among E. histolytica isolates from the same geographic origin. In phylogenetic trees, only four of the 63 samples were able to group in two sets of two with greater than 50% confidence. Two SNPs in the

  12. Better Confidence Intervals for Importance Sampling

    OpenAIRE

    HALIS SAK; WOLFGANG HÖRMANN; JOSEF LEYDOLD

    2010-01-01

    It is well known that for highly skewed distributions the standard method of using the t statistic for the confidence interval of the mean does not give robust results. This is an important problem for importance sampling (IS) as its final distribution is often skewed due to a heavy tailed weight distribution. In this paper, we first explain Hall's transformation and its variants to correct the confidence interval of the mean and then evaluate the performance of these methods for two numerica...
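
    As a point of reference for the record above, the baseline it calls non-robust can be sketched as follows. This is a minimal illustration and not the authors' code: the target quantity (the tail probability P(X > 3) for a standard normal), the shifted-normal proposal, and the name is_estimate_with_ci are all assumptions chosen for the example. For large n the t quantile is close to the normal quantile, which is used here to stay within the standard library. Hall's transformation is not shown.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def is_estimate_with_ci(n, shift=3.0, threshold=3.0, level=0.95, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by sampling from N(shift, 1).

    Returns the importance sampling estimate and a standard symmetric
    confidence interval (large-sample stand-in for the t interval).
    """
    rng = random.Random(seed)
    terms = []
    for _ in range(n):
        y = rng.gauss(shift, 1.0)
        # Likelihood ratio phi(y) / phi(y - shift) of target to proposal density.
        weight = math.exp(-shift * y + 0.5 * shift * shift)
        terms.append(weight if y > threshold else 0.0)
    estimate = mean(terms)
    std_err = stdev(terms) / math.sqrt(n)
    # Normal quantile used as the large-n approximation of the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


if __name__ == "__main__":
    est, (lo, hi) = is_estimate_with_ci(20000)
    print(f"estimate={est:.6f}  interval=({lo:.6f}, {hi:.6f})")
```

    The weighted terms are often heavily right-skewed, which is the situation in which the abstract says this symmetric interval stops being reliable.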

  13. Sources of PCR-induced distortions in high-throughput sequencing data sets

    Science.gov (United States)

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
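
    The role of stochasticity described above can be illustrated with a toy branching-process simulation. This is a sketch under stated assumptions, not the quantitative model from the paper: every molecule is assumed to be copied independently with a fixed per-cycle probability (the hypothetical parameter efficiency), and each template starts as a single molecule.

```python
import random


def amplify(copies, cycles, efficiency, rng):
    """Each molecule is duplicated with probability `efficiency` in every cycle."""
    for _ in range(cycles):
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


def simulate_pool(n_templates=200, cycles=10, efficiency=0.8, seed=1):
    """Amplify n_templates unique single molecules and return their final counts."""
    rng = random.Random(seed)
    return [amplify(1, cycles, efficiency, rng) for _ in range(n_templates)]


if __name__ == "__main__":
    counts = simulate_pool()
    # Identical starting molecules end up with different copy numbers.
    print("min:", min(counts), "max:", max(counts))
```

    Because a failure to copy in an early cycle is never recovered, most of the spread in final counts is set in the first few cycles, which is why the effect matters most for low-input libraries.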

  14. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections

    Directory of Open Access Journals (Sweden)

    Khalid K Alam

    2015-01-01

    High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html.
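
    The count, normalize, rank and fold-enrichment steps named in the abstract can be sketched in a few lines. This is an independent illustration and not FASTAptamer's own code or interface: the function names are hypothetical, normalization to reads per million is an assumption, and the input is assumed to be well-formed four-line FASTQ records.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_normalize_rank(lines):
    """Return (rank, sequence, reads, reads_per_million) tuples, most abundant first.

    Ties keep first-seen order and still receive distinct ranks (a simplification).
    """
    counts = Counter(read_fastq_sequences(lines))
    total = sum(counts.values())
    ranked = []
    for rank, (seq, n) in enumerate(counts.most_common(), start=1):
        ranked.append((rank, seq, n, n * 1_000_000 / total))
    return ranked


def fold_enrichment(rpm_before, rpm_after):
    """Ratio of normalized abundance after vs. before, for sequences seen in both rounds."""
    return {
        seq: rpm_after[seq] / rpm_before[seq]
        for seq in rpm_after
        if seq in rpm_before
    }


if __name__ == "__main__":
    fastq = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "IIII",
        "@r3", "TTTT", "+", "IIII",
    ]
    for row in count_normalize_rank(fastq):
        print(row)
```

    Normalizing before comparing rounds matters because sequencing depth usually differs between populations, so raw read counts are not directly comparable.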

  15. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    Directory of Open Access Journals (Sweden)

    Banfield Jillian F

    2010-03-01

    Abstract Background High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.

  16. Effective DNA fragmentation technique for simple sequence repeat detection with a microsatellite-enriched library and high-throughput sequencing.

    Science.gov (United States)

    Tanaka, Keisuke; Ohtake, Rumi; Yoshida, Saki; Shinohara, Takashi

    2017-04-01

    Two different techniques for genomic DNA fragmentation before microsatellite-enriched library construction, restriction enzyme (NlaIII and MseI) digestion and sonication, were compared to examine their effects on simple sequence repeat (SSR) detection using high-throughput sequencing. Tens of thousands of SSR regions from 5 species of the plant family Myrtaceae were detected when the output of individual samples was >1 million paired-end reads. Comparison of the two DNA fragmentation techniques showed that restriction enzyme digestion was superior to sonication for identification of heterozygous genotypes, whereas sonication was superior for detection of various SSR flanking regions with both species-specific and common characteristics. Therefore, choosing the most suitable DNA fragmentation method depends on the type of analysis that is planned.

  17. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens

    2015-01-01

    small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low...... biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material....

  18. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules.

    Science.gov (United States)

    Mayjonade, Baptiste; Gouzy, Jérôme; Donnadieu, Cécile; Pouilly, Nicolas; Marande, William; Callot, Caroline; Langlade, Nicolas; Muños, Stéphane

    2016-10-01

    De novo sequencing of complex genomes is one of the main challenges for researchers seeking high-quality reference sequences. Many de novo assemblies are based on short reads, producing fragmented genome sequences. Third-generation sequencing, with read lengths >10 kb, will improve the assembly of complex genomes, but these techniques require high-molecular-weight genomic DNA (gDNA), and gDNA extraction protocols used for obtaining smaller fragments for short-read sequencing are not suitable for this purpose. Methods of preparing gDNA for bacterial artificial chromosome (BAC) libraries could be adapted, but these approaches are time-consuming, and commercial kits for these methods are expensive. Here, we present a protocol for rapid, inexpensive extraction of high-molecular-weight gDNA from bacteria, plants, and animals. Our technique was validated using sunflower leaf samples, producing a mean read length of 12.6 kb and a maximum read length of 80 kb.

  19. Molecular characterization and physical localization of highly repetitive DNA sequences from Brazilian Alstroemeria species

    NARCIS (Netherlands)

    Kuipers, A.G.J.; Kamstra, S.A.; Jeu, de M.J.; Jacobsen, E.

    2002-01-01

    Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragm

  20. Using Next-Generation Sequencing to Explore Genetics and Race in the High School Classroom

    Science.gov (United States)

    Yang, Xinmiao; Hartman, Mark R.; Harrington, Kristin T.; Etson, Candice M.; Fierman, Matthew B.; Slonim, Donna K.; Walt, David R.

    2017-01-01

    With the development of new sequencing and bioinformatics technologies, concepts relating to personal genomics play an increasingly important role in our society. To promote interest and understanding of sequencing and bioinformatics in the high school classroom, we developed and implemented a laboratory-based teaching module called "The…

  1. Multiple Teaching Approaches, Teaching Sequence and Concept Retention in High School Physics Education

    Science.gov (United States)

    Fogarty, Ian; Geelan, David

    2013-01-01

    Students in 4 Canadian high school physics classes completed instructional sequences in two key physics topics related to motion--Straight Line Motion and Newton's First Law. Different sequences of laboratory investigation, teacher explanation (lecture) and the use of computer-based scientific visualizations (animations and simulations) were…

  3. Assessing Undergraduate Students' Conceptual Understanding and Confidence of Electromagnetics

    Science.gov (United States)

    Leppavirta, Johanna

    2012-01-01

    The study examines how students' conceptual understanding changes from high confidence with incorrect conceptions to high confidence with correct conceptions when reasoning about electromagnetics. The Conceptual Survey of Electricity and Magnetism test is weighted with students' self-rated confidence on each item in order to infer how strongly…

  4. Further evaluation of the high-probability instructional sequence with and without programmed reinforcement.

    Science.gov (United States)

    Wilder, David A; Majdalany, Lina; Sturkie, Latasha; Smeltz, Lindsay

    2015-09-01

    In 2 experiments, we examined the effects of programmed reinforcement for compliance with high-probability (high-p) instructions on compliance with low-probability (low-p) instructions. In Experiment 1, we compared the high-p sequence with and without programmed reinforcement (i.e., edible items) for compliance with high-p instructions. Results showed that the high-p sequence increased compliance with low-p instructions only when compliance with high-p instructions was followed by reinforcement. In Experiment 2, we examined the role of reinforcer quality by delivering a lower quality reinforcer (praise) for compliance with high-p instructions. Results of Experiment 2 showed that the high-p sequence with lower quality reinforcement did not improve compliance with low-p instructions; the addition of a higher quality reinforcer (i.e., edible items) contingent on compliance with high-p instructions did increase compliance with low-p instructions.

  5. A fast Boyer-Moore type pattern matching algorithm for highly similar sequences.

    Science.gov (United States)

    Ben Nsira, Nadia; Lecroq, Thierry; Elloumi, Mourad

    2015-01-01

    In the last decade, biology and medicine have undergone a fundamental change: next generation sequencing (NGS) technologies have made it possible to obtain genomic sequences very quickly and at low cost compared with the traditional Sanger method. These NGS technologies have thus made it possible to collect genomic sequences (genes, exomes or even full genomes) of individuals of the same species. These sequences are more than 99% identical. There is thus a strong need for efficient algorithms for indexing and performing fast pattern matching in such specific sets of sequences. In this paper we propose a very efficient algorithm that solves the exact pattern matching problem in a set of highly similar DNA sequences where only the pattern can be pre-processed. This new algorithm extends variants of the Boyer-Moore exact string matching algorithm. Experimental results show that it exhibits the best performance in practice.
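
    For readers unfamiliar with the family of algorithms this record builds on, the sketch below shows the Horspool simplification of Boyer-Moore for a single text. It is not the algorithm proposed in the paper, which targets sets of highly similar sequences; it only illustrates the shared idea that the pattern alone is pre-processed into a shift table. The function name horspool_search is an assumption.

```python
def horspool_search(text, pattern):
    """Return all start positions of `pattern` in `text` (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Pre-process the pattern only: distance from the rightmost occurrence of
    # each character (excluding the last position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Slide the window according to the text character under its last cell;
        # characters absent from the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    print(horspool_search("GATTACAGATTACA", "ATTA"))
```

    On a four-letter DNA alphabet the shift table has few distinct entries, so jumps are short; handling that well is one motivation for the more specialized variants the abstract refers to.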

  6. Subfamily logos: visualization of sequence deviations at alignment positions with high information content

    Directory of Open Access Journals (Sweden)

    Beitz Eric

    2006-06-01

    Abstract Background Recognition of relevant sequence deviations can be valuable for elucidating functional differences between protein subfamilies. Interesting residues at highly conserved positions can then be mutated and experimentally analyzed. However, identification of such sites is tedious because automated approaches are scarce. Results Subfamily logos visualize subfamily-specific sequence deviations. The display is similar to classical sequence logos but extends into the negative range. Positive, upright characters correspond to residues which are characteristic for the subfamily, negative, upside-down characters to residues typical for the remaining sequences. The symbol height is adjusted to the information content of the alignment position. Residues which are conserved throughout do not appear. Conclusion Subfamily logos provide an intuitive display of relevant sequence deviations. The method has proven to be valid using a set of 135 aligned aquaporin sequences in which established subfamily-specific positions were readily identified by the algorithm.
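
    The "information content of the alignment position" mentioned above can be made concrete with the measure used in classical sequence logos. The sketch below computes that classical quantity only; the signed, subfamily-specific measure of the paper is not reproduced because the abstract does not define it. A gap-free protein alignment, a 20-letter alphabet and the absence of a small-sample correction are assumptions, as are the function names.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(alignment, alphabet_size=20):
    """Information content for every column of equal-length aligned sequences."""
    return [column_information(col, alphabet_size) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["MKV", "MRV", "MKL", "MRL"]
    for position, bits in enumerate(alignment_information(toy_alignment), start=1):
        print(position, round(bits, 3))
```

    A fully conserved column reaches the maximum of log2(20) bits, and each halving of certainty between two equally frequent residues costs one bit.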

  7. QTrim : a novel tool for the quality trimming of sequence reads generated using the Roche/454 sequencing platform

    OpenAIRE

    Shrestha, Ram; Lubinsky, Baruch; Bansode, Vijay B; Moinz, Mónica B. J.; McCormack, Grace P.; Travers, Simon A

    2014-01-01

    Background Many high throughput sequencing (HTS) approaches, such as the Roche/454 platform, produce sequences in which the quality of the sequence (as measured by Phred-like quality scores) decreases linearly across a sequence read. Undertaking quality trimming of this data is essential to enable confidence in the results of subsequent downstream analysis. Here, we have developed a novel, highly sensitive and accurate approach (QTrim) for the quality trimming of sequence reads generated...

  8. Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome.

    Science.gov (United States)

    Hastie, Alex R; Dong, Lingli; Smith, Alexis; Finklestein, Jeff; Lam, Ernest T; Huo, Naxin; Cao, Han; Kwok, Pui-Yan; Deal, Karin R; Dvorak, Jan; Luo, Ming-Cheng; Gu, Yong; Xiao, Ming

    2013-01-01

    Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which combined with huge genome sizes, makes accurate assembly of these large and complex genomes intractable thus far. Using two-color genome mapping of tiling bacterial artificial chromosomes (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.

  9. Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome.

    Directory of Open Access Journals (Sweden)

    Alex R Hastie

    Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which combined with huge genome sizes, makes accurate assembly of these large and complex genomes intractable thus far. Using two-color genome mapping of tiling bacterial artificial chromosomes (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.

  10. Highly conserved non-coding sequences are associated with vertebrate development.

    Directory of Open Access Journals (Sweden)

    Adam Woolfe

    2005-01-01

    In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development.

  11. Highly parallel translation of DNA sequences into small molecules.

    Directory of Open Access Journals (Sweden)

    Rebecca M Weisinger

    A large body of in vitro evolution work establishes the utility of biopolymer libraries comprising 10^10 to 10^15 distinct molecules for the discovery of nanomolar-affinity ligands to proteins. Small-molecule libraries of comparable complexity will likely provide nanomolar-affinity small-molecule ligands. Unlike biopolymers, small molecules can offer the advantages of cell permeability, low immunogenicity, metabolic stability, rapid diffusion and inexpensive mass production. It is thought that such desirable in vivo behavior is correlated with the physical properties of small molecules, specifically a limited number of hydrogen bond donors and acceptors, a defined range of hydrophobicity, and most importantly, molecular weights less than 500 Daltons. Creating a collection of 10^10 to 10^15 small molecules that meet these criteria requires the use of hundreds to thousands of diversity elements per step in a combinatorial synthesis of three to five steps. With this goal in mind, we have reported a set of mesofluidic devices that enable DNA-programmed combinatorial chemistry in a highly parallel 384-well plate format. Here, we demonstrate that these devices can translate DNA genes encoding 384 diversity elements per coding position into corresponding small-molecule gene products. This robust and efficient procedure yields small molecule-DNA conjugates suitable for in vitro evolution experiments.

  12. Bayesian estimation of keyword confidence in Chinese continuous speech recognition

    Institute of Scientific and Technical Information of China (English)

    HAO Jie; LI Xing

    2003-01-01

    In a syllable-based speaker-independent Chinese continuous speech recognition system based on the classical Hidden Markov Model (HMM), a Bayesian approach to keyword confidence estimation is studied, which utilizes both acoustic layer scores and the syllable-based statistical language model (LM) score. The maximum a posteriori (MAP) confidence measure is proposed, and the forward-backward algorithm for calculating the MAP confidence scores is derived. The performance of the MAP confidence measure is evaluated in a keyword spotting application, and the experimental results show that the MAP confidence scores provide high discriminability for keyword candidates. Furthermore, the MAP confidence measure can be applied to various speech recognition applications.

  13. [Recent progress in gene mapping through high-throughput sequencing technology and forward genetic approaches].

    Science.gov (United States)

    Lu, Cairui; Zou, Changsong; Song, Guoli

    2015-08-01

    Traditional gene mapping using forward genetic approaches is conducted primarily through construction of a genetic linkage map, a process that is tedious and time-consuming and often results in low mapping accuracy and large mapping intervals. With the rapid development of high-throughput sequencing technology and the decreasing cost of sequencing, a variety of simple and quick methods of gene mapping through sequencing have been developed, including direct sequencing of the mutant genome, sequencing of selectively pooled mutant DNA, genetic map construction through sequencing of individuals in a population, as well as sequencing of the transcriptome and partial genome. These methods can be used to identify mutations at the nucleotide level and have been applied in complex genetic backgrounds. Recent reports have shown that sequencing-based mapping can even be done without a reference genome sequence, hybridization, or genetic linkage information, which makes it possible to perform forward genetic studies in many non-model species. In this review, we summarize these new technologies and their application in gene mapping.

  14. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences.

    Science.gov (United States)

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L

    2017-06-19

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5'-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5'-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. © 2017 by John Wiley & Sons, Inc.

  15. Confidence scores for prediction models

    DEFF Research Database (Denmark)

    Gerds, Thomas Alexander; van de Wiel, MA

    2011-01-01

    modelling strategy is applied to different training sets. For each modelling strategy we estimate a confidence score based on the same repeated bootstraps. A new decomposition of the expected Brier score is obtained, as well as the estimates of population average confidence scores. The latter can be used...... to distinguish rival prediction models with similar prediction performances. Furthermore, on the subject level a confidence score may provide useful supplementary information for new patients who want to base a medical decision on predicted risk. The ideas are illustrated and discussed using data from cancer...
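
    The abstract above refers to the expected Brier score without defining it. As a minimal sketch, assuming the standard definition (mean squared difference between predicted risk and observed 0/1 outcome), it can be computed as follows. The function name and the example numbers are illustrative only, and the bootstrap-based confidence score proposed by the authors is not reproduced here.

```python
def brier_score(predicted_risks, outcomes):
    """Mean squared difference between predicted risks and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecast, and a constant
    prediction of 0.5 always scores 0.25.
    """
    if len(predicted_risks) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_risks, outcomes)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical predicted risks for four patients and their observed outcomes.
    risks = [0.9, 0.2, 0.7, 0.1]
    events = [1, 0, 1, 0]
    print(f"Brier score: {brier_score(risks, events):.4f}")
```

    Two modelling strategies with similar Brier scores could still differ in how stable their predictions are across training sets, which is the situation the abstract says confidence scores are meant to address.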

  16. APPLICATION OF ECONOMIC CONFIDENCE ESTIMATION

    Directory of Open Access Journals (Sweden)

    Valeriy S. Ayzatullen

    2013-01-01

    The article studies the socio-economic category of “trust”. Existing views of the term “trust” are analysed, and a model of the interaction “Power - Business - People” based on the concept of “trust” is proposed. The application and structure of confidence estimations in economics and politics are studied. The accumulated experience of applying confidence estimations in the macroeconomics of the world's major countries is presented, and the current weaknesses of the confidence indexes are discussed.

  17. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs.

    Science.gov (United States)

    Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao

    2014-09-01

    Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, all known angiosperm chloroplast genomes available to date were analysed, and nine universal primer pairs corresponding to the highly conserved regions were designed. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, eight species representing major groups of angiosperms, that is, early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and their complete chloroplast genomes assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provides an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs make it possible to accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms. © 2014 John Wiley & Sons Ltd.

  18. Current impact and future directions of high throughput sequencing in plant virus diagnostics.

    Science.gov (United States)

    Massart, Sebastien; Olmos, Antonio; Jijakli, Haissam; Candresse, Thierry

    2014-08-08

    The ability to provide a fast, inexpensive and reliable diagnostic for any given viral infection is a key parameter in efforts to fight and control these ubiquitous pathogens. The recent development of high-throughput sequencing (also called Next Generation Sequencing - NGS) technologies and bioinformatics has drastically changed research on viral pathogens, and these technologies are now attracting growing interest for virus diagnostics. This review provides a snapshot of the current use and impact of high-throughput sequencing approaches in plant virus characterization. More specifically, it highlights the potential of these new technologies and their interplay with current protocols in the future of molecular diagnostics of plant viruses. The current limitations that will need to be addressed for a wider adoption of high-throughput sequencing in plant virus diagnostics are thoroughly discussed.

  19. A priori Considerations When Conducting High-Throughput Amplicon-Based Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Aditi Sengupta

    2016-03-01

    Amplicon-based sequencing strategies that include 16S rRNA and functional genes, alongside “meta-omics” analyses of communities of microorganisms, have allowed researchers to pose questions and find answers to “who” is present in the environment and “what” they are doing. Next-generation sequencing approaches that aid microbial ecology studies of agricultural systems are fast gaining popularity among agronomy, crop, soil, and environmental science researchers. Given the rapid development of these high-throughput sequencing techniques, researchers with no prior experience will want information about the best practices to follow before starting high-throughput amplicon-based sequence analyses. We have outlined items that need to be carefully considered in experimental design, sampling, basic bioinformatics, sequencing of mock communities and negative controls, acquisition of metadata, and standardization of reaction conditions as per experimental requirements. Not all considerations mentioned here may pertain to a particular study. The overall goal is to inform researchers about considerations that must be taken into account when conducting high-throughput microbial DNA sequencing and sequence analysis.

  20. High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

    Science.gov (United States)

    Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

    2016-09-01

    Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits.

  1. Contrasting Academic Behavioural Confidence in Mexican and European Psychology Students

    Science.gov (United States)

    Ochoa, Alma Rosa Aguila; Sander, Paul

    2012-01-01

    Introduction: Research with the Academic Behavioural Confidence scale using European students has shown that students have high levels of confidence in their academic abilities. It is generally accepted that people in more collectivist cultures have more realistic confidence levels in contrast to the overconfidence seen in individualistic European…

  2. Confidence and the business cycle

    OpenAIRE

    Sylvain Leduc

    2010-01-01

    The idea that business cycle fluctuations may stem partly from changes in consumer and business confidence is controversial. One way to test the idea is to use professional economic forecasts to measure confidence at specific points in time and correlate the results with future economic activity. Such an analysis suggests that changes in expectations regarding future economic performance are important drivers of economic fluctuations. Moreover, periods of heightened optimism are followed by a...

  3. High-speed automated DNA sequencing utilizing from-the-side laser excitation

    Science.gov (United States)

    Westphall, Michael S.; Brumley, Robert L., Jr.; Buxton, Erin C.; Smith, Lloyd M.

    1995-04-01

    The Human Genome Initiative is an ambitious international effort to map and sequence the three billion bases of DNA encoded in the human genome. If successfully completed, the resultant sequence database will be a tool of unparalleled power for biomedical research. One of the major challenges of this project is in the area of DNA sequencing technology. At this time, virtually all DNA sequencing is based upon the separation of DNA fragments in high resolution polyacrylamide gels. This method, as generally practiced, is one to two orders of magnitude too slow and expensive for the successful completion of the Human Genome Project. One reasonable approach to improved sequencing of DNA fragments is to increase the performance of such gel-based sequencing methods. Decreased sequencing times may be obtained by increasing the magnitude of the electric field employed. This is not possible with conventional sequencing, because the additional heat associated with the increased electric field cannot be adequately dissipated. Recent developments in the use of thin gels have addressed this problem. Performing electrophoresis in ultrathin (50 to 100 microns) gels greatly increases the heat transfer efficiency, thus allowing the benefits of larger electric fields to be obtained. An increase in separation speed of about an order of magnitude is readily achieved. Thin gels have successfully been used in capillary and slab formats. A detection system has been designed for use with a multiple fluorophore sequencing strategy in horizontal ultrathin slab gels. The system employs laser through-the-side excitation and a cooled CCD detector; this allows for the parallel detection of up to 24 sets of four fluorescently labeled DNA sequencing reactions during their electrophoretic separation in ultrathin (115 micrometers) denaturing polyacrylamide gels. Four hundred bases of sequence information are obtained from 100 ng of M13 template DNA in an hour, corresponding to an

  4. Regaining confidence in confidence intervals for the mean treatment effect.

    Science.gov (United States)

    O'Gorman, Thomas W

    2014-09-28

    In many experiments, it is necessary to evaluate the effectiveness of a treatment by comparing the responses of two groups of subjects. This evaluation is often performed by using a confidence interval for the difference between the population means. To compute the limits of this confidence interval, researchers usually use the pooled t formulas, which are derived by assuming normally distributed errors. When the normality assumption does not seem reasonable, the researcher may have little confidence in the confidence interval because the actual one-sided coverage probability may not be close to the nominal coverage probability. This problem can be avoided by using the Robbins-Monro iterative search method to calculate the limits. One problem with this iterative procedure is that it is not clear when the procedure produces a sufficiently accurate estimate of a limit. In this paper, we describe a multiple search method that allows the user to specify the accuracy of the limits. We also give guidance concerning the number of iterations that would typically be needed to achieve a specified accuracy. This multiple iterative search method will produce limits for one-sided and two-sided confidence intervals that maintain their coverage probabilities with non-normal distributions.

  5. Highly Iterated Palindromic Sequences (HIPs) and Their Relationship to DNA Methyltransferases

    Directory of Open Access Journals (Sweden)

    Jeff Elhai

    2015-03-01

    The sequence GCGATCGC (Highly Iterated Palindrome, HIP1) is commonly found at high frequency in cyanobacterial genomes. An important clue to its function may be the presence of two orphan DNA methyltransferases that recognize the internal sequences GATC and CGATCG. An examination of genomes from 97 cyanobacteria, both free-living and obligate symbionts, showed that there are exceptional cases in which HIP1 is at a low frequency or nearly absent. In some of these cases, it appears to have been replaced by a different GC-rich palindromic sequence (alternate HIPs). When HIP1 is at a high frequency, GATC- and CGATCG-specific methyltransferases are generally present in the genome. When an alternate HIP is at high frequency, a methyltransferase specific for that sequence is present. The pattern of 1-nt deviations from HIP1 sequences is biased towards the first and last nucleotides, i.e., those that distinguish CGATCG from HIP1. Taken together, the results point to a role of DNA methylation in the creation or functioning of HIP sites. A model is presented that postulates the existence of a GmeC-dependent mismatch repair system whose activity creates and maintains HIP sequences.
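
    The relationship between HIP1 and the two methyltransferase recognition sites can be checked directly on the sequences. The sketch below is illustrative only: it verifies that each motif equals its own reverse complement and counts overlapping occurrences in a DNA string. The helper names and the toy sequence are assumptions made for this example and are not taken from the study.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq):
    """A DNA palindrome reads the same on both strands (5' to 3')."""
    return seq == reverse_complement(seq)


def count_overlapping(genome, motif):
    """Count occurrences of motif in genome, allowing overlaps."""
    count = 0
    start = genome.find(motif)
    while start != -1:
        count += 1
        start = genome.find(motif, start + 1)
    return count


HIP1 = "GCGATCGC"

if __name__ == "__main__":
    # Toy sequence (hypothetical, not a real cyanobacterial genome).
    toy = "AAGCGATCGCTTGCGATCGC"
    for motif in (HIP1, "CGATCG", "GATC"):
        print(motif, is_palindromic(motif), count_overlapping(toy, motif))
```

    Because CGATCG is HIP1 without its first and last nucleotides, a 1-nt change at either end of HIP1 destroys the HIP1 site while leaving the CGATCG site intact.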

  6. Coverage recommendation for genotyping analysis of highly heterologous species using next-generation sequencing technology

    Science.gov (United States)

    Song, Kai; Li, Li; Zhang, Guofan

    2016-01-01

    Next-generation sequencing (NGS) technology is being applied to an increasing number of non-model species and has been used as the primary approach for accurate genotyping in genetic and evolutionary studies. However, inferring genotypes from sequencing data is challenging, particularly for organisms with a high degree of heterozygosity. This is because genotype calls from sequencing data are often inaccurate due to low sequencing coverage, and if this is not accounted for, genotype uncertainty can lead to serious bias in downstream analyses, such as quantitative trait locus mapping and genome-wide association studies. Here, we used high-coverage reference data sets from Crassostrea gigas to simulate sequencing data with different coverage, and we evaluated genotype calling rate and accuracy as a function of coverage. Having initially identified the appropriate parameter settings for filtering to ensure genotype accuracy, we used two different single-nucleotide polymorphism (SNP) calling pipelines, single-sample and multi-sample. We found that a coverage of 15× was suitable for obtaining sufficient numbers of SNPs with high accuracy. Our work provides guidelines for the selection of sequence coverage when using NGS to investigate species with a high degree of heterozygosity and rapid decay of linkage disequilibrium. PMID:27760996
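
    Why low coverage hurts genotyping at heterozygous sites can be illustrated with a simple binomial model. The sketch below is not the simulation used in the study: it assumes exactly `depth` error-free reads at a site, each drawn from either allele with probability 0.5, and asks how often both alleles are seen at least `min_reads` times. The function name and the thresholds are assumptions for illustration.

```python
from math import comb


def prob_both_alleles_seen(depth, min_reads=1):
    """Probability that each allele of a heterozygous site is covered by
    at least min_reads reads, under an error-free 50/50 binomial model."""
    favourable = sum(comb(depth, i) for i in range(min_reads, depth - min_reads + 1))
    return favourable / 2 ** depth


if __name__ == "__main__":
    for depth in (2, 4, 8, 15, 30):
        p1 = prob_both_alleles_seen(depth, min_reads=1)
        p3 = prob_both_alleles_seen(depth, min_reads=3)
        print(f"depth {depth:>2}: P(both seen once) = {p1:.4f}, P(both seen 3x) = {p3:.4f}")
```

    Real data add sequencing errors, mapping bias and depth that varies from site to site, so empirical recommendations such as the one in the abstract cannot be derived from this idealized model alone.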

  7. PhenoMeter: a metabolome database search tool using statistical similarity matching of metabolic phenotypes for high-confidence detection of functional links

    Directory of Open Access Journals (Sweden)

    Adam James Carroll

    2015-07-01

    This article describes PhenoMeter, a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PhenoMeter Score (PM Score), was a function of both Pearson correlation and Fisher’s Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher’s Exact Test used alone. To demonstrate general applicability, we show that the PhenoMeter reliably retrieved the most closely functionally-linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PhenoMeter is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php).

  8. High signals in the uterine cervix on T2-weighted MRI sequences

    Energy Technology Data Exchange (ETDEWEB)

    Graef, De M.; Karam, R.; Daclin, P.Y.; Rouanet, J.P. [Department of Radiology, C.M.C. Beausoleil, 119 avenue de Lodeve, 34000 Montpellier (France); Juhan, V. [Department of Radiology, C.H.U. Timone, 13000 Marseille (France); Maubon, A.J. [Department of Radiology, C.H.U. Dupuytren, 87000 Limoges (France)

    2003-01-01

    The aim of this pictorial review was to illustrate the normal cervix appearance on T2-weighted images, and give a review of common or less common disorders of the uterine cervix that appear as high signal intensity lesions on T2-weighted sequences. Numerous aetiologies dominated by cervical cancer are reviewed and discussed. This gamut is obviously incomplete; however, radiologists who perform MR women's imaging should perform T2-weighted sequences in the sagittal plane regardless of the indication for pelvic MR. Those sequences will diagnose some previously unknown cervical cancers as well as many other unknown cervical or uterine lesions. (orig.)

  9. Toward high sequence coverage of proteins in human breast cancer cells using on-line monolith-based HPLC-ESI-TOF MS compared to CE MS.

    Science.gov (United States)

    Yoo, Chul; Pal, Manoj; Miller, Fred R; Barder, Timothy J; Huber, Christian; Lubman, David M

    2006-06-01

    A method is developed toward high sequence coverage of proteins isolated from human breast cancer MCF10 cell lines using 2-D liquid separations. Monolithic-capillary columns prepared by copolymerizing styrene with divinylbenzene are used to achieve high-resolution separation of peptides from protein digests. This separation is performed with minimal sample preparation directly from the 2-D liquid fractionation of the cell lysate. The monolithic column separation is directly interfaced to ESI-TOF MS to obtain a peptide map. The protein digests were also analyzed by MALDI-TOF MS and an accurate M(r) of the intact protein was obtained using HPLC-ESI-TOF MS. These techniques provide complementary information, so that nearly complete sequence coverage of the protein is obtained and can be compared to the experimental M(r) value. The high sequence coverage provides information on isoforms and other post-translational modifications that would not be available from methods that result in low sequence coverage. The results from the use of monolithic columns are compared to those obtained by CE-MS. The monolithic column separations provide a rugged and highly reproducible method for separating protein digests prior to MS analysis and are suited to confidently identifying biomarkers associated with cancer progression.

  10. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Background: Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Results: Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions: Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
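
    Depth of coverage in the sense used above is conventionally estimated as (number of reads × read length) / genome size. The sketch below applies that relation to the 50x and 100 bp figures quoted in the abstract. The round genome size of 4,000,000 bp is an assumed illustrative value, not a number given in the record, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average depth of coverage for num_reads reads of read_length bases."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    GENOME_SIZE = 4_000_000  # assumed round figure for illustration only
    READ_LENGTH = 100        # read length quoted in the abstract

    n = reads_needed(50, READ_LENGTH, GENOME_SIZE)
    print(f"reads for 50x: {n} ({n // 2} read pairs)")
    print(f"check: {mean_coverage(n, READ_LENGTH, GENOME_SIZE):.1f}x")
```

    This is an average only; local depth varies along the genome, which is one reason a study might sequence far deeper than the minimum and then downsample to test lower coverages.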

  11. Effects of High Fidelity Simulation on Knowledge Acquisition, Self-Confidence, and Satisfaction with Baccalaureate Nursing Students Using the Solomon-Four Research Design

    Science.gov (United States)

    Hall, Rachel Mattson

    2013-01-01

    High Fidelity Simulation is a teaching strategy that is becoming well-entrenched in the world of nursing education and is rapidly expanding due to the challenges and demands of the health care environment. The problem addressed in this study is the conflicting research results regarding the effectiveness of HFS for students' knowledge acquisition…

  12. A Quantitative Assessment of Gender and Career Decision-Making Confidence Levels of High School Seniors in a School-to-Work Program Using the Career Decision Scale.

    Science.gov (United States)

    Fawcett, Mary; Maycock, George

    This study measured differences in the levels of career indecision for urban male and female high school seniors who had varying levels of experience in vocational programs or job related activities through school-to-work (STW) vocational programs. The 113 students, of whom 44% were male and 56% were female, completed the Career Decision Scale…

  13. The role of confidence in world-class sport performance.

    Science.gov (United States)

    Hays, Kate; Thomas, Owen; Maynard, Ian; Bawden, Mark

    2009-09-01

    In this study, we examined the role of confidence in relation to the cognitive, affective, and behavioural responses it elicits, and identified the factors responsible for debilitating confidence within the organizational subculture of world-class sport. Using Vealey's (2001) integrative model of sport confidence as a broad conceptual base, 14 athletes (7 males, 7 females) were interviewed in response to the research aims. Analysis indicated that high sport confidence facilitated performance through its positive effect on athletes' thoughts, feelings, and behaviours. However, the athletes participating in this study were susceptible to factors that served to debilitate their confidence. These factors appeared to be associated with the sources from which they derived their confidence and influenced to some extent by gender. Thus, the focus of interventions designed to enhance sport confidence must reflect the individual needs of the athlete, and might involve identifying an athlete's sources and types of confidence, and ensuring that these are intact during competition preparation phases.

  14. Intervention Study on the Confidence of Students with Learning Difficulties in English in Rural Junior High Schools

    Institute of Scientific and Technical Information of China (English)

    徐伟; 苏静

    2012-01-01

    Twenty-eight students with learning difficulties in English from a rural junior high school were randomly assigned to an experimental group or a control group. The experimental group received nine sessions of group guidance, and confidence was measured with a confidence scale before and after the guidance, in order to explore whether group guidance centred on confidence intervention has a positive effect on the academic achievement of rural junior high school students with learning difficulties in English. After the group guidance, the confidence level of the experimental group had improved significantly and the students had made some progress in English. Group guidance centred on confidence intervention therefore has a positive intervention effect on rural junior high school students with learning difficulties in English.

  15. Increasing the Confidence in Student's $t$ Interval

    OpenAIRE

    Goutis, Constantinos; Casella, George

    1992-01-01

    The usual confidence interval, based on Student's $t$ distribution, has conditional confidence that is larger than the nominal confidence level. Although this fact is known, along with the fact that increased conditional confidence can be used to improve a confidence assertion, the confidence assertion of Student's $t$ interval has never been critically examined. We do so here, and construct a confidence estimator that allows uniformly higher confidence in the interval and is closer (than $1 ...

  17. Professional confidence: a concept analysis.

    Science.gov (United States)

    Holland, Kathlyn; Middleton, Lyn; Uys, Leana

    2012-03-01

    Professional confidence is a concept that is frequently used and/or implied in occupational therapy literature, but often without specifying its meaning. Rodgers's Model of Concept Analysis was used to analyse the term "professional confidence". Published research obtained from a federated search in four health sciences databases was used to inform the concept analysis. The definitions, attributes, antecedents, and consequences of professional confidence as evidenced in the literature are discussed. Surrogate terms and related concepts are identified, and a model case of the concept is provided. Based on the analysis, professional confidence can be described as a dynamic, maturing personal belief held by a professional or student. This includes an understanding of and a belief in the role, scope of practice, and significance of the profession, and is based on their capacity to competently fulfil these expectations, fostered through a process of affirming experiences. Professional confidence should be nurtured and valued to the same extent as professional competence, as the former underpins the latter, and both are linked to professional identity.

  18. Identification and Characterization of miRNA Transcriptome in Asiatic Cotton (Gossypium arboreum) Using High Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Muhammad Farooq

    2017-06-01

    MicroRNAs (miRNAs) are small 20–24 nt molecules that have been well studied over the past decade due to their important regulatory roles in different cellular processes. The mature sequences are more conserved across vast phylogenetic scales than their precursors, and some are conserved within entire kingdoms; hence, their loci and function can be predicted by homology searches. Different studies have been performed to elucidate miRNAs using de novo prediction methods, but due to complex regulatory mechanisms or false positive in silico predictions, not all of them are expressed in reality, and sometimes computationally predicted mature transcripts differ from the ones actually expressed. With the availability of a complete genome sequence of Gossypium arboreum, a cotton species that is highly resistant to various biotic and abiotic stresses, it is important to annotate the genome for both coding and non-coding regions using high confidence transcript evidence. Here we have analyzed the small RNA transcriptome of G. arboreum leaves and provided genome annotation of miRNAs with evidence from miRNA/miRNA∗ transcripts. A total of 446 miRNAs clustered into 224 miRNA families were found, among which 48 families are conserved in other plants and 176 are novel. Four short RNA libraries were used to shortlist the best predictions based on high reads per million. The size, origin, copy numbers and transcript depth of all miRNAs, along with their isoforms and targets, have been reported. The highest gene copy number was observed for gar-miR7504, followed by gar-miR166, gar-miR8771, gar-miR156, and gar-miR7484. Altogether, 1274 target genes were found in G. arboreum that are enriched for 216 KEGG pathways. The resultant genomic annotations are provided in UCSC, BED format.

  19. High-channel-count plasmonic filter with the metal-insulator-metal Fibonacci-sequence gratings.

    Science.gov (United States)

    Gong, Yongkang; Liu, Xueming; Wang, Leiran

    2010-02-01

    Fibonacci-sequence gratings based on metal-insulator-metal waveguides are proposed. The spectrum properties of this structure are numerically investigated by using the transfer matrix method. Numerical results demonstrate that the proposed structure can generate high-channel-count plasmonic stop bands and can find significant applications in highly integrated dense wavelength division multiplexing networks.

  20. Draft Genome Sequencing of the Highly Halotolerant and Allopolyploid Yeast Zygosaccharomyces rouxii NBRC 1876

    Science.gov (United States)

    Matsushima, Kenichiro; Oshima, Kenshiro; Hattori, Masahira; Koyama, Yasuji

    2017-01-01

    ABSTRACT The highly halotolerant and allopolyploid yeast Zygosaccharomyces rouxii is industrially used for the food production in high concentrations of salt, such as brewing soy sauce and miso paste. Here, we report the draft genome sequence of Z. rouxii NBRC 1876 isolated from miso paste. PMID:28209823

  1. Exome sequencing generates high quality data in non-target regions

    Directory of Open Access Journals (Sweden)

    Guo Yan

    2012-05-01

    Full Text Available Abstract Background Exome sequencing using next-generation sequencing technologies is a cost efficient approach to selectively sequencing coding regions of human genome for detection of disease variants. A significant amount of DNA fragments from the capture process fall outside target regions, and sequence data for positions outside target regions have been mostly ignored after alignment. Result We performed whole exome sequencing on 22 subjects using Agilent SureSelect capture reagent and 6 subjects using Illumina TrueSeq capture reagent. We also downloaded sequencing data for 6 subjects from the 1000 Genomes Project Pilot 3 study. Using these data, we examined the quality of SNPs detected outside target regions by computing consistency rate with genotypes obtained from SNP chips or the Hapmap database, transition-transversion (Ti/Tv) ratio, and percentage of SNPs inside dbSNP. For all three platforms, we obtained high-quality SNPs outside target regions, and some far from target regions. In our Agilent SureSelect data, we obtained 84,049 high-quality SNPs outside target regions compared to 65,231 SNPs inside target regions (a 129% increase). For our Illumina TrueSeq data, we obtained 222,171 high-quality SNPs outside target regions compared to 95,818 SNPs inside target regions (a 232% increase). For the data from the 1000 Genomes Project, we obtained 7,139 high-quality SNPs outside target regions compared to 1,548 SNPs inside target regions (a 461% increase). Conclusions These results demonstrate that a significant amount of high quality genotypes outside target regions can be obtained from exome sequencing data. These data should not be ignored in genetic epidemiology studies.
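
    The Ti/Tv ratio named in this abstract can be illustrated with a minimal sketch. The function names and the (ref, alt) tuple input are assumptions made for illustration; this is not the authors' pipeline. The second helper only reproduces the parenthetical percentages, which match the outside-to-inside SNP count ratio expressed as a percentage (for example, 84,049 / 65,231 is about 1.29).

```python
# Illustrative sketch only; names and input format are hypothetical.

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def ti_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ti / tv


def outside_to_inside_percent(outside, inside):
    """Outside-target SNP count as a rounded percentage of the inside-target count."""
    return round(100 * outside / inside)


if __name__ == "__main__":
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    for outside, inside in [(84049, 65231), (222171, 95818), (7139, 1548)]:
        print(outside, inside, outside_to_inside_percent(outside, inside))
```

    In practice the (ref, alt) pairs would come from a variant call file; parsing is left out here to keep the sketch self-contained.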

  2. Targeting Low Career Confidence Using the Career Planning Confidence Scale

    Science.gov (United States)

    McAuliffe, Garrett; Jurgens, Jill C.; Pickering, Worth; Calliotte, James; Macera, Anthony; Zerwas, Steven

    2006-01-01

    The authors describe the development and validation of a test of career planning confidence that makes possible the targeting of specific problem issues in employment counseling. The scale, developed using a rational process and the authors' experience with clients, was tested for criterion-related validity against 2 other measures. The scale…

  3. The fallacy of placing confidence in confidence intervals

    NARCIS (Netherlands)

    Morey, Richard D.; Hoekstra, Rink; Rouder, Jeffrey N.; Lee, Michael D.; Wagenmakers, Eric-Jan

    2016-01-01

    Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true

  4. The fallacy of placing confidence in confidence intervals

    NARCIS (Netherlands)

    Morey, R.D.; Hoekstra, R.; Rouder, J.N.; Lee, M.D.; Wagenmakers, E.-J.

    Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true

  5. Minimax confidence intervals in geomagnetism

    Science.gov (United States)

    Stark, Philip B.

    1992-01-01

    The present paper uses theory of Donoho (1989) to find lower bounds on the lengths of optimally short fixed-length confidence intervals (minimax confidence intervals) for Gauss coefficients of the field of degree 1-12 using the heat flow constraint. The bounds on optimal minimax intervals are about 40 percent shorter than Backus' intervals: no procedure for producing fixed-length confidence intervals, linear or nonlinear, can give intervals shorter than about 60 percent the length of Backus' in this problem. While both methods rigorously account for the fact that core field models are infinite-dimensional, the application of the techniques to the geomagnetic problem involves approximations and counterfactual assumptions about the data errors, and so these results are likely to be extremely optimistic estimates of the actual uncertainty in Gauss coefficients.

  6. On the optimal trimming of high-throughput mRNA sequence data

    Directory of Open Access Journals (Sweden)

    Matthew D MacManes

    2014-01-01

    Full Text Available The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.
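
    As a rough illustration of the gentle trimming thresholds mentioned in this abstract, the sketch below removes trailing bases whose Phred score falls below a cutoff. It assumes Phred+33 (Sanger/Illumina 1.8+) quality encoding and a simple 3'-end scan; it is not the algorithm of any specific trimming package evaluated in the study, and the function names are hypothetical.

```python
# Illustrative sketch only; assumes Phred+33 encoded quality strings.

def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_3prime(seq, quality, min_phred=5, offset=33):
    """Drop trailing bases whose Phred score is below min_phred."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_phred:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # 'I' decodes to Phred 40; '#', '$', '%' decode to 2, 3 and 4.
    read, qual = "ACGTACGT", "IIIII#$%"
    print(trim_3prime(read, qual, min_phred=5))
    print(trim_3prime(read, qual, min_phred=2))
```

    With a cutoff of 5 the three low-quality trailing bases are removed, while a cutoff of 2 leaves the read intact, which shows how little sequence a gentle threshold discards.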

  7. PHYRN: a robust method for phylogenetic analysis of highly divergent sequences.

    Directory of Open Access Journals (Sweden)

    Gaurav Bhardwaj

    Full Text Available Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤ 25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.

  8. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    Science.gov (United States)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  9. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Full Text Available Abstract Background In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is

  10. Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing

    Science.gov (United States)

    Dilernia, Dario A.; Chien, Jung-Ting; Monaco, Daniela C.; Brown, Michael P.S.; Ende, Zachary; Deymier, Martin J.; Yue, Ling; Paxinos, Ellen E.; Allen, Susan; Tirado-Ramos, Alfredo; Hunter, Eric

    2015-01-01

    Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution. PMID:26101252

  11. Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing.

    Science.gov (United States)

    Dilernia, Dario A; Chien, Jung-Ting; Monaco, Daniela C; Brown, Michael P S; Ende, Zachary; Deymier, Martin J; Yue, Ling; Paxinos, Ellen E; Allen, Susan; Tirado-Ramos, Alfredo; Hunter, Eric

    2015-11-16

    Single Molecule, Real-Time (SMRT) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.

  12. Robust misinterpretation of confidence intervals

    NARCIS (Netherlands)

    Hoekstra, Rink; Morey, Richard; Rouder, Jeffrey N.; Wagenmakers, Eric-Jan

    2014-01-01

    Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more

  13. Robust misinterpretation of confidence intervals

    NARCIS (Netherlands)

    Hoekstra, R.; Morey, R.D.; Rouder, J.N.; Wagenmakers, E.-J.

    2014-01-01

    Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more

  14. Investigation of the fungal community structures of imported wheat using high-throughput sequencing technology

    Science.gov (United States)

    Wang, Ying; Zhang, Guiming; Gao, Ruifang; Xiang, Caiyu; Feng, Jianjun; Lou, Dingfeng; Liu, Ying

    2017-01-01

    This study introduced the application of high-throughput sequencing techniques to the investigation of microbial diversity in the field of plant quarantine. It examined the microbial diversity of wheat imported into China, and established a bioinformatics database of wheat pathogens based on high-throughput sequencing results. This study analyzed the nuclear ribosomal internal transcribed spacer (ITS) region of fungi through Illumina Miseq sequencing to investigate the fungal communities of both seeds and sieve-through. A total of 758,129 fungal ITS sequences were obtained from ten samples collected from five batches of wheat imported from the USA. These sequences were classified into 2 different phyla, 15 classes, 33 orders, 41 families, or 78 genera, suggesting a high fungal diversity across samples. A pairwise analysis revealed that the diversity of the fungal community in the sieve-through is significantly higher than that in the seeds. Taxonomic analysis showed that at the class level, Dothideomycetes dominated in the seeds and Sordariomycetes dominated in the sieve-through. In all, this study revealed the fungal community composition in the seeds and sieve-through of the wheat, and identified key differences in the fungal community between the seeds and sieve-through. PMID:28241020

  15. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses, therefore only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit in the STEM degree production rate needed to fill the demand of the current job market and remain competitive as a nation. The purpose of the study was to make a difference in the number of students who have access to information about the benefits of completing a full sequence of science courses. This dissertation study employed qualitative research methodology to gain a broad perspective of staff through a questionnaire and document review and then a deeper understanding through semi-structured interview protocol. The data revealed that a universal sequence of science courses in the high school district did not exist. It also showed that not all students had access to all science courses; students were sorted and tracked according to prerequisites that did not necessarily match the skill set needed for the courses. In addition, the study showed a desire for more support and direction from the district office. It was also apparent that there was a disconnect that existed between who staff members believed should enroll in a full sequence of science courses and who actually enrolled. Finally, communication about science was shown to occur mainly through counseling and peers. A common science sequence, detracking of science courses, increased communication about the postsecondary and academic benefits of a science education, increased district direction and realistic mathematics alignment were all discussed as solutions to the problem.

  16. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species.

    Directory of Open Access Journals (Sweden)

    Robert J Elshire

    Full Text Available Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.

  17. Highly conserved D-loop-like nuclear mitochondrial sequences (Numts) in tiger (Panthera tigris)

    Indian Academy of Sciences (India)

    Wenping Zhang; Zhihe Zhang; Fujun Shen; Rong Hou; Xiaoping Lv; Bisong Yue

    2006-08-01

    Using oligonucleotide primers designed to match hypervariable segments I (HVS-1) of Panthera tigris mitochondrial DNA (mtDNA), we amplified two different PCR products (500 bp and 287 bp) in the tiger (Panthera tigris), but got only one PCR product (287 bp) in the leopard (Panthera pardus). Sequence analyses indicated that the sequence of 287 bp was a D-loop-like nuclear mitochondrial sequence (Numts), indicating a nuclear transfer that occurred approximately 4.8–17 million years ago in the tiger and 4.6–16 million years ago in the leopard. Although the mtDNA D-loop sequence has a rapid rate of evolution, the 287-bp Numts are highly conserved; they are nearly identical in tiger subspecies and only 1.742% different between tiger and leopard. Thus, such sequences represent molecular ‘fossils’ that can shed light on evolution of the mitochondrial genome and may be the most appropriate outgroup for phylogenetic analysis. This is also proved by comparing the phylogenetic trees reconstructed using the D-loop sequence of snow leopard and the 287-bp Numts as outgroup.

  18. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  19. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis

    Directory of Open Access Journals (Sweden)

    Zhao Patrick X

    2011-07-01

    Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM) for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L.), a species with high economic value but limited genomic resources. Results The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained including 24,144 tentative consensus (TCs) sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs were evaluated and validated using high resolution melting (HRM) analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1) chromosomes. Gene Ontology assignments indicate that sequences obtained cover a broad range of GO categories. Conclusions We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.

  20. Perceived Sources of Team Confidence in Soccer and Basketball.

    Science.gov (United States)

    Fransen, Katrien; Vanbeselaere, Norbert; De Cuyper, Bert; Vande Broek, Gert; Boen, Filip

    2015-07-01

    Although it is generally accepted that team confidence is beneficial for optimal team functioning and performance, little is known about the predictors of team confidence. The present study aimed to shed light on the precursors of both high and low team confidence in two different sports. A distinction is made between sources of process-oriented team confidence (i.e., collective efficacy) and sources of outcome-oriented team confidence (i.e., team outcome confidence), which have often been confounded in previous research. In a first step, two qualitative studies were conducted to identify all possible sources of team confidence in basketball and in soccer. In a second step, three quantitative studies were conducted to further investigate the sources of team outcome confidence in soccer (N = 1028) and in basketball (N = 867), and the sources of collective efficacy in basketball (N = 825). Players perceived high-quality performance as the most important factor for their team outcome confidence. With regard to collective efficacy, team enthusiasm was perceived as the most predictive determinant. Positive coaching emerged as the second most decisive factor for both types of team confidence. In contrast, negative communication and expression by the players or the coach was perceived as the most decisive predictor of low levels of team confidence. At item level, all studies pointed to the importance of team confidence expression by the athlete leaders (i.e., leader figures within the team) and the coach. The present manuscript sheds light on the precursors of high and low levels of team confidence. Athlete leaders and the coach emerged as key triggers of both upward and downward spirals of team confidence, thereby contaminating all team members.

  1. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    Full Text Available BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana, by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  2. Effect of k-tuple length on sample-comparison with high-throughput sequencing data.

    Science.gov (United States)

    Wang, Ying; Lei, Xiaoye; Wang, Shun; Wang, Zicheng; Song, Nianfeng; Zeng, Feng; Chen, Ting

    2016-01-22

    High-throughput metagenomic sequencing offers a powerful technique for comparing microbial communities. Without requiring extra reference sequences, alignment-free models with short k-tuples (k = 2-10 bp) have yielded promising results. Short k-tuples describe the overall statistical distribution but struggle to capture the specific characteristics of an individual microbial community. Longer k-tuples carry more information. However, because the frequency vector of long k-tuples (k ≥ 30 bp) is sparse, the statistical measures designed for short k-tuples are not applicable. In our study, we treated each tuple as a meaningful word and each sequencing data set as a document composed of those words, so that the comparison between two sequencing data sets is processed as "topic analysis of documents" in text mining. We designed a pipeline with long k-tuple features that compares metagenomic samples using a combination of algorithms from text mining and pattern recognition. The pipeline is available at http://culotuple.codeplex.com/. Experiments show that our pipeline with long k-tuple features: ① separates genomes with high similarity; ② outperforms short k-tuple models in all experiments (when k ≥ 12 the short k-tuple measures are no longer applicable, and when k is between 20 and 40 the long k-tuple pipeline obtains much better grouping results); ③ is free from the effect of sequencing platforms/protocols; ④ yields meaningful and supported biological results on the 40-tuples selected for comparison.
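
    The k-tuple frequency vectors that this abstract builds on can be sketched as follows. This is a minimal illustration of alignment-free counting with a plain Manhattan distance between two samples; it is not the topic-model pipeline described by the authors, and all function names are hypothetical.

```python
# Illustrative sketch only; not the published pipeline.
from collections import Counter

DNA = set("ACGT")


def ktuple_counts(reads, k):
    """Count every overlapping k-tuple across reads, skipping tuples with non-ACGT symbols."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            tup = read[i:i + k]
            if set(tup) <= DNA:
                counts[tup] += 1
    return counts


def relative_frequencies(counts):
    """Normalise counts so that samples of different sequencing depth are comparable."""
    total = sum(counts.values())
    return {tup: n / total for tup, n in counts.items()} if total else {}


def manhattan_distance(freq_a, freq_b):
    """Sum of absolute frequency differences over the union of observed k-tuples."""
    keys = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(t, 0.0) - freq_b.get(t, 0.0)) for t in keys)


if __name__ == "__main__":
    sample_a = relative_frequencies(ktuple_counts(["ACGTAC", "ACGT"], k=2))
    sample_b = relative_frequencies(ktuple_counts(["TTTTGG"], k=2))
    print(manhattan_distance(sample_a, sample_b))
    # The space of possible k-tuples is 4 ** k, which is why long tuples give sparse vectors.
    print(4 ** 10, 4 ** 30)
```

    Storing only observed tuples in a dictionary sidesteps allocating a dense vector of 4^k entries, which would be impractical for the long tuples discussed above.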

  3. A high resolution genetic map anchoring scaffolds of the sequenced watermelon genome.

    Science.gov (United States)

    Ren, Yi; Zhao, Hong; Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F(8) population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits.
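
    The summary figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are illustrative, and the map length is the approximate value (about 800 cM) given there.

```python
# Illustrative consistency check using only figures quoted in the abstract.

ssr, indel, sv = 698, 219, 36          # marker counts by type
map_length_cm = 800                    # approximate total map length in cM
scaffolds, mean_scaffold_mb = 234, 1.41
anchored_mb, assembled_mb = 330, 353

total_markers = ssr + indel + sv
mean_interval_cm = map_length_cm / total_markers
anchored_from_scaffolds_mb = scaffolds * mean_scaffold_mb
anchored_percent = 100 * anchored_mb / assembled_mb

if __name__ == "__main__":
    print(f"total markers: {total_markers}")
    print(f"mean marker interval: {mean_interval_cm:.2f} cM")
    print(f"scaffold total: {anchored_from_scaffolds_mb:.1f} Mb")
    print(f"anchored fraction: {anchored_percent:.1f}%")
```

    The results (953 markers, an interval of roughly 0.8 cM, about 330 Mb of scaffolds, and 93.5% of the assembly) agree with the values reported in the abstract.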

  4. Low environmental impact bleaching sequences for attaining high brightness level with eucalyptus SPP pulp

    Directory of Open Access Journals (Sweden)

    M. M. Costa

    2009-03-01

    The alternatives used for minimizing the usage of chlorine dioxide in bleaching sequences included a hot acid hydrolysis (Ahot) stage, the use of hot chlorine dioxide (Dhot) and ozone stages at medium consistency and high consistency (Zmc and Zhc), in addition to stages with atmospheric hydrogen peroxide (P) and pressurized hydrogen peroxide (PO). The results were interpreted based on the cost of the chemical products, bleaching process yields and on minimizing the environmental impact of the bleaching process. In spite of some process restrictions, ISO brightness levels were kept around 90 %. Additionally, the inclusion of stages like acid hydrolysis, pressurized peroxide and ozone in the bleaching sequences provided an increase in operating flexibility, aimed at reducing environmental impact (ECF Light). The Dhot(EOP)D(PO) sequence presented lower operating cost for ISO brightness above 92 %. However, this kind of sequence did not allow the wastewater circuit to be closed, even partially. For ISO brightness levels around 91 %, the AhotZhcDP sequence presented a lower operating cost than the others.

  5. Research progress of plant population genomics based on high-throughput sequencing.

    Science.gov (United States)

    Yunsheng, Wang

    2016-08-01

    Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical system of population genetics and improves our understanding of microevolution through the identification of site-specific and genome-wide effects using genome-wide genotyping of polymorphic sites. With the appearance and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of the relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as the existing problems of plant population genomics in order to provide references for related studies.

  6. Mitochondrial genome sequences of Artemia tibetiana and Artemia urmiana: assessing molecular changes for high plateau adaptation.

    Science.gov (United States)

    Zhang, Hangxiao; Luo, Qibin; Sun, Jing; Liu, Fei; Wu, Gang; Yu, Jun; Wang, Weiwei

    2013-05-01

    Brine shrimps, Artemia (Crustacea, Anostraca), inhabit hypersaline environments and have a broad geographical distribution from sea level to high plateaus. Artemia therefore possess significant genetic diversity, which gives them their outstanding adaptability. To understand this remarkable plasticity, we sequenced the mitochondrial genomes of two Artemia tibetiana isolates from the Tibetan Plateau in China and one Artemia urmiana isolate from Lake Urmia in Iran and compared them with the genome of a low-altitude Artemia, A. franciscana. We compared the ratio of the rate of nonsynonymous (Ka) and synonymous (Ks) substitutions (Ka/Ks ratio) in the mitochondrial protein-coding gene sequences and found that atp8 had the highest Ka/Ks ratios in comparisons of A. franciscana with either A. tibetiana or A. urmiana and that atp6 had the highest Ka/Ks ratio between A. tibetiana and A. urmiana. Atp6 may have experienced strong selective pressure for high-altitude adaptation because although A. tibetiana and A. urmiana are closely related they live at different altitudes. We identified two extended termination-associated sequences and three conserved sequence blocks in the D-loop region of the mitochondrial genomes. We propose that sequence variations in the D-loop region and in the subunits of the respiratory chain complexes independently or collectively contribute to the adaptation of Artemia to different altitudes.

  7. Semi-Automated Library Preparation for High-Throughput DNA Sequencing Platforms

    Directory of Open Access Journals (Sweden)

    Eveline Farias-Hesson

    2010-01-01

    Next-generation sequencing platforms are powerful technologies, providing gigabases of genetic information in a single run. An important prerequisite for high-throughput DNA sequencing is the development of robust and cost-effective preprocessing protocols for DNA sample library construction. Here we report the development of a semi-automated sample preparation protocol to produce adaptor-ligated fragment libraries. Using a liquid-handling robot in conjunction with Carboxy Terminated Magnetic Beads, we labeled each library sample using a unique 6 bp DNA barcode, which allowed multiplex sample processing and sequencing of 32 libraries in a single run using Applied Biosystems' SOLiD sequencer. We applied our semi-automated pipeline to targeted medical resequencing of nuclear candidate genes in individuals affected by mitochondrial disorders. This novel method is capable of preparing as many as 32 DNA libraries in 2.01 days (8-hour workday) for emulsion PCR/high-throughput DNA sequencing, increasing sample preparation production by 8-fold.

  8. Norm-Transgression Sequences in the Classroom Interaction at a Madrid High School

    Science.gov (United States)

    Alcala Recuerda, Esther

    2010-01-01

    This paper studies high school classroom sequences, compiled through critical sociolinguistic ethnography, where norm-transgression is made explicit, and how authority is recovered by the teacher after an open period where class participants generally seize to digress. This way, we will be able to approach several dimensions of linguistic…

  9. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob;

    2016-01-01

    and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. The wgaDNA performed similarly to matched high-quality reference--whole-blood DNA--based on concordance rates calculated...

  10. Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome

    NARCIS (Netherlands)

    Montaña, José Salvador; Jiménez Avella, Diego; Angel, Tatiana; Hernández, Mónica; Baena, Sandra

    2012-01-01

    Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clon

  11. The Importance of Agriculture Science Course Sequencing in High Schools: A View from Collegiate Agriculture Students

    Science.gov (United States)

    Wheelus, Robin P.

    2009-01-01

    The objective of this study was to investigate the importance of Agriculture Science course sequencing in high schools, as a preparatory factor for students enrolled in collegiate agriculture classes. With the variety of courses listed in the Texas Essential Knowledge and Skills (TEKS) for Agriculture Science, it has been possible for counselors,…

  12. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments

    NARCIS (Netherlands)

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R.; Verstrepen, Kevin J.; Thevelein, Johan M.; Tohme, Joe

    2014-01-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still

  13. Unravelling the genetic basis of hereditary disorders by high-throughput exome sequencing strategies

    NARCIS (Netherlands)

    Jazayeri, Omid

    2016-01-01

    The research presented in this thesis focuses on using Whole Exome Sequencing (WES) to unravel the genetic basis of human hereditary disorders with different inheritance patterns. We set out to apply WES as a diagnostic approach for establishing a molecular diagnosis in a highly heterogeneous group

  14. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform

    DEFF Research Database (Denmark)

    Fordyce, Sarah L; Avila-Arcos, Maria C; Rockenbauer, Eszter;

    2011-01-01

    The analysis and profiling of short tandem repeat (STR) loci is routinely used in forensic genetics. Current methods to investigate STR loci, including PCR-based standard fragment analyses and capillary electrophoresis, only provide amplicon lengths that are used to estimate the number of STR repeat units. These methods do not allow for the full resolution of STR base composition that sequencing approaches could provide. Here we present an STR profiling method based on the use of the Roche Genome Sequencer (GS) FLX to simultaneously sequence multiple core STR loci. Using this method...

  15. Beyond hypercorrection: remembering corrective feedback for low-confidence errors.

    Science.gov (United States)

    Griffiths, Lauren; Higham, Philip A

    2017-07-01

    Correcting errors based on corrective feedback is essential to successful learning. Previous studies have found that corrections to high-confidence errors are better remembered than low-confidence errors (the hypercorrection effect). The aim of this study was to investigate whether corrections to low-confidence errors can also be successfully retained in some cases. Participants completed an initial multiple-choice test consisting of control, trick and easy general-knowledge questions, rated their confidence after answering each question, and then received immediate corrective feedback. After a short delay, they were given a cued-recall test consisting of the same questions. In two experiments, we found high-confidence errors to control questions were better corrected on the second test compared to low-confidence errors - the typical hypercorrection effect. However, low-confidence errors to trick questions were just as likely to be corrected as high-confidence errors. Most surprisingly, we found that memory for the feedback and original responses, not confidence or surprise, were significant predictors of error correction. We conclude that for some types of material, there is an effortful process of elaboration and problem solving prior to making low-confidence errors that facilitates memory of corrective feedback.

  16. Self-Confidence & Social Interactions

    OpenAIRE

    Bénabou, Roland; Tirole, Jean

    2000-01-01

    This paper studies the interactions between an individual's self-esteem and his social environment - in the workplace, at school, and in personal relationships. Because a person generally has only imperfect knowledge of his own abilities, people who derive benefits from his performance (parent, spouse, friend, teacher, manager, etc.) have incentives to manipulate his self-confidence. We first study situations where an informed principal chooses an incentive structure, such as offering paymen...

  17. The Confidence Information Ontology: a step towards a standard for asserting confidence in annotations.

    Science.gov (United States)

    Bastian, Frederic B; Chibucos, Marcus C; Gaudet, Pascale; Giglio, Michelle; Holliday, Gemma L; Huang, Hong; Lewis, Suzanna E; Niknejad, Anne; Orchard, Sandra; Poux, Sylvain; Skunca, Nives; Robinson-Rechavi, Marc

    2015-01-01

    Biocuration has become a cornerstone for analyses in biology, and to meet needs, the number of annotations has grown considerably in recent years. However, the reliability of these annotations varies; it has thus become necessary to be able to assess the confidence in annotations. Although several resources already provide confidence information about the annotations that they produce, a standard way of providing such information has yet to be defined. This lack of standardization undermines the propagation of knowledge across resources, as well as the credibility of results from high-throughput analyses. Seeded at a workshop during the Biocuration 2012 conference, a working group has been created to address this problem. We present here the elements that were identified as essential for assessing confidence in annotations, as well as a draft ontology--the Confidence Information Ontology--to illustrate how the problems identified could be addressed. We hope that this effort will provide a home for discussing this major issue among the biocuration community. Tracker URL: https://github.com/BgeeDB/confidence-information-ontology Ontology URL: https://raw.githubusercontent.com/BgeeDB/confidence-information-ontology/master/src/ontology/cio-simple.obo

  18. Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data.

    Science.gov (United States)

    Junttila, Sini; Rudd, Stephen

    2012-10-30

    Lichens are symbiotic organisms that have a remarkable ability to survive in some of the most extreme terrestrial climates on earth. Lichens can endure frequent desiccation and wetting cycles and are able to survive in a dehydrated molecular dormant state for decades at a time. Genetic resources have been established in lichen species for the study of molecular systematics and their taxonomic classification. No lichen species have been characterised yet using genomics and the molecular mechanisms underlying the lichen symbiosis and the fundamentals of desiccation tolerance remain undescribed. We report the characterisation of a transcriptome of the grey reindeer lichen, Cladonia rangiferina, using high-throughput next-generation transcriptome sequencing and traditional Sanger EST sequencing data. Altogether 243,729 high quality sequence reads were de novo assembled into 16,204 contigs and 49,587 singletons. The genome of origin for the sequences produced was predicted using Eclat with sequences derived from the axenically grown symbiotic partners used as training sequences for the classification model. 62.8% of the sequences were classified as being of fungal origin while the remaining 37.2% were predicted as being of algal origin. The assembled sequences were annotated by BLASTX comparison against a non-redundant protein sequence database with 34.4% of the sequences having a BLAST match. 29.3% of the sequences had a Gene Ontology term match and 27.9% of the sequences had a domain or structural match following an InterPro search. 60 KEGG pathways with more than 10 associated sequences were identified. Our results present a first transcriptome sequencing and de novo assembly for a lichen species and describe the ongoing molecular processes and the most active pathways in C. rangiferina. This brings a meaningful contribution to publicly available lichen sequence information. These data provide a first glimpse into the molecular nature of the lichen symbiosis and

  19. Characterization of a transcriptome from a non-model organism, Cladonia rangiferina, the grey reindeer lichen, using high-throughput next generation sequencing and EST sequence data

    Directory of Open Access Journals (Sweden)

    Junttila Sini

    2012-10-01

    Background: Lichens are symbiotic organisms that have a remarkable ability to survive in some of the most extreme terrestrial climates on earth. Lichens can endure frequent desiccation and wetting cycles and are able to survive in a dehydrated molecular dormant state for decades at a time. Genetic resources have been established in lichen species for the study of molecular systematics and their taxonomic classification. No lichen species have been characterised yet using genomics and the molecular mechanisms underlying the lichen symbiosis and the fundamentals of desiccation tolerance remain undescribed. We report the characterisation of a transcriptome of the grey reindeer lichen, Cladonia rangiferina, using high-throughput next-generation transcriptome sequencing and traditional Sanger EST sequencing data. Results: Altogether 243,729 high quality sequence reads were de novo assembled into 16,204 contigs and 49,587 singletons. The genome of origin for the sequences produced was predicted using Eclat with sequences derived from the axenically grown symbiotic partners used as training sequences for the classification model. 62.8% of the sequences were classified as being of fungal origin while the remaining 37.2% were predicted as being of algal origin. The assembled sequences were annotated by BLASTX comparison against a non-redundant protein sequence database with 34.4% of the sequences having a BLAST match. 29.3% of the sequences had a Gene Ontology term match and 27.9% of the sequences had a domain or structural match following an InterPro search. 60 KEGG pathways with more than 10 associated sequences were identified. Conclusions: Our results present a first transcriptome sequencing and de novo assembly for a lichen species and describe the ongoing molecular processes and the most active pathways in C. rangiferina. This brings a meaningful contribution to publicly available lichen sequence information. These data provide a first

  20. Confidence-Based Feature Acquisition

    Science.gov (United States)

    Wagstaff, Kiri L.; desJardins, Marie; MacGlashan, James

    2010-01-01

    Confidence-based Feature Acquisition (CFA) is a novel, supervised learning method for acquiring missing feature values when there is missing data at both training (learning) and test (deployment) time. To train a machine learning classifier, data is encoded with a series of input features describing each item. In some applications, the training data may have missing values for some of the features, which can be acquired at a given cost. A relevant JPL example is that of the Mars rover exploration in which the features are obtained from a variety of different instruments, with different power consumption and integration time costs. The challenge is to decide which features will lead to increased classification performance and are therefore worth acquiring (paying the cost). To solve this problem, CFA, which is made up of two algorithms (CFA-train and CFA-predict), has been designed to greedily minimize total acquisition cost (during training and testing) while aiming for a specific accuracy level (specified as a confidence threshold). With this method, it is assumed that there is a nonempty subset of features that are free; that is, every instance in the data set includes these features initially for zero cost. It is also assumed that the feature acquisition (FA) cost associated with each feature is known in advance, and that the FA cost for a given feature is the same for all instances. Finally, CFA requires that the base-level classifiers produce not only a classification, but also a confidence (or posterior probability).
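
    The prediction-time idea described above (start from the free features and pay for more only while the classifier's confidence is below the threshold) can be sketched as follows. This is a simplified illustration, not CFA-predict itself: the cheapest-first ordering, the function names, and the toy classifier are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]
# A base-level classifier must return both a label and a confidence.
Classifier = Callable[[Features], Tuple[str, float]]


def acquire_until_confident(
    free_features: Features,
    costs: Dict[str, float],
    acquire: Callable[[str], float],
    classify: Classifier,
    threshold: float,
) -> Tuple[str, float, float]:
    """Greedily buy features (cheapest first, an assumed ordering) until the
    classifier's confidence reaches the threshold or nothing is left to buy."""
    known = dict(free_features)
    total_cost = 0.0
    label, confidence = classify(known)
    for name in sorted(costs, key=costs.get):
        if confidence >= threshold:
            break
        known[name] = acquire(name)      # pay the known, fixed cost
        total_cost += costs[name]
        label, confidence = classify(known)
    return label, confidence, total_cost


# Hypothetical stand-ins: confidence simply grows with the number of features.
def toy_classify(features: Features) -> Tuple[str, float]:
    return "rock", min(1.0, 0.5 + 0.2 * len(features))


label, confidence, spent = acquire_until_confident(
    free_features={"brightness": 0.4},
    costs={"spectrometer": 5.0, "camera": 1.0},
    acquire=lambda name: 1.0,
    classify=toy_classify,
    threshold=0.8,
)
```

    In this toy run only the cheaper "camera" feature is acquired, because the confidence threshold is already met before the more expensive instrument is needed.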

  1. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus;

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates...

  2. Division of high resolution sequence stratigraphy units with wavelet transform of logs in Dagang Oilfield

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Division of high-resolution sequence stratigraphy units based on wavelet transforms of logging data is found to be good at identifying subtle cycles of geological processes in the Kongnan area of Dagang Oilfield. Multi-scale cycle analysis of the formation with the 1-D continuous Dmey wavelet transform of the GR log curve and the 1-D discrete Daubechies wavelet transform of the Rt log curve both make the division of sequence interfaces more objective and precise, which avoids the human bias of core analysis and the uncertainty of seismic data and core analysis.
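
    How a discrete wavelet transform exposes an abrupt change in a log curve can be illustrated with a minimal sketch. It uses the Haar wavelet (the simplest member of the Daubechies family) in plain Python rather than the specific wavelets named in the abstract, and the "log" values are synthetic; all function names are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; len(signal) must be even."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Multi-level decomposition: returns the coarsest approximation and the
    detail coefficients from the finest scale to the coarsest."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


def largest_detail_index(detail):
    """Index of the detail coefficient with the largest magnitude."""
    return max(range(len(detail)), key=lambda i: abs(detail[i]))


# Synthetic log curve with one abrupt jump between samples 4 and 5.
log_curve = [10, 10, 10, 10, 10, 50, 50, 50]
approx, details = haar_wavedec(log_curve, levels=2)
boundary_pair = largest_detail_index(details[0])  # pair covering samples 4-5
```

    A jump shows up as a large detail coefficient at the scale whose averaging window straddles it, which is why several scales are usually inspected together.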

  3. A mini-IRES sequence for stringent selection of high producers

    Indian Academy of Sciences (India)

    Jun Yan; Hailin Yang; Guohua Yue; Wenda Gao

    2013-06-01

    Internal Ribosome Entry Site (IRES) sequences have been widely used to link the expression of two independent proteins on the same mRNA transcript. Genes encoding fluorescent proteins or drug-resistance enzymes are usually placed downstream of IRES, serving as expression indicators or selection markers. In biological applications where the upstream gene-of-interest is to be expressed at extremely high levels, it is often desirable to purposely reduce IRES downstream gene expression to economize the cellular resources and/or to generate more stringent selection pressure. Here we describe a miniature IRES mutant sequence (IRESmut3) with dramatically diminished co-translational efficiency to fulfill these purposes.

  4. Feature Augmentation for Learning Confidence Measure in Stereo Matching.

    Science.gov (United States)

    Kim, Sunok; Min, Dongbo; Kim, Seungryong; Sohn, Kwanghoon

    2017-09-08

    Confidence estimation is essential for refining stereo matching results through a post-processing step. This problem has recently been studied using a learning-based approach, which demonstrates a substantial improvement over conventional non-learning-based methods. However, the formulation of learning-based methods that individually estimate the confidence of each pixel disregards the spatial coherency that might exist in the confidence map, thus providing limited performance under challenging conditions. Our key observation is that the confidence features and resulting confidence maps are smoothly varying in the spatial domain, and highly correlated within the local regions of an image. We present a new approach that imposes spatial consistency on the confidence estimation. Specifically, a set of robust confidence features is extracted from each superpixel decomposed using the Gaussian mixture model (GMM), and then these features are concatenated with pixel-level confidence features. The features are then enhanced through adaptive filtering in the feature domain. In addition, the resulting confidence map, estimated using the confidence features with a random regression forest, is further improved through a K-nearest neighbor (K-NN) based aggregation scheme at both the pixel and superpixel levels. To validate the proposed confidence estimation scheme, we employ cost modulation or ground control points (GCPs) based optimization in stereo matching. Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches on various benchmarks including challenging outdoor scenes.

  5. Achieving high throughput sequencing of a cDNA library utilizing an alternative protocol for the bench top next-generation sequencing system.

    Science.gov (United States)

    Wan, Minxi; Faruq, Junaid; Rosenberg, Julian N; Xia, Jinlan; Oyler, George A; Betenbaugh, Michael J

    2013-02-15

    The development of next-generation sequencing (NGS) technologies has provided novel tools for genome analysis and expression profiling. A high throughput cDNA sequencing method using a bench top next-generation sequencing system, GS Junior, is now available. Here, we used an alternative protocol to the standard method for generating the cDNA library. This protocol can decrease the number of processing steps to manipulate RNA when constructing a cDNA library from an RNA sample, and does not require mRNA isolation from total RNA. Thus it can decrease the risk of RNA degradation and the cost for preparing a cDNA library. Also, the efficiency of sequencing data obtained with this approach is comparable to the standard method as verified by sequencing characteristics and expression levels of the reference gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH).

  6. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Science.gov (United States)

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
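
    A scan for tandem arrays of the heptameric GGGNATC unit can be sketched with a regular expression. This is only an illustration under stated assumptions: it searches the given strand only, requires perfect units with at least two copies, and the function name and test sequence are made up for the example; the study's actual search procedure is not described in the abstract.

```python
import re

UNIT = "GGG[ACGT]ATC"                       # GGGNATC, with N as any base
UNIT_LENGTH = 7
REPEAT_RE = re.compile(f"(?:{UNIT}){{2,}}")  # two or more tandem units


def find_repeat_arrays(sequence):
    """Return (start, unit_count) for every tandem array of GGGNATC units."""
    hits = []
    for match in REPEAT_RE.finditer(sequence.upper()):
        hits.append((match.start(), len(match.group()) // UNIT_LENGTH))
    return hits


# Synthetic sequence: one array of four units, then a lone unit (ignored).
toy_sequence = "TT" + "GGGAATC" * 4 + "CC" + "GGGTATC"
arrays = find_repeat_arrays(toy_sequence)
```

    Counting units per array in this way is what would allow a preference such as the reported bias toward four units to be tallied across a genome.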

  7. High-resolution mapping and transcriptional activity analysis of chicken centromere sequences on giant lampbrush chromosomes.

    Science.gov (United States)

    Krasikova, Alla; Fukagawa, Tatsuo; Zlotina, Anna

    2012-12-01

    Exploration into morphofunctional organisation of centromere DNA sequences is important for understanding the mechanisms of kinetochore specification and assembly. In-depth epigenetic analysis of DNA fragments associated with centromeric nucleosome proteins has demonstrated unique features of centromere organisation in chicken karyotype: there are both mature centromeres, which comprise chromosome-specific homogeneous arrays of tandem repeats, and recently evolved primitive centromeres, which consist of non-tandemly organised DNA sequences. In this work, we describe the arrangement and transcriptional activity of chicken centromere repeats for Cen1, Cen2, Cen3, Cen4, Cen7, Cen8, and Cen11 and non-repetitive centromere sequences of chromosomes 5, 27, and Z using highly elongated lampbrush chromosomes, which are characteristic of the diplotene stage of oogenesis. The degree of chromatin packaging and fine spatial organisations of tandemly repetitive and non-tandemly repetitive centromeric sequences significantly differ at the lampbrush stage. Using DNA/RNA FISH, we have demonstrated that during the lampbrush stage, DNA sequences are transcribed within the centromere regions of chromosomes that lack centromere-specific tandem repeats. In contrast, chromosome-specific centromeric repeats Cen1, Cen2, Cen3, Cen4, Cen7, Cen8, and Cen11 do not demonstrate any transcriptional activity during the lampbrush stage. In addition, we found that CNM repeat cluster localises adjacent to non-repetitive centromeric sequences in chicken microchromosome 27 indicating that centromere region in this chromosome is repeat-rich. Cross-species FISH allowed localisation of the sequences homologous to centromeric DNA of chicken chromosomes 5 and 27 in centromere regions of quail orthologous chromosomes.

  8. An efficient strategy for large-scale high-throughput transposon-mediated sequencing of cDNA clones

    Science.gov (United States)

    Butterfield, Yaron S. N.; Marra, Marco A.; Asano, Jennifer K.; Chan, Susanna Y.; Guin, Ranabir; Krzywinski, Martin I.; Lee, Soo Sen; MacDonald, Kim W. K.; Mathewson, Carrie A.; Olson, Teika E.; Pandoh, Pawan K.; Prabhu, Anna-Liisa; Schnerch, Angelique; Skalska, Ursula; Smailus, Duane E.; Stott, Jeff M.; Tsai, Miranda I.; Yang, George S.; Zuyderduyn, Scott D.; Schein, Jacqueline E.; Jones, Steven J. M.

    2002-01-01

    We describe an efficient high-throughput method for accurate DNA sequencing of entire cDNA clones. Developed as part of our involvement in the Mammalian Gene Collection full-length cDNA sequencing initiative, the method has been used and refined in our laboratory since September 2000. Amenable to large scale projects, we have used the method to generate >7 Mb of accurate sequence from 3695 candidate full-length cDNAs. Sequencing is accomplished through the insertion of Mu transposon into cDNAs, followed by sequencing reactions primed with Mu-specific sequencing primers. Transposon insertion reactions are not performed with individual cDNAs but rather on pools of up to 96 clones. This pooling strategy reduces the number of transposon insertion sequencing libraries that would otherwise be required, reducing the costs and enhancing the efficiency of the transposon library construction procedure. Sequences generated using transposon-specific sequencing primers are assembled to yield the full-length cDNA sequence, with sequence editing and other sequence finishing activities performed as required to resolve sequence ambiguities. Although analysis of the many thousands (22 785) of sequenced Mu transposon insertion events revealed a weak sequence preference for Mu insertion, we observed insertion of the Mu transposon into 1015 of the possible 1024 5mer candidate insertion sites. PMID:12034834

  9. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded the intraspecific polymorphisms that are generally observed in other vertebrates. The length of the CR sequence varied among M. cephalus populations due to the presence of indels and a variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star-shaped tree inferred from the CR polymorphism points to a rapid worldwide radiation in this species. The CR still appears to be a good marker for phylogeographic investigations, and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  10. HIGH RESOLUTION IMAGE PROJECTION IN FREQUENCY DOMAIN FOR CONTINUOUS IMAGE SEQUENCE

    Directory of Open Access Journals (Sweden)

    M. Nagaraju Naik

    2010-09-01

    Full Text Available Unlike most other information technologies, which have enjoyed an exponential growth for the past several decades, display resolution has largely stagnated. Low display resolution has in turn limited the resolution of digital images. Scaling is a non-trivial process that involves a trade-off between efficiency, smoothness and sharpness. As the size of an image is increased, the pixels that comprise the image become increasingly visible, making the image appear soft. Super scalar representation of an image sequence is limited by the image information present in the low dimensional image sequence. To project an image frame sequence into a high-resolution static or fractional scaling value, a scaling approach is developed based on energy spectral interpolation and frequency spectral interpolation techniques. To realize the frequency spectral resolution, a cubic B-spline method is used.

  11. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing

    DEFF Research Database (Denmark)

    Gamba, Cristina; Hanghøj, Kristian Ebbesen; Gaunitz, Charleen

    2016-01-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in-solution methods and increase not only the total amount of DNA molecules...

  12. High sequence variability among hemocyte-specific Kazal-type proteinase inhibitors in decapod crustaceans.

    Science.gov (United States)

    Cerenius, Lage; Liu, Haipeng; Zhang, Yanjiao; Rimphanitchayakit, Vichien; Tassanakajon, Anchalee; Gunnar Andersson, M; Söderhäll, Kenneth; Söderhäll, Irene

    2010-01-01

    Crustacean hemocytes were found to produce a large number of transcripts coding for Kazal-type proteinase inhibitors (KPIs). A detailed study performed with the crayfish Pacifastacus leniusculus and the shrimp Penaeus monodon revealed the presence of at least 26 and 20 different Kazal domains from the hemocyte KPIs, respectively. Comparisons with KPIs from other taxa indicate that the sequences of these domains evolve rapidly. A few conserved positions, e.g. six invariant cysteines, were present in all domain sequences, whereas the position of the P1 amino acid, a determinant for substrate specificity, varied highly. A study with a single crayfish animal suggested that considerable sequence variability among the hemocyte KPIs produced exists even at the individual level. Expression analysis of four crayfish KPI transcripts in hematopoietic tissue cells and different hemocyte types suggests that some of these KPIs are likely to be involved in hematopoiesis or hemocyte release, as they were produced in particular hemocyte types or maturation stages only.

  13. Social media sentiment and consumer confidence

    OpenAIRE

    Piet J.H. Daas; Puts, Marco J.H.

    2014-01-01

    Changes in the sentiment of Dutch public social media messages were compared with changes in monthly consumer confidence over a period of three-and-a-half years, revealing that both were highly correlated (up to r = 0.9) and that both series cointegrated. This phenomenon is predominantly affected by changes in the sentiment of all Dutch public Facebook messages. The inclusion of various selections of public Twitter messages improved this association and the response to changes in sentiment. G...

  14. High-throughput novel microsatellite marker of faba bean via next generation sequencing

    Directory of Open Access Journals (Sweden)

    Yang Tao

    2012-11-01

    Full Text Available Abstract Background Faba bean (Vicia faba L.) is an important food legume crop, grown for human consumption globally including in China, Turkey, Egypt and Ethiopia. Although genetic gain has been made through conventional selection and breeding efforts, this could be substantially improved through the application of molecular methods. For this, a set of reliable molecular markers representative of the entire genome is required. Results A library with 125,559 putative SSR sequences was constructed and characterized for repeat type and length from a mixed genome of 247 spring and winter sown faba bean genotypes using 454 sequencing. A suite of 28,503 primer pair sequences was designed and 150 were randomly selected for validation. Of these, 94 produced reproducible amplicons that were polymorphic among 32 faba bean genotypes selected from diverse geographical locations. The number of alleles per locus ranged from 2 to 8, the expected heterozygosities ranged from 0.0000 to 1.0000, and the observed heterozygosities ranged from 0.0908 to 0.8410. The validation by UPGMA cluster analysis of 32 genotypes, based on Nei's genetic distance, showed the high quality and effectiveness of those novel SSR markers developed via next generation sequencing technology. Conclusions Large scale SSR marker development was successfully achieved using next generation sequencing of the V. faba genome. These novel markers are valuable for constructing genetic linkage maps, future QTL mapping, and marker-assisted trait selection in faba bean breeding efforts.

  15. Accuracy of the high-throughput amplicon sequencing to identify species within the genus Aspergillus.

    Science.gov (United States)

    Lee, Seungeun; Yamamoto, Naomichi

    2015-12-01

    This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank, mostly due to cross-detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting that Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers.

  16. Galaxy Workflows for Web-based Bioinformatics Analysis of Aptamer High-throughput Sequencing Data

    Directory of Open Access Journals (Sweden)

    William H Thiel

    2016-01-01

    Full Text Available Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs.
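
    The abundance and persistence filtering described above can be sketched outside Galaxy in a few lines. This is an illustration only, not the published Workflows: it assumes abundance means the read count of a sequence within a round and persistence means the number of rounds in which a sequence appears, and the function names, thresholds, and reads are hypothetical.

```python
from collections import Counter

def summarize_rounds(rounds):
    """rounds: dict mapping a round label to a list of read sequences.

    Returns (abundance, persistence), where abundance[label] is a Counter
    of read counts per unique sequence and persistence[seq] is the number
    of rounds in which seq was seen at least once.
    """
    abundance = {label: Counter(reads) for label, reads in rounds.items()}
    persistence = Counter()
    for counts in abundance.values():
        persistence.update(counts.keys())  # each unique sequence counts once per round
    return abundance, persistence

def filter_sequences(abundance, persistence, min_rounds=2, min_reads=2):
    """Keep sequences seen in >= min_rounds rounds with >= min_reads reads overall."""
    keep = []
    for seq, n_rounds in persistence.items():
        total_reads = sum(counts[seq] for counts in abundance.values())
        if n_rounds >= min_rounds and total_reads >= min_reads:
            keep.append(seq)
    return sorted(keep)

# Hypothetical reads from three selection rounds.
rounds = {"R1": ["AAA", "AAA", "CCC"], "R2": ["AAA", "GGG"], "R3": ["GGG", "AAA"]}
abundance, persistence = summarize_rounds(rounds)
print(filter_sequences(abundance, persistence, min_rounds=2, min_reads=3))
```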

  17. Analysis of 4,664 high-quality sequence-finished poplar full-length

    Energy Technology Data Exchange (ETDEWEB)

    Ralph, S. [University of British Columbia, Vancouver; Gunter, Lee E [ORNL; Tuskan, Gerald A [ORNL; Douglas, Carl [University of British Columbia, Vancouver; Holt, Robert A. [Genome Sciences Centre, Vancouver, BC, Canada; Jones, Steven [Genome Sciences Centre, Vancouver, BC, Canada; Marra, Marco [Genome Sciences Centre, Vancouver, BC, Canada; Bohlmann, J. [University of British Columbia, Vancouver

    2008-01-01

    The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions. As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)-cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa x P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins. 
    By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones for genes that were differentially expressed in

  18. Comparison of highly repeated DNA sequences in some Lemuridae and taxonomic implications.

    Science.gov (United States)

    Montagnon, D; Crovella, S; Rumpler, Y

    1993-01-01

    Highly repeated DNA sequences of Eulemur fulvus mayottensis, E. coronatus, Lemur catta, and Hapalemur griseus griseus have been identified and compared. Sequence analysis of highly repeated DNA fragments isolated from L. catta and Hapalemur showed a high percentage of similarity (nearly 95%), as did fragments isolated from the two very close Eulemur species, whereas comparison of the DNA fragments isolated from the two Eulemur species and the L. catta/Hapalemur group showed a very low percentage (approximately 40%) of identity, as might be expected for distant species. These results confirm our previous data, obtained by Southern blot hybridization techniques on the same species, and strongly support the existence of a common trunk between L. catta and Hapalemur, different from the one leading to the Eulemur species.
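
    Percentages such as the ~95% and ~40% above are typically computed over aligned positions. A minimal sketch, assuming the two sequences are already aligned to equal length and that gap columns are skipped (the abstract does not state how gaps were treated); the function name and sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned, ungapped columns of two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # assumption: columns containing a gap are not scored
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical aligned fragments differing at one of ten positions.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```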

  19. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with the potential to be misused in bioterrorism. Molecular typing based on DNA sequence, such as canSNP typing or MLVA, has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  20. Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing

    Science.gov (United States)

    Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad

    2015-01-01

    Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternately, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407
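
    Taxonomic richness and proportional abundance follow directly from a list of per-sample taxon assignments. The sketch below is an assumption-laden illustration: the abstract does not name the biodiversity metrics used, so the Shannon index is included only as a common example, and the function name and taxon labels are hypothetical.

```python
import math
from collections import Counter

def diversity_metrics(taxa):
    """taxa: list of taxon labels, one per identified individual or read.

    Returns (richness, proportions, shannon), where richness is the number
    of distinct taxa, proportions maps taxon -> relative abundance, and
    shannon is the Shannon index H' = -sum(p * ln p).
    """
    counts = Counter(taxa)
    n = sum(counts.values())
    proportions = {taxon: c / n for taxon, c in counts.items()}
    shannon = -sum(p * math.log(p) for p in proportions.values())
    return len(counts), proportions, shannon

# Hypothetical benthic sample with two equally abundant families.
richness, proportions, shannon = diversity_metrics(
    ["Baetidae", "Baetidae", "Chironomidae", "Chironomidae"]
)
print(richness, proportions, round(shannon, 3))
```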

  1. Frequency-locked pulse sequencer for high-frame-rate monochromatic tissue motion imaging.

    Science.gov (United States)

    Azar, Reza Zahiri; Baghani, Ali; Salcudean, Septimiu E; Rohling, Robert

    2011-04-01

    To overcome the inherent low frame rate of conventional ultrasound, we have previously presented a system that can be implemented on conventional ultrasound scanners for high-frame-rate imaging of monochromatic tissue motion. The system employs a sector subdivision technique in the sequencer to increase the acquisition rate. To eliminate the delays introduced during data acquisition, a motion phase correction algorithm has also been introduced to create in-phase displacement images. Previous experimental results from tissue-mimicking phantoms showed that the system can achieve effective frame rates of up to a few kilohertz on conventional ultrasound systems. In this short communication, we present a new pulse sequencing strategy that facilitates high-frame-rate imaging of monochromatic motion such that the acquired echo signals are inherently in-phase. The sequencer uses the knowledge of the excitation frequency to synchronize the acquisition of the entire imaging plane to that of an external exciter. This sequencing approach eliminates any need for synchronization or phase correction and has applications in tissue elastography, which we demonstrate with tissue-mimicking phantoms.

  2. New ancient DNA sequences suggest high genetic diversity for the woolly mammoth (Mammuthus primigenius)

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Partial DNA sequences of the cytochrome b gene (mtDNA) were successfully retrieved from Late Pleistocene fossil bone of Mammuthus primigenius collected from Xiguitu County (Yakeshi), Inner Mongolia Autonomous Region and from Zhaodong, Harbin of Heilongjiang Province in northern China. Two ancient DNA fragments (109 bp and 124 bp) were authenticated by reproducible experiments in two different laboratories and by phylogenetic analysis with other Elephantidae taxa. Phylogenetic analysis using these sequences and published data in either separate or combined datasets indicates an unstable relationship among the woolly mammoth and the two living elephants, Elephas and Loxodonta. In addition to the short sequences used to attempt the long independent evolution of Elephantidae terminal taxa, we suggest that a high intra-specific diversity existed in Mammuthus primigenius across both spatial and temporal ranges, resulting in a complex and divergent genetic background for the DNA sequences so far recovered. The high genetic diversity in the extinct woolly mammoth can explain the apparent instability of Elephantidae taxa on the molecular phylogenetic trees and can reconcile the apparent paradox regarding the unresolved Elephantidae trichotomy.

  3. High Resolution Imaging of PHIBSS z~2 Main Sequence Galaxies in CO J=1-0

    CERN Document Server

    Bolatto, A D; Leroy, A K; Tacconi, L J; Bouché, N; Schreiber, N M Förster; Genzel, R; Cooper, M C; Fisher, D B; Combes, F; García-Burillo, S; Burkert, A; Bournaud, F; Weiss, A; Saintonge, A; Wuyts, S; Sternberg, A

    2015-01-01

    We present Karl G. Jansky Very Large Array observations of the CO J=1-0 transition in a sample of four $z\sim2$ main sequence galaxies. These galaxies are in the blue sequence of star-forming galaxies at their redshift, and are part of the IRAM Plateau de Bure HIgh-$z$ Blue Sequence Survey (PHIBSS) which imaged them in CO J=3-2. Two galaxies are imaged here at high signal-to-noise, allowing determinations of their disk sizes, line profiles, molecular surface densities, and excitation. Using these and published measurements, we show that the CO and optical disks have similar sizes in main-sequence galaxies, and in the galaxy where we can compare CO J=1-0 and J=3-2 sizes we find these are also very similar. Assuming a Galactic CO-to-H$_2$ conversion, we measure surface densities of $\Sigma_{mol}\sim1200$ M$_\odot$ pc$^{-2}$ in projection and estimate $\Sigma_{mol}\sim500-900$ M$_\odot$ pc$^{-2}$ deprojected. Finally, our data yields velocity-integrated Rayleigh-Jeans brightness temperature line ratios $r_{31}$ th...

  4. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments.

    Science.gov (United States)

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R; Verstrepen, Kevin J; Thevelein, Johan M; Tohme, Joe

    2014-04-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.

  5. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  6. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Directory of Open Access Journals (Sweden)

    Alexander T Dilthey

    2016-10-01

    Full Text Available Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a

  7. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Science.gov (United States)

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant
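
    The step of inferring the most likely pair of alleles at a locus can be illustrated with a toy diploid likelihood. This is not the HLA*PRG model: it is a minimal sketch that assumes each read is equally likely to come from either allele of the pair, that reads and alleles are pre-aligned strings of equal length, and that bases are miscalled independently at a fixed rate `err`. All names and sequences are hypothetical.

```python
import math
from itertools import combinations_with_replacement

def read_loglik(read, allele, err=0.01):
    """Log-probability of a read given its source allele (toy error model)."""
    mismatches = sum(r != a for r, a in zip(read, allele))
    matches = len(read) - mismatches
    return matches * math.log(1 - err) + mismatches * math.log(err / 3)

def best_allele_pair(reads, alleles, err=0.01):
    """Return the (name1, name2) pair maximizing the read likelihood."""
    best_pair, best_ll = None, -math.inf
    for n1, n2 in combinations_with_replacement(sorted(alleles), 2):
        total = 0.0
        for read in reads:
            l1 = read_loglik(read, alleles[n1], err)
            l2 = read_loglik(read, alleles[n2], err)
            m = max(l1, l2)  # log-sum-exp for numerical stability
            total += m + math.log(0.5 * math.exp(l1 - m) + 0.5 * math.exp(l2 - m))
        if total > best_ll:
            best_pair, best_ll = (n1, n2), total
    return best_pair

# Hypothetical 4-base "alleles" and reads drawn evenly from A1 and A2.
alleles = {"A1": "ACGT", "A2": "ACGA", "A3": "TCGT"}
reads = ["ACGT"] * 5 + ["ACGA"] * 5
print(best_allele_pair(reads, alleles))
```

    Enumerating unordered pairs with `combinations_with_replacement` keeps homozygous calls such as (A1, A1) in the candidate set.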

  8. Pitfalls of mapping high throughput sequencing data to repetitive sequences: Piwi’s genomic targets still not identified

    Science.gov (United States)

    Marinov, Georgi K.; Wang, Jie; Handler, Dominik; Wold, Barbara J.; Weng, Zhiping; Hannon, Gregory J.; Aravin, Alexei A.; Zamore, Phillip D.; Brennecke, Julius; Toth, Katalin Fejes

    2015-01-01

    Huang et al. (2013) recently reported that chromatin immuno-precipitation followed by sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi - a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their underlying deep sequencing data and report that the data do not support the authors’ central conclusions. PMID:25805138

  9. Alignment of high-throughput sequencing data inside in-memory databases.

    Science.gov (United States)

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
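
    Exact read search in BWA-style aligners rests on backward search over the Burrows-Wheeler transform of the reference. The sketch below shows that technique in plain Python rather than as a stored procedure; it is not the authors' HANA or MySQL code, it builds the suffix array naively (suitable only for short toy references), and the function names and sequences are hypothetical.

```python
def build_index(reference):
    """Build a toy FM-index: suffix array, C table, and occurrence table."""
    text = reference + "$"  # "$" sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to "$"
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, c_table, occ

def exact_search(read, index):
    """Return sorted start positions of exact matches of read in the reference."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(read):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

index = build_index("ACGTACGA")  # hypothetical toy reference
print(exact_search("ACG", index))
```

    Each read character costs a constant number of table lookups, so the search time depends on read length rather than reference length once the index exists.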

  10. BatchPrimer3: a high throughput web application for PCR and sequencing primer design.

    Science.gov (United States)

    You, Frank M; Huo, Naxin; Gu, Yong Qiang; Luo, Ming-Cheng; Ma, Yaqin; Hane, Dave; Lazo, Gerard R; Dvorak, Jan; Anderson, Olin D

    2008-05-29

    Microsatellite (simple sequence repeat - SSR) and single nucleotide polymorphism (SNP) markers are two types of important genetic markers useful in genetic mapping and genotyping. Often, large-scale genomic research projects require high-throughput computer-assisted primer design. Numerous web-based or stand-alone programs for PCR primer design are available but vary in quality and functionality. In particular, most programs lack batch primer design capability. A high-throughput software tool for designing SSR flanking primers and SNP genotyping primers is therefore increasingly in demand. A new web primer design program, BatchPrimer3, has been developed based on Primer3. BatchPrimer3 adopts the Primer3 core program as its major primer design engine to choose the best primer pairs. A new score-based primer picking module is incorporated into BatchPrimer3 and used to pick position-restricted primers. BatchPrimer3 v1.0 implements several types of primer designs including generic primers, SSR primers together with SSR detection, and SNP genotyping primers (including single-base extension primers, allele-specific primers, and tetra-primers for tetra-primer ARMS PCR), as well as DNA sequencing primers. DNA sequences in FASTA format can be batch read into the program. The basic information of input sequences, as a reference for setting primer design parameters, can be obtained by pre-analysis of sequences. The input sequences can be pre-processed and masked to exclude and/or include specific regions, or set targets for different primer design purposes as in Primer3Web and Primer3Plus. A tab-delimited or Excel-formatted primer output also greatly facilitates the subsequent primer-ordering process. Thousands of primers, including wheat conserved intron-flanking primers, wheat genome-specific SNP genotyping primers, and Brachypodium SSR flanking primers in several genome projects have been designed using the program and validated in several laboratories. BatchPrimer3 is a
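
    SSR detection, the step that precedes flanking-primer design, can be approximated with a back-referencing regular expression. This is a hedged sketch and not BatchPrimer3's detection module: the function name, the unit-length range, and the minimum repeat count are assumptions chosen for illustration.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats; returns (start, unit, repeat_count) tuples.

    A short unit is tried first (lazy quantifier), so "CACACACA" is reported
    as CA x4 rather than CACA x2. Homopolymer units such as "AA" are skipped.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # a run of a single base is not treated as an SSR here
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Hypothetical sequence containing a (CA)5 microsatellite.
print(find_ssrs("GGCACACACACATT"))
```

    The reported start and repeat length give the region to mask or target when flanking primers are picked on either side of the repeat.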

  11. Analysis of the Repertoire Features of TCR Beta Chain CDR3 in Human by High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Xianliang Hou

    2016-07-01

    Background/Aims: To ward off a wide variety of pathogens, the human adaptive immune system harbors a vast array of T-cell receptors, collectively referred to as the TCR repertoire. Assessing the features of the TCR repertoire is vital for a deeper understanding of immune behaviour and immune response. Methods: In this study, we used a combination of multiplex PCR, Illumina sequencing and IMGT (ImMunoGeneTics)/HighV-QUEST for a standardized analysis of the repertoire features of the TCR beta chain in the blood of healthy individuals, including the repertoire features of public TCR complementarity-determining region 3 (CDR3) sequences, highly expanded clones, and long TCR CDR3 sequences. Results: We found that public CDR3 sequences and high-frequency sequences shared the same characteristics: both had fewer nucleotide additions and shorter CDR3 lengths, and were closer to the germline sequence. Moreover, our studies provided evidence that public amino acid sequences are produced by multiple nucleotide sequences. Notably, there was skewed VDJ segment usage in long CDR3 sequences: the expression levels of 10 TRβV segments, 7 TRβJ segments and 2 TRβD segments were significantly different in the long CDR3 sequences compared to the short CDR3 sequences. Moreover, we found that extensive N additions and increased D gene usage contribute to TCR CDR3 length, and observed distinct amino acid usage frequencies in long CDR3 sequences compared to short CDR3 sequences. Conclusions: Some repertoire features could be observed in the public sequences, highly abundant clones, and long TCR CDR3 sequences, which might be helpful for further study of immune behavior and immune response.

  12. High-throughput Sequencing Based Immune Repertoire Study during Infectious Disease

    Directory of Open Access Journals (Sweden)

    Dongni Hou

    2016-08-01

    The selectivity of the adaptive immune response is based on the enormous diversity of T and B cell antigen-specific receptors. The immune repertoire, the collection of T and B cells with functional diversity in the circulatory system at any given time, is dynamic and reflects the essence of immune selectivity. In this article, we review recent advances in immune repertoire studies of infectious diseases achieved with traditional techniques and high-throughput sequencing techniques. High-throughput sequencing techniques enable the determination of complementary regions of lymphocyte receptors with unprecedented efficiency and scale. This progress in methodology enhances the understanding of immunologic changes during pathogen challenge, and also provides a basis for further development of novel diagnostic markers, immunotherapies and vaccines.

  13. The Main Sequences of Starforming Galaxies and Active Galactic Nuclei at High Redshift

    CERN Document Server

    Mancuso, Claudia; Shi, J; Gonzàlez-Nuevo, J; Bèthermin, M; Danese, L

    2016-01-01

    We provide a novel, unifying physical interpretation on the origin, the average shape, the scatter, and the cosmic evolution for the main sequences of starforming galaxies and active galactic nuclei at high redshift z $\gtrsim$ 1. We achieve this goal in a model-independent way by exploiting: (i) the redshift-dependent SFR functions based on the latest UV/far-IR data from HST/Herschel, and related statistics of strong gravitationally lensed sources; (ii) deterministic evolutionary tracks for the history of star formation and black hole accretion, gauged on a wealth of multiwavelength observations including the observed Eddington ratio distribution. We further validate these ingredients by showing their consistency with the observed galaxy stellar mass functions and AGN bolometric luminosity functions at different redshifts via the continuity equation approach. Our analysis of the main sequence for high-redshift galaxies and AGNs highlights that the present data are consistently interpreted in terms of an in...

  14. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing

    Science.gov (United States)

    Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  15. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  16. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  17. Transcriptome analysis of the silkworm (Bombyx mori) by high-throughput RNA sequencing.

    Science.gov (United States)

    Li, Yinü; Wang, Guozeng; Tian, Jian; Liu, Huifen; Yang, Huipeng; Yi, Yongzhu; Wang, Jinhui; Shi, Xiaofeng; Jiang, Feng; Yao, Bin; Zhang, Zhifang

    2012-01-01

    The domestic silkworm, Bombyx mori, is a model insect with important economic value for silk production that also acts as a bioreactor for biomaterial production. The functional complexity of the silkworm transcriptome has not yet been fully elucidated, although genomic sequencing and other tools have been widely used in its study. We explored the transcriptome of silkworm at different developmental stages using high-throughput paired-end RNA sequencing. A total of about 3.3 gigabases (Gb) of sequence was obtained, representing about a 7-fold coverage of the B. mori genome. From the reads that were mapped to the genome sequence, 23,461 transcripts were obtained, of which 5,428 were novel. Of the 14,623 predicted protein-coding genes in the silkworm genome database, 11,884 were found to be expressed in the silkworm transcriptome, giving a coverage of 81.3%. A total of 13,195 new exons were detected, of which 5,911 were found in the annotated genes in the Silkworm Genome Database (SilkDB). An analysis of alternative splicing in the transcriptome revealed that 3,247 genes had undergone alternative splicing. To help with the data analysis, a transcriptome database that integrates our transcriptome data with the silkworm genome data was constructed and is publicly available at http://124.17.27.136/gbrowse2/. To our knowledge, this is the first study to elucidate the silkworm transcriptome using high-throughput RNA sequencing technology. Our data indicate that the transcriptome of silkworm is much more complex than previously anticipated. This work provides tools and resources for the identification of new functional elements and paves the way for future functional genomics studies.
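    The headline figures in this abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are illustrative, and the implied genome size is a back-of-the-envelope value derived from the stated 3.3 Gb and 7-fold coverage, not a figure reported by the authors.

```python
# Figures quoted in the abstract.
expressed_genes = 11_884      # predicted genes found to be expressed
predicted_genes = 14_623      # predicted protein-coding genes in the database
total_transcripts = 23_461    # transcripts obtained from mapped reads
novel_transcripts = 5_428     # transcripts reported as novel
sequenced_gb = 3.3            # total sequence obtained, in gigabases
fold_coverage = 7             # approximate fold coverage of the genome

# Share of predicted genes detected in the transcriptome.
gene_coverage_pct = 100 * expressed_genes / predicted_genes

# Share of transcripts that were novel (derived here, not stated in the abstract).
novel_pct = 100 * novel_transcripts / total_transcripts

# Genome size implied by "3.3 Gb is about 7-fold coverage" (rough estimate only).
implied_genome_gb = sequenced_gb / fold_coverage

print(f"gene coverage: {gene_coverage_pct:.1f}%")
print(f"novel transcripts: {novel_pct:.1f}%")
print(f"implied genome size: {implied_genome_gb:.2f} Gb")
```

    The first value reproduces the 81.3% coverage stated in the abstract; the other two are derived quantities and should be read as approximate.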

  18. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak

    Directory of Open Access Journals (Sweden)

    Trout-Yakel Keri M

    2010-02-01

    Background: A large, multi-province outbreak of listeriosis associated with ready-to-eat meat products contaminated with Listeria monocytogenes serotype 1/2a occurred in Canada in 2008. Subtyping of outbreak-associated isolates using pulsed-field gel electrophoresis (PFGE) revealed two similar but distinct AscI PFGE patterns. High-throughput pyrosequencing of two L. monocytogenes isolates was used to rapidly provide the genome sequence of the primary outbreak strain and to investigate the extent of genetic diversity associated with a change of a single restriction enzyme fragment during PFGE. Results: The chromosomes were collinear, but differences included 28 single nucleotide polymorphisms (SNPs) and three indels, including a 33 kbp prophage that accounted for the observed difference in AscI PFGE patterns. The distribution of these traits was assessed within further clinical, environmental and food isolates associated with the outbreak, and this comparison indicated that three distinct, but highly related strains may have been involved in this nationwide outbreak. Notably, these two isolates were found to harbor a 50 kbp putative mobile genomic island encoding translocation and efflux functions that has not been observed in other Listeria genomes. Conclusions: High-throughput genome sequencing provided a more detailed real-time assessment of genetic traits characteristic of the outbreak strains than could be achieved with routine subtyping methods. This study confirms that the latest generation of DNA sequencing technologies can be applied during high priority public health events, and laboratories need to prepare for this inevitability and assess how to properly analyze and interpret whole genome sequences in the context of molecular epidemiology.

  19. Improving High-Throughput Sequencing Approaches for Reconstructing the Evolutionary Dynamics of Upper Paleolithic Human Groups

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine

    been mainly driven by the development of High-Throughput DNA Sequencing (HTS) technologies but also by the implementation of novel molecular tools tailored to the manipulation of ultra short and damaged DNA molecules. Our ability to retrieve traces of genetic material has tremendously improved, pushing...... work on admixture events between Neanderthals and anatomically modern humans but also suggested that the latter were organized in small family units whose members avoided inbreeding....

  20. High Throughput Sequencing of Germline and Tumor from Men With Early-Onset Metastatic Prostate Cancer

    Science.gov (United States)

    2014-10-01

    challenge, Dr. Tomlins has continued to develop state of the art technologies to use formalin-fixed paraffin-embedded (FFPE) prostate cancer specimens...men with early-onset, metastatic prostate cancer PRINCIPAL INVESTIGATOR: Kathleen A. Cooney, M.D. CONTRACTING ORGANIZATION...High-Throughput Sequencing of Germline and Tumor From Men with Early-Onset Metastatic Prostate Cancer 5b. GRANT NUMBER W81XWH-13-1-0371 5c

  1. Predicting protein-protein interactions from sequence using correlation coefficient and high-quality interaction dataset.

    Science.gov (United States)

    Shi, Ming-Guang; Xia, Jun-Feng; Li, Xue-Ling; Huang, De-Shuang

    2010-03-01

    Identifying protein-protein interactions (PPIs) is critical for understanding the cellular function of the proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines correlation coefficient (CC) transformation with a support vector machine (SVM). CC transformation not only adequately considers the neighboring effect of the protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of the yeast Saccharomyces cerevisiae were mined to objectively evaluate the method and attenuate bias. The SVM model combined with CC transformation yielded the best performance, with a high accuracy of 87.94% using the gold standard positives and gold standard negatives datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.

  2. Exome sequencing identifies potential risk variants for Mendelian disorders at high prevalence in Qatar.

    Science.gov (United States)

    Rodriguez-Flores, Juan L; Fakhro, Khalid; Hackett, Neil R; Salit, Jacqueline; Fuller, Jennifer; Agosto-Perez, Francisco; Gharbiah, Maey; Malek, Joel A; Zirie, Mahmoud; Jayyousi, Amin; Badii, Ramin; Al-Nabet Al-Marri, Ajayeb; Chouchane, Lotfi; Stadler, Dora J; Mezey, Jason G; Crystal, Ronald G

    2014-01-01

    Exome sequencing of families of related individuals has been highly successful in identifying genetic polymorphisms responsible for Mendelian disorders. Here, we demonstrate the value of the reverse approach, where we use exome sequencing of a sample of unrelated individuals to analyze allele frequencies of known causal mutations for Mendelian diseases. We sequenced the exomes of 100 individuals representing the three major genetic subgroups of the Qatari population (Q1 Bedouin, Q2 Persian-South Asian, Q3 African) and identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. These include variants not present in 1000 Genomes and variants at high frequency when compared with 1000 Genomes populations. Several of these Mendelian variants were only segregating in one Qatari subpopulation, where the observed subpopulation specificity trends were confirmed in an independent population of 386 Qataris. Premarital genetic screening in Qatar tests for only four out of the 37, such that this study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance. © 2013 WILEY PERIODICALS, INC.

  3. Characterizing ncRNAs in human pathogenic protists using high-throughput sequencing technology

    Directory of Open Access Journals (Sweden)

    Lesley Joan Collins

    2011-12-01

    ncRNAs are key genes in many human diseases including cancer and viral infection, and provide critical functions in pathogenic organisms such as fungi, bacteria, viruses and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate, requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, snoRNAs and long ncRNAs on a genomic scale, making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases.

  4. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing.

    Science.gov (United States)

    Gamba, Cristina; Hanghøj, Kristian; Gaunitz, Charleen; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Bradley, Daniel G; Orlando, Ludovic

    2016-03-01

    The DNA molecules that can be extracted from archaeological and palaeontological remains are often degraded and massively contaminated with environmental microbial material. This reduces the efficacy of shotgun approaches for sequencing ancient genomes, despite the decreasing sequencing costs of high-throughput sequencing (HTS). Improving the recovery of endogenous molecules from the DNA extraction and purification steps could, thus, help advance the characterization of ancient genomes. Here, we apply the three most commonly used DNA extraction methods to five ancient bone samples spanning a ~30 thousand year temporal range and originating from a diversity of environments, from South America to Alaska. We show that methods based on the purification of DNA fragments using silica columns are more advantageous than in-solution methods and increase not only the total amount of DNA molecules retrieved but also the relative importance of endogenous DNA fragments and their molecular diversity. Therefore, these methods provide a cost-effective solution for downstream applications, including DNA sequencing on HTS platforms. © 2015 John Wiley & Sons Ltd.

  5. Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

    Directory of Open Access Journals (Sweden)

    Momchilo Vuyisich

    2014-01-01

    Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.

  6. High intraindividual variation in internal transcribed spacer sequences in Aeschynanthus (Gesneriaceae): implications for phylogenetics.

    Science.gov (United States)

    Denduangboripant, J; Cronk, Q C

    2000-07-22

    Aeschynanthus (Gesneriaceae) is a large genus of tropical epiphytes that is widely distributed from the Himalayas and China throughout South-East Asia to New Guinea and the Solomon Islands. Polymerase chain reaction (PCR) consensus sequences of the internal transcribed spacers (ITS) of Aeschynanthus nuclear ribosomal DNA showed sequence polymorphism that was difficult to interpret. Cloning individual sequences from the PCR product generated a phylogenetic tree of 23 Aeschynanthus species (two clones per species). The intraindividual clone pairs varied from 0 to 5.01%. We suggest that the high intraindividual sequence variation results from low molecular drive in the ITS of Aeschynanthus. However, this study shows that, despite the variation found within some individuals, it is still possible to use these data to reconstruct phylogenetic relationships of the species, suggesting that clone variation, although persistent, does not pre-date the divergence of Aeschynanthus species. The Aeschynanthus analysis revealed two major clades with different but overlapping geographic distributions and reflected classification based on morphology (particularly seed hair type).

  7. De Novo Peptide Sequencing: Deep Mining of High-Resolution Mass Spectrometry Data.

    Science.gov (United States)

    Islam, Mohammad Tawhidul; Mohamedali, Abidali; Fernandes, Criselda Santan; Baker, Mark S; Ranganathan, Shoba

    2017-01-01

    High resolution mass spectrometry has revolutionized proteomics over the past decade, resulting in tremendous amounts of data in the form of mass spectra, being generated in a relatively short span of time. The mining of this spectral data for analysis and interpretation though has lagged behind such that potentially valuable data is being overlooked because it does not fit into the mold of traditional database searching methodologies. Although the analysis of spectra by de novo sequences removes such biases and has been available for a long period of time, its uptake has been slow or almost nonexistent within the scientific community. In this chapter, we propose a methodology to integrate de novo peptide sequencing using three commonly available software solutions in tandem, complemented by homology searching, and manual validation of spectra. This simplified method would allow greater use of de novo sequencing approaches and potentially greatly increase proteome coverage leading to the unearthing of valuable insights into protein biology, especially of organisms whose genomes have been recently sequenced or are poorly annotated.

  8. End-to-End Optimization of High-Throughput DNA Sequencing.

    Science.gov (United States)

    O'Reilly, Eliza; Baccelli, Francois; De Veciana, Gustavo; Vikalo, Haris

    2016-10-01

    At the core of Illumina's high-throughput DNA sequencing platforms lies a biophysical surface process, bridge amplification, that results in a random geometry of clusters of homogeneous short DNA fragments, typically hundreds of base pairs long. The statistical properties of this random process and the lengths of the fragments are critical as they affect the information that can be subsequently extracted, that is, density of successfully inferred DNA fragment reads. The ensembles of overlapping DNA fragment reads are then used to computationally reconstruct the much longer target genome sequence. The success of the reconstruction in turn depends on having a sufficiently large ensemble of DNA fragments that are sufficiently long. In this article, using stochastic geometry, we model and optimize the end-to-end flow cell synthesis and target genome sequencing process, linking and partially controlling the statistics of the physical processes to the success of the final computational step. Based on a rough calibration of our model, we provide, for the first time, a mathematical framework capturing the salient features of the sequencing platform that serves as a basis for optimizing cost, performance, and/or sensitivity analysis to various parameters.

  9. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    DEFF Research Database (Denmark)

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens;

    2015-01-01

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too......-stringency in-solution hybridization method enables detection of discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral...

  10. Confidence sets for network structure

    CERN Document Server

    Airoldi, Edoardo M; Wolfe, Patrick J

    2011-01-01

    Latent variable models are frequently used to identify structure in dichotomous network data, in part because they give rise to a Bernoulli product likelihood that is both well understood and consistent with the notion of exchangeable random graphs. In this article we propose conservative confidence sets that hold with respect to these underlying Bernoulli parameters as a function of any given partition of network nodes, enabling us to assess estimates of 'residual' network structure, that is, structure that cannot be explained by known covariates and thus cannot be easily verified by manual inspection. We demonstrate the proposed methodology by analyzing student friendship networks from the National Longitudinal Survey of Adolescent Health that include race, gender, and school year as covariates. We employ a stochastic expectation-maximization algorithm to fit a logistic regression model that includes these explanatory variables as well as a latent stochastic blockmodel component and additional node-specific...

  11. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Directory of Open Access Journals (Sweden)

    Soichi Inagaki

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
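    The idea of finding read pairs that span a genome/T-DNA junction can be illustrated with a small sketch. This is not the authors' tool: it assumes the read pairs have already been aligned to a combined reference (genome plus T-DNA), and the names `Mate`, `junction_pairs`, `candidate_sites`, `TDNA_LB` and `TDNA_RB`, as well as the 500 bp window, are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Mate(NamedTuple):
    ref: str  # reference sequence the mate aligned to (hypothetical names)
    pos: int  # leftmost alignment position on that reference


def junction_pairs(pairs, tdna_refs):
    """Return the genome-side mate of every pair that has exactly one
    mate on a T-DNA reference and the other on the genome."""
    genome_mates = []
    for m1, m2 in pairs:
        m1_on_tdna = m1.ref in tdna_refs
        m2_on_tdna = m2.ref in tdna_refs
        if m1_on_tdna != m2_on_tdna:  # one mate on T-DNA, one on genome
            genome_mates.append(m2 if m1_on_tdna else m1)
    return genome_mates


def candidate_sites(genome_mates, window=500):
    """Bin genome-side mates into fixed windows and count supporting pairs."""
    return Counter((m.ref, m.pos // window * window) for m in genome_mates)


# Toy input: two junction-spanning pairs, one genome-only pair, one T-DNA-only pair.
pairs = [
    (Mate("Chr1", 10_120), Mate("TDNA_LB", 35)),
    (Mate("TDNA_LB", 60), Mate("Chr1", 10_300)),
    (Mate("Chr3", 5_000), Mate("Chr3", 5_250)),
    (Mate("TDNA_RB", 10), Mate("TDNA_LB", 80)),
]
sites = candidate_sites(junction_pairs(pairs, {"TDNA_LB", "TDNA_RB"}))
print(sites.most_common())
```

    Windows supported by several pairs would be the candidate insertion sites; a real analysis would also need to handle split reads, mapping quality and multiple insertions per line, which this sketch ignores.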

  12. High penetrance of sequencing errors and interpretative shortcomings in mtDNA sequence analysis of LHON patients.

    Science.gov (United States)

    Bandelt, Hans-Jürgen; Yao, Yong-Gang; Salas, Antonio; Kivisild, Toomas; Bravi, Claudio M

    2007-01-12

    For identifying mutation(s) that are potentially pathogenic it is essential to determine the entire mitochondrial DNA (mtDNA) sequences from patients suffering from a particular mitochondrial disease, such as Leber hereditary optic neuropathy (LHON). However, such sequencing efforts can, in the worst case, be riddled with errors by imposing phantom mutations or misreporting variant nucleotides, and moreover, by inadvertently regarding some mutations as novel and pathogenic, which are actually known to define minor haplogroups. Under such circumstances it remains unclear whether the disease-associated mutations would have been determined adequately. Here, we re-analyse four problematic LHON studies and propose guidelines by which some of the pitfalls could be avoided.

  13. RIASEC Interest and Confidence Cutoff Scores: Implications for Career Counseling

    Science.gov (United States)

    Bonitz, Verena S.; Armstrong, Patrick Ian; Larson, Lisa M.

    2010-01-01

    One strategy commonly used to simplify the joint interpretation of interest and confidence inventories is the use of cutoff scores to classify individuals dichotomously as having high or low levels of confidence and interest, respectively. The present study examined the adequacy of cutoff scores currently recommended for the joint interpretation…

  14. nigerian students' self-confidence in responding to statements of ...

    African Journals Online (AJOL)

    Temechegn

    The goal of the study was to find out the self-confidence and confidence level of senior ... Specifically, chemistry teachers ask students this question when an ... high school students from connecting with scientific principles in the way ... chemical reaction, ability to identify factors that affect equilibrium reactions and ability to.

  15. Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing

    Directory of Open Access Journals (Sweden)

    Giancarlo Russo

    2015-12-01

    We present the first study that applies the high read accuracy and depth of single molecule, real time, circular consensus sequencing (SMRT-CCS) to the detection of mutations in stool DNA in order to provide a non-invasive, sensitive and accurate test for CRC. In stool DNA isolated from patients diagnosed with adenocarcinoma, we are able to detect mutations at frequencies below 0.5% with no false positives. This approach establishes a foundation for a non-invasive, highly sensitive assay to screen the population for CRC and the early stage adenomas that lead to CRC.

  16. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Directory of Open Access Journals (Sweden)

    Shin-ichi Hashimoto

    Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  17. Development of a multilocus sequence typing tool for high-resolution genotyping of Enterocytozoon bieneusi.

    Science.gov (United States)

    Feng, Yaoyu; Li, Na; Dearen, Theresa; Lobo, Maria L; Matos, Olga; Cama, Vitaliano; Xiao, Lihua

    2011-07-01

    Thus far, genotyping of Enterocytozoon bieneusi has been based solely on DNA sequence analysis of the internal transcribed spacer (ITS) of the rRNA gene. Both host-adapted and zoonotic (human-pathogenic) genotypes of E. bieneusi have been identified. In this study, we searched for microsatellite and minisatellite sequences in the whole-genome sequence database of E. bieneusi isolate H348. Seven potential targets (MS1 to MS7) were identified. Testing of the seven targets by PCR using two human-pathogenic E. bieneusi genotypes (A and Peru10) led to the selection of four targets (MS1, MS3, MS4, and MS7). Further analysis of the four loci with an additional 24 specimens of both host-adapted and zoonotic E. bieneusi genotypes indicated that most host-adapted genotypes were not amplified by PCR targeting these loci. In contrast, 10 or 11 of the 13 specimens of the zoonotic genotypes were amplified by PCR at each locus. Altogether, 12, 8, 7, and 11 genotypes were identified at MS1, MS3, MS4, and MS7, respectively. Phylogenetic analysis of the nucleotide sequences obtained produced a genetic relationship that was similar to the one at the ITS locus, with the formation of a large group of zoonotic genotypes that included most E. bieneusi genotypes in humans. Thus, a multilocus sequence typing tool was developed for high-resolution genotyping of E. bieneusi. Data obtained in the study should also have implications for understanding the taxonomy of Enterocytozoon spp., the public health significance of E. bieneusi in animals, and the sources of human E. bieneusi infections.

  18. High resolution profiling of human exon methylation by liquid hybridization capture-based bisulfite sequencing

    Directory of Open Access Journals (Sweden)

    Wang Junwen

    2011-12-01

    Background: DNA methylation plays important roles in gene regulation during both normal development and disease states. In the past decade, a number of methods have been developed and applied to characterize the genome-wide distribution of DNA methylation. Most of these methods attempted to screen the whole genome and turned out to be enormously costly and time consuming for studies of the complex mammalian genome. Thus, they are not practical for studying multiple clinical samples in biomarker research. Results: Here, we present a novel strategy that relies on the selective capture of target regions by liquid hybridization followed by bisulfite conversion and deep sequencing, which is referred to as liquid hybridization capture-based bisulfite sequencing (LHC-BS). To assess this method, we used about 2 μg of native genomic DNA from YanHuang (YH) whole blood samples and from a mature dendritic cell (mDC) line to evaluate the methylation statuses of target regions of the exome. The results indicated that the LHC-BS system was able to cover more than 97% of the exome regions and detect their methylation statuses with acceptable allele dropouts. Most of the regions that couldn't provide accurate methylation information were distributed in chromosomes 6 and Y because of multiple mapping to those regions. The accuracy of this strategy was evaluated by pair-wise comparisons using the results from whole genome bisulfite sequencing and validated by bisulfite specific PCR sequencing. Conclusions: In the present study, we employed a liquid hybridization capture system to enrich for exon regions and then combined it with bisulfite sequencing to examine the methylation statuses for the first time. This technique is highly sensitive and flexible and can be applied to identify differentially methylated regions (DMRs) at specific genomic locations of interest, such as regulatory elements or promoters.

  19. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Background: Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results: Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of the reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P  Conclusions: We have presented a simple and high throughput method of
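
    The profile construction described in this abstract (counting the reads that align to each contig of a reference metagenome) can be sketched in a few lines of Python. The function name `metagenome_profile` and the toy contig names are hypothetical, and the normalization to fractions of aligned reads is an assumption made here so that samples of different sequencing depth are comparable; the abstract does not say how the authors scaled their counts.

```python
from collections import Counter


def metagenome_profile(aligned_contigs, contigs):
    """Build a profile from alignment results.

    aligned_contigs: iterable with one contig name per aligned read.
    contigs: ordered list of contig names in the reference metagenome.
    Returns the fraction of aligned reads assigned to each contig
    (normalization is an assumption of this sketch).
    """
    counts = Counter(aligned_contigs)
    total = sum(counts[c] for c in contigs)
    if total == 0:
        return [0.0 for _ in contigs]
    return [counts[c] / total for c in contigs]


# Toy reference with three contigs and four aligned reads.
reference = ["contig_1", "contig_2", "contig_3"]
sample = ["contig_1", "contig_1", "contig_2", "contig_3"]
print(metagenome_profile(sample, reference))
```

    Profiles built this way are plain vectors over the same contig list, so repeatability within and between animals can be examined with any standard distance or correlation measure.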

  20. High-throughput sequencing-based analysis of endogenetic fungal communities inhabiting the Chinese Cordyceps reveals unexpectedly high fungal diversity.

    Science.gov (United States)

    Xia, Fei; Chen, Xin; Guo, Meng-Yuan; Bai, Xiao-Hui; Liu, Yan; Shen, Guang-Rong; Li, Yu-Ling; Lin, Juan; Zhou, Xuan-Wei

    2016-09-14

    Chinese Cordyceps, known in Chinese as "DongChong XiaCao", is a parasitic complex of a fungus (Ophiocordyceps sinensis) and a caterpillar. The current study explored the endogenetic fungal communities inhabiting Chinese Cordyceps. Samples were collected from five different geographical regions of Qinghai and Tibet, and the nuclear ribosomal internal transcribed spacer-1 sequences from each sample were obtained using Illumina high-throughput sequencing. The results showed that Ascomycota was the dominant fungal phylum in Chinese Cordyceps and its soil microhabitat from different sampling regions. Among the Ascomycota, 65 genera were identified, and the abundant operational taxonomic units showed the strongest sequence similarity to Ophiocordyceps, Verticillium, Pseudallescheria, Candida and Ilyonectria. Not surprisingly, the genus Ophiocordyceps was the largest among the fungal communities identified in the fruiting bodies and external mycelial cortices of Chinese Cordyceps. In addition, fungal communities in the soil microhabitats were clustered separately from the external mycelial cortices and fruiting bodies of Chinese Cordyceps from different sampling regions. There was no significant structural difference in the fungal communities between the fruiting bodies and external mycelial cortices of Chinese Cordyceps. This study revealed an unexpectedly high diversity of fungal communities inhabiting the Chinese Cordyceps and its microhabitats.

  1. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Directory of Open Access Journals (Sweden)

    Charlotte Rehm

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
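
    A tandem repeat of the GGGNATC type can be located with a regular expression, as in the following minimal Python sketch. It is an illustration only, not the authors' in silico analysis: the function name `find_g_repeats` and the toy sequence are hypothetical, N is taken to mean any of A, C, G, or T, and only the given strand is scanned (a full analysis would also search the reverse complement).

```python
import re

UNIT_LENGTH = 7  # GGG + N + ATC


def find_g_repeats(sequence, min_units=2):
    """Yield (start, n_units) for each run of at least `min_units`
    tandem GGGNATC units on the given strand (0-based start)."""
    pattern = re.compile(r"(?:GGG[ACGT]ATC){%d,}" % min_units)
    for match in pattern.finditer(sequence.upper()):
        yield match.start(), len(match.group()) // UNIT_LENGTH


# Toy sequence: four tandem units flanked by unrelated bases.
toy = "TT" + "GGGAATC" + "GGGTATC" + "GGGCATC" + "GGGGATC" + "AA"
print(list(find_g_repeats(toy)))
```

    Tallying the `n_units` values over a genome would give the kind of unit-count distribution the abstract describes, including the reported preference for four units.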

  2. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus.

    Directory of Open Access Journals (Sweden)

    Kui Lin

    2014-01-01

    Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya.

  3. Single Nucleus Genome Sequencing Reveals High Similarity among Nuclei of an Endomycorrhizal Fungus

    Science.gov (United States)

    Zhang, Zhonghua; Ivanov, Sergey; Saunders, Diane G. O.; Mu, Desheng; Pang, Erli; Cao, Huifen; Cha, Hwangho; Lin, Tao; Zhou, Qian; Shang, Yi; Li, Ying; Sharma, Trupti; van Velzen, Robin; de Ruijter, Norbert; Aanen, Duur K.; Win, Joe; Kamoun, Sophien; Bisseling, Ton; Geurts, René; Huang, Sanwen

    2014-01-01

    Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya. PMID:24415955

  4. A Statistical Method for Assessing Peptide Identification Confidence in Accurate Mass and Time Tag Proteomics

    Energy Technology Data Exchange (ETDEWEB)

    Stanley, Jeffrey R.; Adkins, Joshua N.; Slysz, Gordon W.; Monroe, Matthew E.; Purvine, Samuel O.; Karpievitch, Yuliya V.; Anderson, Gordon A.; Smith, Richard D.; Dabney, Alan R.

    2011-07-15

    High-throughput proteomics is rapidly evolving to require high mass measurement accuracy for a variety of different applications. Increased mass measurement accuracy in bottom-up proteomics specifically allows for an improved ability to distinguish and characterize detected MS features, which may in turn be identified by, e.g., matching to entries in a database for both precursor and fragmentation mass identification methods. Many tools exist with which to score the identification of peptides from LC-MS/MS measurements or to assess matches to an accurate mass and time (AMT) tag database, but these two calculations remain distinctly unrelated. Here we present a statistical method, Statistical Tools for AMT tag Confidence (STAC), which extends our previous work incorporating prior probabilities of correct sequence identification from LC-MS/MS, as well as the quality with which LC-MS features match AMT tags, to evaluate peptide identification confidence. Compared to existing tools, we are able to obtain significantly more high-confidence peptide identifications at a given false discovery rate and additionally assign confidence estimates to individual peptide identifications. Freely available software implementations of STAC are available in both command line and as a Windows graphical application.

  5. High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs

    Directory of Open Access Journals (Sweden)

    Darakjian Priscila

    2009-08-01

    Background: Allelic variation is the cornerstone of genetically determined differences in gene expression, gene product structure, physiology, and behavior. However, allelic variation, particularly cryptic (unknown or not annotated) variation, is problematic for follow up analyses. Polymorphisms result in a high incidence of false positive and false negative results in hybridization based analyses and hinder the identification of the true variation underlying genetically determined differences in physiology and behavior. Given the proliferation of mouse genetic models (e.g., knockout models, selectively bred lines, heterogeneous stocks derived from standard inbred strains and wild mice) and the wealth of gene expression microarray and phenotypic studies using genetic models, the impact of naturally-occurring polymorphisms on these data is critical. With the advent of next-generation, high-throughput sequencing, we are now in a position to determine to what extent polymorphisms are currently cryptic in such models and their impact on downstream analyses. Results: We sequenced the two most commonly used inbred mouse strains, DBA/2J and C57BL/6J, across a region of chromosome 1 (171.6 – 174.6 megabases) using two next generation high-throughput sequencing platforms: Applied Biosystems (SOLiD) and Illumina (Genome Analyzer). Using the same templates on both platforms, we compared realignments and single nucleotide polymorphism (SNP) detection with an 80-fold average read depth across platforms and samples. While public datasets currently annotate 4,527 SNPs between the two strains in this interval, thorough high-throughput sequencing identified a total of 11,824 SNPs in the interval, including 7,663 new SNPs. Furthermore, we confirmed 40 missense SNPs and discovered 36 new missense SNPs. Conclusion: Comparisons utilizing even two of the best characterized mouse genetic models, DBA/2J and C57BL/6J, indicate that more than half of naturally

  6. Localization of a new highly repeated DNA sequence of Lemur catta (Lemuridae, Strepsirhini).

    Science.gov (United States)

    Boniotto, Michele; Ventura, Mario; Cardone, Maria Francesca; Boaretto, Francesca; Archidiacono, Nicoletta; Rocchi, Mariano; Crovella, Sergio

    2002-10-01

    We have isolated and cloned an 800-bp highly repeated DNA (HRDNA) sequence from Lemur catta (LCA) and described its localization on LCA chromosomes. Lemur catta HRDNA sequences were localized by performing FISH experiments on standard and elongated metaphasic chromosomes using an LCA HRDNA probe (LCASAT). A complex hybridization pattern was detected. A strong pericentromeric hybridization signal was observed on most LCA chromosomes. Chromosomes 7 and 13 were lit in pericentromeric regions, as well as in the interspersed heterochromatin. Chromosomes 1, 3, 4, 17, 19, X, and microchromosomes (20, 25, 26, and 27) showed no signals in the pericentromeric region, but chromosomes 3 and 4 showed a positive hybridization in heterochromatic regions. The 800-bp L. catta HRDNA was species specific. We performed FISH experiments with the LCASAT probe on Eulemur macaco macaco (EMA) and Eulemur fulvus fulvus (EFU) metaphases and no positive signal of hybridization was detected. These findings were also confirmed by Southern blot analysis and PCR.

  7. Exploring Genetic Diversity in Plants Using High-Throughput Sequencing Techniques.

    Science.gov (United States)

    Onda, Yoshihiko; Mochida, Keiichi

    2016-08-01

    Food security has emerged as an urgent concern because of the rising world population. To meet the food demands of the near future, it is necessary to improve the productivity of various crops, not just of staple food crops. The genetic diversity among plant populations in a given species allows the plants to adapt to various environmental conditions. Such diversity could therefore yield valuable traits that could overcome the food-security challenges. To explore genetic diversity comprehensively and to rapidly identify useful genes and/or alleles, advanced high-throughput sequencing techniques, also called next-generation sequencing (NGS) technologies, have been developed. These provide practical solutions to the challenges in crop genomics. Here, we review various sources of genetic diversity in plants, newly developed genetic diversity-mining tools synergized with NGS techniques, and related genetic approaches such as quantitative trait locus analysis and genome-wide association study.

  8. Statistical assignment of DNA sequences using Bayesian phylogenetics

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P.;

    2008-01-01

    -analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA....

  9. Accurate molecular diagnosis of phenylketonuria and tetrahydrobiopterin-deficient hyperphenylalaninemias using high-throughput targeted sequencing

    Science.gov (United States)

    Trujillano, Daniel; Perez, Belén; González, Justo; Tornador, Cristian; Navarrete, Rosa; Escaramis, Georgia; Ossowski, Stephan; Armengol, Lluís; Cornejo, Verónica; Desviat, Lourdes R; Ugarte, Magdalena; Estivill, Xavier

    2014-01-01

    Genetic diagnostics of phenylketonuria (PKU) and tetrahydrobiopterin (BH4) deficient hyperphenylalaninemia (BH4DH) rely on methods that scan for known mutations or on laborious molecular tools that use Sanger sequencing. We have implemented a novel and much more efficient strategy based on high-throughput multiplex-targeted resequencing of four genes (PAH, GCH1, PTS, and QDPR) that, when affected by loss-of-function mutations, cause PKU and BH4DH. We have validated this approach in a cohort of 95 samples with the previously known PAH, GCH1, PTS, and QDPR mutations and one control sample. Pooled barcoded DNA libraries were enriched using a custom NimbleGen SeqCap EZ Choice array and sequenced using a HiSeq2000 sequencer. The combination of several robust bioinformatics tools allowed us to detect all known pathogenic mutations (point mutations, short insertions/deletions, and large genomic rearrangements) in the 95 samples, without detecting spurious calls in these genes in the control sample. We then used the same capture assay in a discovery cohort of 11 uncharacterized HPA patients using a MiSeq sequencer. In addition, we report the precise characterization of the breakpoints of four genomic rearrangements in PAH, including a novel deletion of 899 bp in intron 3. Our study is a proof-of-principle that high-throughput-targeted resequencing is ready to substitute classical molecular methods to perform differential genetic diagnosis of hyperphenylalaninemias, allowing the establishment of specifically tailored treatments a few days after birth. PMID:23942198

  10. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
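
    The third step described above (identifying sequences that contain microsatellites matching user-specified parameters) can be illustrated with a short Python sketch. This is not SSR_pipeline's code or interface: the function `find_ssrs` and its parameters `min_unit`, `max_unit`, and `min_repeats` are hypothetical names chosen for this example, and the approach shown (a regular expression with a backreference) is simply one common way to detect tandem repeats.

```python
import re


def is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ACAC')."""
    return motif in (motif + motif)[1:-1]


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, n_repeats) for perfect tandem repeats whose
    motif length lies in [min_unit, max_unit] and which repeat at least
    `min_repeats` times. Start positions are 0-based."""
    sequence = sequence.upper()
    found = []
    for unit in range(min_unit, max_unit + 1):
        # Capture one motif, then require it to recur back to back.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_periodic(motif):
                continue  # already reported under its shorter unit
            found.append((match.start(), motif, len(match.group()) // unit))
    return found


# Toy read with an (AC)6 dinucleotide and an (AGC)4 trinucleotide repeat.
read = "GGT" + "AC" * 6 + "TTG" + "AGC" * 4 + "T"
print(find_ssrs(read, min_unit=2, max_unit=3, min_repeats=4))
```

    The periodicity check keeps a dinucleotide run from being reported again as a tetranucleotide repeat; imperfect repeats are not handled in this sketch.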

  11. Properties of frequentist confidence levels derivatives

    CERN Document Server

    Martínez, Miriam Lucio; Dettori, Francesco

    2016-01-01

    In high energy physics, results from searches for new particles or rare processes are often reported using a modified frequentist approach, known as the $\rm{CL_s}$ method. In this paper, we study the properties of the derivatives of $\rm{CL_s}$ and $\rm{CL_{s+b}}$ as signal strength estimators if the confidence levels are interpreted as credible intervals. Our approach allows obtaining best fit points and $\chi^2$ functions which can be used for phenomenology studies. In addition, this approach can be used to incorporate $\rm{CL_s}$ results into Bayesian combinations.
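
    For readers unfamiliar with the notation, the two quantities named in this abstract are related by the usual definition of the modified frequentist method. The relation below is the standard textbook form and is not taken from the paper itself; $\mathrm{CL_{s+b}}$ and $\mathrm{CL_b}$ denote the confidence levels obtained under the signal-plus-background and background-only hypotheses, respectively.

```latex
\[
  \mathrm{CL_s} \;=\; \frac{\mathrm{CL_{s+b}}}{\mathrm{CL_b}}
\]
```

    A signal hypothesis is conventionally regarded as excluded at confidence level $1-\alpha$ when $\mathrm{CL_s} \le \alpha$, for example $\alpha = 0.05$ for a 95% exclusion.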

  12. Robust misinterpretation of confidence intervals.

    Science.gov (United States)

    Hoekstra, Rink; Morey, Richard D; Rouder, Jeffrey N; Wagenmakers, Eric-Jan

    2014-10-01

    Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students, all in the field of psychology, were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers' performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.
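
    The frequentist meaning of a 95% CI is a statement about the procedure: across repeated samples, about 95% of the intervals it produces contain the true parameter. The small Python simulation below illustrates that reading. It is not part of the study; the function name `coverage` and all parameter values are hypothetical, and a known-sigma z interval is assumed to keep the example short.

```python
import math
import random


def coverage(true_mean=10.0, sigma=2.0, n=25, trials=2000, seed=1):
    """Fraction of simulated 95% z intervals that contain the true mean.

    Assumes normally distributed data with known sigma, so the interval
    is sample_mean +/- 1.96 * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    covered = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            covered += 1
    return covered / trials


print(f"fraction of intervals containing the true mean: {coverage():.3f}")
```

    The printed fraction should land close to 0.95. Nothing in the simulation assigns a 95% probability to any single computed interval, which is one of the common misreadings of a CI.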

  13. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing

    Science.gov (United States)

    Weissensteiner, Hansi; Pacher, Dominic; Kloss-Brandstätter, Anita; Forer, Lukas; Specht, Günther; Bandelt, Hans-Jürgen; Kronenberg, Florian; Salas, Antonio; Schönherr, Sebastian

    2016-01-01

    Mitochondrial DNA (mtDNA) profiles can be classified into phylogenetic clusters (haplogroups), which is of great relevance for evolutionary, forensic and medical genetics. With the extensive growth of the underlying phylogenetic tree summarizing the published mtDNA sequences, the manual process of haplogroup classification would be too time-consuming. The previously published classification tool HaploGrep provided an automatic way to address this issue. Here, we present the completely updated version HaploGrep 2 offering several advanced features, including a generic rule-based system for immediate quality control (QC). This allows detecting artificial recombinants and missing variants as well as annotating rare and phantom mutations. Furthermore, the handling of high-throughput data in the form of VCF files is now directly supported. For data output, several graphical reports are generated in real time, such as a multiple sequence alignment format, a VCF format and extended haplogroup QC reports, all viewable directly within the application. In addition, HaploGrep 2 generates a publication-ready phylogenetic tree of all input samples encoded relative to the revised Cambridge Reference Sequence. Finally, new distance measures and optimizations of the algorithm increase accuracy and speed up the application. HaploGrep 2 can be accessed freely and without any registration at http://haplogrep.uibk.ac.at. PMID:27084951

  14. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    Directory of Open Access Journals (Sweden)

    Seung Hak Yang

    2015-09-01

    The leachate generated by the decomposition of animal carcasses has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site.

  15. Analysis of transposable elements in the genome of Asparagus officinalis from high coverage sequence data.

    Science.gov (United States)

    Li, Shu-Fen; Gao, Wu-Jun; Zhao, Xin-Peng; Dong, Tian-Yu; Deng, Chuan-Liang; Lu, Long-Dou

    2014-01-01

    Asparagus officinalis is an economically and nutritionally important vegetable crop that is widely cultivated and is used as a model dioecious species to study plant sex determination and sex chromosome evolution. To improve our understanding of its genome composition, especially with respect to transposable elements (TEs), which make up the majority of the genome, we performed Illumina HiSeq2000 sequencing of both male and female asparagus genomes followed by bioinformatics analysis. We generated 17 Gb of sequence (12× coverage) and assembled it into 163,406 scaffolds with a total cumulative length of 400 Mbp, which represents about 30% of the asparagus genome. Overall, TEs masked about 53% of the A. officinalis assembly. The majority of the identified TEs belonged to LTR retrotransposons, which constitute about 28% of genomic DNA, with Ty1/copia elements being more diverse and accumulated to higher copy numbers than Ty3/gypsy. Compared with LTR retrotransposons, non-LTR retrotransposons and DNA transposons were relatively rare. In addition, comparison of the abundance of the TE groups between male and female genomes showed that the overall TE composition was highly similar, with only slight differences in the abundance of several TE groups, which is consistent with the relatively recent origin of asparagus sex chromosomes. This study greatly improves our knowledge of the repetitive sequence construction of asparagus, which facilitates the identification of TEs responsible for the early evolution of plant sex chromosomes and is helpful for further studies on this dioecious plant.

  16. Ancient, highly polymorphic human major histocompatibility complex DQA1 intron sequence

    Energy Technology Data Exchange (ETDEWEB)

    McGinnis, M.D.; Quinn, D.L.; Lebo, R.V. [Univ. of California, San Francisco, CA (United States); Simons, M.J. [GeneType Pty. Ltd., Fitzroy, Victoria (Australia)

    1994-10-01

    A 438 basepair intron 1 sequence adjacent to exon 2 in the human major histocompatibility complex DQA1 gene defined 16 allelic variants in 69 individuals from wide ethnic backgrounds. In contrast, the most variable coding region spanned by the 247 basepair exon 2 defined 11 allelic variants. Our phylogenetic human intron 1 tree derived by the Bootstrap algorithm reflects the same relative allelic relationships as the reported DQA1 exon 2 have cosegregated since divergence of the human races. Comparison of human alleles to a Rhesus monkey DQA1 first intron sequence found only 10 nucleotide substitutions unique to Rhesus, with the other 428 positions (98%) found in at least one human allele. This high degree of homology reflects the evolutionary stability of intron sequences since these two species diverged over 20 million years ago. Because more intron 1 alleles exist than exon 2 alleles, these polymorphic introns can be used to improve tissue typing for transplantation, paternity testing, and forensics and to derive more complete phylogenetic trees. These results suggest that introns represent a previously underutilized polymorphic resource. 42 refs., 3 figs., 1 tab.

  17. High-throughput sequencing reveals an altered T cell repertoire in X-linked agammaglobulinemia.

    Science.gov (United States)

    Ramesh, Manish; Simchoni, Noa; Hamm, David; Cunningham-Rundles, Charlotte

    2015-12-01

    To examine the T cell receptor structure in the absence of B cells, the TCR β CDR3 was sequenced from DNA of 15 X-linked agammaglobulinemia (XLA) subjects and 18 male controls, using the Illumina HiSeq platform and the ImmunoSEQ analyzer. V gene usage and the V-J combinations, derived from both productive and non-productive sequences, were significantly different between XLA samples and controls. Although the CDR3 length was similar for XLA and control samples, the CDR3 region of the XLA T cell receptor contained significantly fewer deletions and insertions in V, D, and J gene segments, differences intrinsic to the V(D)J recombination process and not due to peripheral T cell selection. XLA CDR3s demonstrated fewer charged amino acid residues, more sharing of CDR3 sequences, and almost completely lacked a population of highly modified Vβ gene segments found in control DNA, suggesting both a skewed and contracted T cell repertoire in XLA.

  18. High-throughput sequencing and morphology perform equally well for benthic monitoring of marine ecosystems.

    Science.gov (United States)

    Lejzerowicz, Franck; Esling, Philippe; Pillet, Loïc; Wilding, Thomas A; Black, Kenneth D; Pawlowski, Jan

    2015-09-10

    Environmental diversity surveys are crucial for the bioassessment of anthropogenic impacts on marine ecosystems. Traditional benthic monitoring relying on morphotaxonomic inventories of macrofaunal communities is expensive, time-consuming and expertise-demanding. High-throughput sequencing of environmental DNA barcodes (metabarcoding) offers an alternative to describe biological communities. However, whether the metabarcoding approach meets the quality standards of benthic monitoring remains to be tested. Here, we compared morphological and eDNA/RNA-based inventories of metazoans from samples collected at 10 stations around a fish farm in Scotland, including near-cage and distant zones. For each of 5 replicate samples per station, we sequenced the V4 region of the 18S rRNA gene using the Illumina technology. After filtering, we obtained 841,766 metazoan sequences clustered in 163 Operational Taxonomic Units (OTUs). We assigned the OTUs by combining local BLAST searches with phylogenetic analyses. We calculated two commonly used indices: the Infaunal Trophic Index and the AZTI Marine Biotic Index. We found that the molecular data faithfully reflect the morphology-based indices and provide an equivalent assessment of the impact associated with fish farm activities. We advocate that future benthic monitoring should integrate metabarcoding as a rapid and accurate tool for the evaluation of the quality of marine benthic ecosystems.

  19. Target-dependent enrichment of virions determines the reduction of high-throughput sequencing in virus discovery.

    Directory of Open Access Journals (Sweden)

    Randi Holm Jensen

    Viral infections cause many different diseases stemming both from well-characterized viral pathogens and from emerging viruses, and the search for novel viruses continues to be of great importance. High-throughput sequencing is an important technology for this purpose. However, viral nucleic acids often constitute a minute proportion of the total genetic material in a sample from infected tissue. Techniques to enrich viral targets in high-throughput sequencing have been reported, but the sensitivity of such methods is not well established. This study compares different library preparation techniques targeting both DNA and RNA with and without virion enrichment. By optimizing the selection of intact virus particles, both by physical and enzymatic approaches, we assessed the effectiveness of the specific enrichment of viral sequences as compared to non-enriched sample preparations by selectively looking for and counting read sequences obtained from shotgun sequencing. Using shotgun sequencing of total DNA or RNA, viral targets were detected at concentrations corresponding to the predicted level, providing a foundation for estimating the effectiveness of virion enrichment. Virion enrichment typically produced a 1000-fold increase in the proportion of DNA virus sequences. For RNA virions the gain was less pronounced with a maximum 13-fold increase. This enrichment varied between the different sample concentrations, with no clear trend. Although less sequencing was required to identify target sequences, it was not evident from our data that a lower detection level was achieved by virion enrichment compared to shotgun sequencing.

  20. Identification of microRNAs from Eugenia uniflora by high-throughput sequencing and bioinformatics analysis.

    Directory of Open Access Journals (Sweden)

    Frank Guzman

    BACKGROUND: microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. RESULTS: Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. CONCLUSIONS: This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs.

  1. A highly conserved repeated chromosomal sequence in the radioresistant bacterium Deinococcus radiodurans SARK.

    Science.gov (United States)

    Lennon, E; Gutman, P D; Yao, H L; Minton, K W

    1991-03-01

    A DNA fragment containing a portion of a DNA damage-inducible gene from Deinococcus radiodurans SARK hybridized to numerous fragments of SARK genomic DNA because of a highly conserved repetitive chromosomal element. The element is of variable length, ranging from 150 to 192 bp, depending on the absence or presence of one or two 21-bp sequences located internally. A putative translational start site of the damage-inducible gene is within the reiterated element. The element contains dyad symmetries that suggest modes of transcriptional and/or translational control.

  2. Draft genome sequence of Bacillus thuringiensis 147, a Brazilian strain with high insecticidal activity

    Science.gov (United States)

    Barbosa, Luiz Carlos Bertucci; Farias, Débora Lopes; Silva, Isabella de Moraes Guimarães; Melo, Fernando Lucas; Ribeiro, Bergmann Morais; Aguiar, Raimundo Wagner de Souza

    2015-01-01

    Bacillus thuringiensis is a ubiquitous Gram-positive and sporulating bacterium. Its crystals and secreted toxins are useful tools against larvae of diverse insect orders and, as a consequence, an alternative to recalcitrant chemical insecticides. We report here the draft genome sequence of B. thuringiensis 147, a strain isolated from Brazil and with high insecticidal activity. The assembled genome contained 6,167,994 bp and was distributed in seven replicons (a chromosome and 6 plasmids). We identified 12 coding regions, located in two plasmids, which encode insecticidal proteins. PMID:26517667

  3. Novel design of multicapillary arrays for high-throughput DNA sequencing.

    Science.gov (United States)

    Tsupryk, Andriy; Gorbovitski, Michael; Kabotyanski, Evgeni A; Gorfinkel, Vera

    2006-07-01

    A novel approach to design and optimize linear multicapillary arrays (LMCAs) for high-throughput DNA sequencing is proposed. A significant increase in the number of capillary lanes is obtained due to the use of composite insertions alternately placed between working capillaries of the array and a specific combination of refractive indices of the DNA separation matrix, capillary glass, the insertions and a medium which surrounds the capillary array. Theoretical and experimental studies showed that in conjunction with a dual-side laser illumination scheme, the proposed LMCA design allows a simultaneous uniform irradiation of as many as 550 working capillaries.

  4. [Characteristics of anaerobic sequencing batch reactor for the treatment of high-solids-content waste].

    Science.gov (United States)

    Wang, Zhi-jun; Wang, Wei; Zhang, Xi-hui

    2006-06-01

    Based on experiments on the digestion of thermo-hydrolyzed sewage sludge in both mesophilic and thermophilic anaerobic sequencing batch reactors (ASBRs) with hydraulic retention times (HRTs) of 20, 10, 7.5 and 5 d, the operating characteristics of the ASBR for the treatment of high-solids-content waste were investigated. The ASBR can efficiently accumulate suspended solids and maintain a high solids concentration; however, there exists a "critical point", the maximum capability to accumulate suspended solids without negative effects on ASBR stability, beyond which performance deteriorates. Under steady conditions, the ASBR can sustain a high solid retention time (SRT) and mean cell retention time (MCRT): when treating thermo-hydrolyzed sludge, the SRT and MCRT are 2.53-3.73 and 2.03-3.14 times the HRT, respectively. Therefore, compared with a traditional continuous-flow stirred tank reactor (CSTR), the efficiency of the ASBR is enhanced by about 7.13%-34.68%.

  5. Confidence rating of marine eutrophication assessments

    DEFF Research Database (Denmark)

    Murray, Ciarán; Andersen, Jesper Harbo; Kaartokallio, Hermanni

    2011-01-01

    This report presents the development of a methodology for assessing confidence in eutrophication status classifications. The method can be considered as a secondary assessment, supporting the primary assessment of eutrophication status. The confidence assessment is based on a transparent scoring...

  6. A Mathematical Framework for Statistical Decision Confidence.

    Science.gov (United States)

    Hangya, Balázs; Sanders, Joshua I; Kepecs, Adam

    2016-09-01

    Decision confidence is a forecast about the probability that a decision will be correct. From a statistical perspective, decision confidence can be defined as the Bayesian posterior probability that the chosen option is correct based on the evidence contributing to it. Here, we used this formal definition as a starting point to develop a normative statistical framework for decision confidence. Our goal was to make general predictions that do not depend on the structure of the noise or a specific algorithm for estimating confidence. We analytically proved several interrelations between statistical decision confidence and observable decision measures, such as evidence discriminability, choice, and accuracy. These interrelationships specify necessary signatures of decision confidence in terms of externally quantifiable variables that can be empirically tested. Our results lay the foundations for a mathematically rigorous treatment of decision confidence that can lead to a common framework for understanding confidence across different research domains, from human and animal behavior to neural representations.
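
    The definition above (confidence as the Bayesian posterior probability that the chosen option is correct) can be illustrated with a minimal sketch. The abstract stresses that the framework does not depend on a particular noise structure; the Gaussian noise, the two equally likely stimuli with means +mu and -mu, and the function name below are all assumptions made purely for illustration.

```python
import math


def decision_confidence(x: float, mu: float = 1.0, sigma: float = 1.0) -> float:
    """Posterior probability that the choice sign(x) is correct.

    Illustrative assumptions (not from the paper): two equally likely
    stimuli with means +mu and -mu, and Gaussian evidence x with
    standard deviation sigma.
    """
    # For equal priors, the log-likelihood ratio in favour of the chosen
    # side is 2 * mu * |x| / sigma^2; the posterior is its logistic transform.
    llr = 2.0 * mu * abs(x) / sigma ** 2
    return 1.0 / (1.0 + math.exp(-llr))


if __name__ == "__main__":
    for evidence in (0.0, 0.5, 2.0):
        print(evidence, round(decision_confidence(evidence), 3))
```

    Under these assumptions, confidence is 0.5 when the evidence sits exactly on the decision boundary and rises monotonically with the distance of the evidence from it.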

  7. Status and Confidence, in the Lab

    OpenAIRE

    Jeffrey V. Butler

    2009-01-01

    It is widely recognized that confidence can have important economic consequences. While most of the focus has been on overconfidence, systematic variation in confidence can imply systematic variation in economic outcomes. Intriguingly, sociological and social psychological research suggests that being on the wrong side of inequality undermines confidence. This paper examines the link between inequality and confidence in a controlled, incentive-compatible laboratory setting. Inequality was int...

  8. High Throughput Random Mutagenesis and Single Molecule Real Time Sequencing of the Muscle Nicotinic Acetylcholine Receptor

    Science.gov (United States)

    Groot-Kormelink, Paul J.; Ferrand, Sandrine; Kelley, Nicholas; Bill, Anke; Freuler, Felix; Imbert, Pierre-Eloi; Marelli, Anthony; Gerwin, Nicole; Sivilotti, Lucia G.; Miraglia, Loren; Orth, Anthony P.; Oakeley, Edward J.; Schopfer, Ulrich; Siehler, Sandra

    2016-01-01

    High throughput random mutagenesis is a powerful tool to identify which residues are important for the function of a protein, and gain insight into its structure-function relation. The human muscle nicotinic acetylcholine receptor was used to test whether this technique previously used for monomeric receptors can be applied to a pentameric ligand-gated ion channel. A mutant library for the α1 subunit of the channel was generated by error-prone PCR, and full length sequences of all 2816 mutants were retrieved using single molecule real time sequencing. Each α1 mutant was co-transfected with wildtype β1, δ, and ε subunits, and the channel function characterized by an ion flux assay. To test whether the strategy could map the structure-function relation of this receptor, we attempted to identify mutations that conferred resistance to competitive antagonists. Mutant hits were defined as receptors that responded to the nicotinic agonist epibatidine, but were not inhibited by either α-bungarotoxin or tubocurarine. Eight α1 subunit mutant hits were identified, six of which contained mutations at position Y233 or V275 in the transmembrane domain. Three single point mutations (Y233N, Y233H, and V275M) were studied further, and found to enhance the potencies of five channel agonists tested. This suggests that the mutations made the channel resistant to the antagonists, not by impairing antagonist binding, but rather by producing a gain-of-function phenotype, e.g. increased agonist sensitivity. Our data show that random high throughput mutagenesis is applicable to multimeric proteins to discover novel functional mutants, and outlines the benefits of using single molecule real time sequencing with regards to quality control of the mutant library as well as downstream mutant data interpretation. PMID:27649498

  9. Highly sensitive, non-invasive detection of colorectal cancer mutations using single molecule, third generation sequencing.

    Science.gov (United States)

    Russo, Giancarlo; Patrignani, Andrea; Poveda, Lucy; Hoehn, Frederic; Scholtka, Bettina; Schlapbach, Ralph; Garvin, Alex M

    2015-12-01

    Colorectal cancer (CRC) represents one of the most prevalent and lethal malignant neoplasms and every individual of age 50 and above should undergo regular CRC screening. Currently, the most effective preventive screening procedure to detect adenomatous polyps, the precursors to CRC, is colonoscopy. Since every colorectal cancer starts as a polyp, detecting all polyps and removing them is crucial. By exactly doing that, colonoscopy reduces CRC incidence by 80%, however it is an invasive procedure that might have unpleasant and, in rare occasions, dangerous side effects. Despite numerous efforts over the past two decades, a non-invasive screening method for the general population with detection rates for adenomas and CRC similar to that of colonoscopy has not yet been established. Recent advances in next generation sequencing technologies have yet to be successfully applied to this problem, because the detection of rare mutations has been hindered by the systematic biases due to sequencing context and the base calling quality of NGS. We present the first study that applies the high read accuracy and depth of single molecule, real time, circular consensus sequencing (SMRT-CCS) to the detection of mutations in stool DNA in order to provide a non-invasive, sensitive and accurate test for CRC. In stool DNA isolated from patients diagnosed with adenocarcinoma, we are able to detect mutations at frequencies below 0.5% with no false positives. This approach establishes a foundation for a non-invasive, highly sensitive assay to screen the population for CRC and the early stage adenomas that lead to CRC.

  10. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Gonzalo H Villarino

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  11. Transcriptomic Analysis of Petunia hybrida in Response to Salt Stress Using High Throughput RNA Sequencing

    Science.gov (United States)

    Villarino, Gonzalo H.; Bombarely, Aureliano; Giovannoni, James J.; Scanlon, Michael J.; Mattson, Neil S.

    2014-01-01

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments. PMID:24722556

  12. Contrasting Diversity Values: Statistical Inferences Based on Overlapping Confidence Intervals

    Science.gov (United States)

    MacGregor-Fors, Ian; Payton, Mark E.

    2013-01-01

    Ecologists often contrast diversity (species richness and abundances) using tests for comparing means or indices. However, many popular software applications do not support performing standard inferential statistics for estimates of species richness and/or density. In this study we simulated the behavior of asymmetric log-normal confidence intervals and determined an interval level that mimics statistical tests with P(α) = 0.05 when confidence intervals from two distributions do not overlap. Our results show that 84% confidence intervals robustly mimic 0.05 statistical tests for asymmetric confidence intervals, as has been demonstrated for symmetric ones in the past. Finally, we provide detailed user-guides for calculating 84% confidence intervals in two of the most robust and highly-used freeware related to diversity measurements for wildlife (i.e., EstimateS, Distance). PMID:23437239
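
    The 84% figure has a simple origin in the symmetric, normal-theory case that the abstract says was demonstrated earlier. As a minimal sketch, assume two independent estimates with equal standard errors (an assumption for illustration; the study itself simulated asymmetric log-normal intervals). Two intervals with multiplier z_ci just fail to overlap when the estimates differ by 2 * z_ci * SE, whereas a two-sided test at level alpha rejects when they differ by z_test * sqrt(2) * SE. Equating the two gives z_ci = z_test / sqrt(2). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def matching_ci_level(alpha: float = 0.05) -> float:
    """Confidence level at which non-overlap of two intervals mimics a
    two-sided test at level alpha.

    Assumes two independent, normally distributed estimates with equal
    standard errors (illustrative simplification).
    """
    std_normal = NormalDist()
    z_test = std_normal.inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_ci = z_test / sqrt(2.0)                       # per-interval multiplier
    return 2.0 * std_normal.cdf(z_ci) - 1.0


if __name__ == "__main__":
    print(f"matching CI level: {matching_ci_level():.3f}")
```

    With alpha = 0.05 this gives a level between 83% and 84%, in line with the 84% intervals recommended in the abstract.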

  13. Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey

    NARCIS (Netherlands)

    Kerstens, H.H.D.; Crooijmans, R.P.M.A.; Veenendaal, A.; Dibbits, B.W.; Chin-A-Woeng, T.F.C.; Dunnen, den J.T.; Groenen, M.A.M.

    2009-01-01

    Background - The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set-up an analysis pipeline based on a

  14. Complete Genome Sequences of Two Methicillin-Sensitive Staphylococcus aureus Isolates Representing a Population Subset Highly Prevalent in Human Colonization

    Science.gov (United States)

    Weber, Robert E.; Layer, Franziska; Fuchs, Stephan; Bender, Jennifer K.; Fiedler, Stefan; Werner, Guido

    2016-01-01

    Here, we report the high-quality draft genome sequences of two methicillin-susceptible Staphylococcus aureus isolates, 08-02119 and 08-02300. Belonging to sequence type 582 (ST582) and ST7, both isolates are representatives of clonal lineages often associated with asymptomatic colonization of humans. PMID:27469954

  15. STAMP: Extensions to the STADEN sequence analysis package for high throughput interactive microsatellite marker design.

    Science.gov (United States)

    Kraemer, Lars; Beszteri, Bánk; Gäbler-Schwarz, Steffi; Held, Christoph; Leese, Florian; Mayer, Christoph; Pöhlmann, Kevin; Frickenhaus, Stephan

    2009-01-30

    Microsatellites (MSs) are DNA markers with high analytical power, which are widely used in population genetics, genetic mapping, and forensic studies. Currently available software solutions for high-throughput MS design (i) have shortcomings in detecting and distinguishing imperfect and perfect MSs, (ii) lack often necessary interactive design steps, and (iii) do not allow for the development of primers for multiplex amplifications. We present a set of new tools implemented as extensions to the STADEN package, which provides the backbone functionality for flexible sequence analysis workflows. The possibility to assemble overlapping reads into unique contigs (provided by the base functionality of the STADEN package) is important to avoid developing redundant markers, a feature missing from most other similar tools. Our extensions to the STADEN package provide the following functionality to facilitate microsatellite (and also minisatellite) marker design: The new modules (i) integrate the state-of-the-art tandem repeat detection and analysis software PHOBOS into workflows, (ii) provide two separate repeat detection steps - with different search criteria - one for masking repetitive regions during assembly of sequencing reads and the other for designing repeat-flanking primers for MS candidate loci, (iii) incorporate the widely used primer design program PRIMER3 into STADEN workflows, enabling the interactive design and visualization of flanking primers for microsatellites, and (iv) provide the functionality to find optimal locus- and primer pair combinations for multiplex primer design. Furthermore, our extensions include a module for storing analysis results in an SQLite database, providing a transparent solution for data access from within as well as from outside of the STADEN Package. The STADEN package is enhanced by our modules into a highly flexible, high-throughput, interactive tool for conventional and multiplex microsatellite marker design. 
It gives the user

  16. Highly sulfated hexasaccharide sequences isolated from chondroitin sulfate of shark fin cartilage: insights into the sugar sequences with bioactivities.

    Science.gov (United States)

    Mizumoto, Shuji; Murakoshi, Saori; Kalayanamitra, Kittiwan; Deepa, Sarama Sathyaseelan; Fukui, Shigeyuki; Kongtawelert, Prachya; Yamada, Shuhei; Sugahara, Kazuyuki

    2013-02-01

    Chondroitin sulfate (CS) chains regulate the development of the central nervous system in vertebrates and are linear polysaccharides consisting of variously sulfated repeating disaccharides, [-4GlcUAβ1-3GalNAcβ1-](n), where GlcUA and GalNAc represent D-glucuronic acid and N-acetyl-D-galactosamine, respectively. CS chains containing D-disaccharide units [GlcUA(2-O-sulfate)-GalNAc(6-O-sulfate)] are involved in the development of cerebellar Purkinje cells and neurite outgrowth-promoting activity through interaction with a neurotrophic factor, pleiotrophin, resulting in the regulation of signaling. In this study, to obtain further structural information on the CS chains containing D-disaccharide units involved in brain development, oligosaccharides containing D-units were isolated from a shark fin cartilage. Seven novel hexasaccharide sequences, ΔO-D-D, ΔA-D-D, ΔC-D-D, ΔE-A-D, ΔD-D-C, ΔE-D-D and ΔA-B-D, in addition to three previously reported sequences, ΔC-A-D, ΔC-D-C and ΔA-D-A, were isolated from a CS preparation of shark fin cartilage after exhaustive digestion with chondroitinase AC-I, which cannot act on the galactosaminidic linkages bound to D-units. The symbol Δ stands for a 4,5-unsaturated bond of uronic acids, whereas A, B, C, D, E and O represent [GlcUA-GalNAc(4-O-sulfate)], [GlcUA(2-O-sulfate)-GalNAc(4-O-sulfate)], [GlcUA-GalNAc(6-O-sulfate)], [GlcUA(2-O-sulfate)-GalNAc(6-O-sulfate)], [GlcUA-GalNAc(4-O-, 6-O-sulfate)] and [GlcUA-GalNAc], respectively. In binding studies using an anti-CS monoclonal antibody, MO-225, the epitopes of which are involved in cerebellar development in mammals, novel epitope structures, ΔA-D-A, ΔA-D-D and ΔA-B-D, were revealed. Hexasaccharides containing two consecutive D-units or a B-unit will be useful for the structural and functional analyses of CS chains particularly in the neuroglycobiological fields.

  17. A Confidence Paradigm for Classification Systems

    Science.gov (United States)

    2008-09-01

    M.U. Thomas Date Dean, Graduate School of Engineering and Management Table of Contents Page List of Figures...Plato, Aristotle, Plotinus, St Augustine, St Aquinas, Machiavelli, Descartes, Hobbes, Locke, Rousseau, Kant, Marx, Mill, Confucius) discuss having...independence, and aggregation of confidence is a linear summation of individual confidence values. Thomas and Allcock [61] develop a statistical confidence

  18. NGC: lossless and lossy compression of aligned high-throughput sequencing data.

    Science.gov (United States)

    Popitsch, Niko; von Haeseler, Arndt

    2013-01-01

    A major challenge of current high-throughput sequencing experiments is not only the generation of the sequencing data itself but also their processing, storage and transmission. The enormous size of these data motivates the development of data compression algorithms usable for the implementation of the various storage policies that are applied to the produced intermediate and final result files. In this article, we present NGC, a tool for the compression of mapped short read data stored in the wide-spread SAM format. NGC enables lossless and lossy compression and introduces the following two novel ideas: first, we present a way to reduce the number of required code words by exploiting common features of reads mapped to the same genomic positions; second, we present a highly configurable way for the quantization of per-base quality values, which takes their influence on downstream analyses into account. NGC, evaluated with several real-world data sets, saves 33-66% of disc space using lossless and up to 98% disc space using lossy compression. By applying two popular variant and genotype prediction tools to the decompressed data, we could show that the lossy compression modes preserve >99% of all called variants while outperforming comparable methods in some configurations.

  19. Quantitative assessment of RNA-protein interactions with high-throughput sequencing-RNA affinity profiling.

    Science.gov (United States)

    Ozer, Abdullah; Tome, Jacob M; Friedman, Robin C; Gheba, Dan; Schroth, Gary P; Lis, John T

    2015-08-01

    Because RNA-protein interactions have a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay that couples sequencing on an Illumina GAIIx genome analyzer with the quantitative assessment of protein-RNA interactions. This assay is able to analyze interactions between one or possibly several proteins with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of the EGFP and negative elongation factor subunit E (NELF-E) proteins with their corresponding canonical and mutant RNA aptamers. Here we provide a detailed protocol for HiTS-RAP that can be completed in about a month (8 d hands-on time). This includes the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, HiTS and protein binding with a GAIIx instrument, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, quantitative analysis of RNA on a massively parallel array (RNA-MaP) and RNA Bind-n-Seq (RBNS), for quantitative analysis of RNA-protein interactions.

  20. Exome sequencing identifies highly recurrent MED12 somatic mutations in breast fibroadenoma.

    Science.gov (United States)

    Lim, Weng Khong; Ong, Choon Kiat; Tan, Jing; Thike, Aye Aye; Ng, Cedric Chuan Young; Rajasegaran, Vikneswari; Myint, Swe Swe; Nagarajan, Sanjanaa; Nasir, Nur Diyana Md; McPherson, John R; Cutcutache, Ioana; Poore, Gregory; Tay, Su Ting; Ooi, Wei Siong; Tan, Veronique Kiak Mien; Hartman, Mikael; Ong, Kong Wee; Tan, Benita K T; Rozen, Steven G; Tan, Puay Hoon; Tan, Patrick; Teh, Bin Tean

    2014-08-01

    Fibroadenomas are the most common breast tumors in women under 30 (refs. 1,2). Exome sequencing of eight fibroadenomas with matching whole-blood samples revealed recurrent somatic mutations solely in MED12, which encodes a Mediator complex subunit. Targeted sequencing of an additional 90 fibroadenomas confirmed highly frequent MED12 exon 2 mutations (58/98, 59%) that are probably somatic, with 71% of mutations occurring in codon 44. Using laser capture microdissection, we show that MED12 fibroadenoma mutations are present in stromal but not epithelial mammary cells. Expression profiling of MED12-mutated and wild-type fibroadenomas revealed that MED12 mutations are associated with dysregulated estrogen signaling and extracellular matrix organization. The fibroadenoma MED12 mutation spectrum is nearly identical to that of previously reported MED12 lesions in uterine leiomyoma but not those of other tumors. Benign tumors of the breast and uterus, both of which are key target tissues of estrogen, may thus share a common genetic basis underpinned by highly frequent and specific MED12 mutations.

  1. Extracellular DNA amplicon sequencing reveals high levels of benthic eukaryotic diversity in the central Red Sea.

    Science.gov (United States)

    Pearman, John K; Irigoien, Xabier; Carvalho, Susana

    2016-04-01

    The present study aims to characterize the benthic eukaryotic biodiversity patterns at a coarse taxonomic level in three areas of the central Red Sea (a lagoon, an offshore area in Thuwal and a shallow coastal area near Jeddah) based on extracellular DNA. High-throughput amplicon sequencing targeting the V9 region of the 18S rRNA gene was undertaken for 32 sediment samples. High levels of alpha-diversity were detected with 16,089 operational taxonomic units (OTUs) being identified. The majority of the OTUs were assigned to Metazoa (29.2%), Alveolata (22.4%) and Stramenopiles (17.8%). Stramenopiles (Diatomea) and Alveolata (Ciliophora) were frequent in a lagoon and in shallower coastal stations, whereas metazoans (Arthropoda: Maxillopoda) were dominant in deeper offshore stations. Only 24.6% of total OTUs were shared among all areas. Beta-diversity was generally lower between the lagoon and Jeddah (nearshore) than between either of those and the offshore area, suggesting a nearshore-offshore biodiversity gradient. The current approach allowed for a broad-range of benthic eukaryotic biodiversity to be analysed with significantly less labour than would be required by other traditional taxonomic approaches. Our findings suggest that next generation sequencing techniques have the potential to provide a fast and standardised screening of benthic biodiversity at large spatial and temporal scales.

  2. Extracellular DNA amplicon sequencing reveals high levels of benthic eukaryotic diversity in the central Red Sea

    KAUST Repository

    Pearman, John K.

    2015-11-01

    The present study aims to characterize the benthic eukaryotic biodiversity patterns at a coarse taxonomic level in three areas of the central Red Sea (a lagoon, an offshore area in Thuwal and a shallow coastal area near Jeddah) based on extracellular DNA. High-throughput amplicon sequencing targeting the V9 region of the 18S rRNA gene was undertaken for 32 sediment samples. High levels of alpha-diversity were detected with 16,089 operational taxonomic units (OTUs) being identified. The majority of the OTUs were assigned to Metazoa (29.2%), Alveolata (22.4%) and Stramenopiles (17.8%). Stramenopiles (Diatomea) and Alveolata (Ciliophora) were frequent in a lagoon and in shallower coastal stations, whereas metazoans (Arthropoda: Maxillopoda) were dominant in deeper offshore stations. Only 24.6% of total OTUs were shared among all areas. Beta-diversity was generally lower between the lagoon and Jeddah (nearshore) than between either of those and the offshore area, suggesting a nearshore–offshore biodiversity gradient. The current approach allowed for a broad-range of benthic eukaryotic biodiversity to be analysed with significantly less labour than would be required by other traditional taxonomic approaches. Our findings suggest that next generation sequencing techniques have the potential to provide a fast and standardised screening of benthic biodiversity at large spatial and temporal scales.

  3. Perchlorate reduction by hydrogen autotrophic bacteria and microbial community analysis using high-throughput sequencing.

    Science.gov (United States)

    Wan, Dongjin; Liu, Yongde; Niu, Zhenhua; Xiao, Shuhu; Li, Daorong

    2016-02-01

    Hydrogen autotrophic reduction of perchlorate has the advantages of high removal efficiency and harmlessness to drinking water. So far, however, reported information about the microbial community structure has been comparatively limited, and changes in biodiversity and the dominant bacteria during the acclimation process require detailed study. In this study, perchlorate-reducing hydrogen autotrophic bacteria were acclimated from activated sludge by hydrogen aeration. For the first time, high-throughput sequencing was applied to analyze changes in biodiversity and the dominant bacteria during the acclimation process. The Michaelis-Menten model described the perchlorate reduction kinetics well. Model parameters q(max) and K(s) were 2.521-3.245 (mg ClO4(-)/gVSS h) and 5.44-8.23 (mg/l), respectively. Microbial perchlorate reduction occurred across the pH range 5.0-11.0; removal was highest at pH 9.0. The enriched mixed bacteria could use perchlorate, nitrate and sulfate as electron acceptors, and the order of preference was: NO3(-) > ClO4(-) > SO4(2-). Compared to the feed culture, biodiversity decreased greatly during the acclimation process, and the microbial community structure gradually stabilized after 9 acclimation cycles. The Thauera genus, related to Rhodocyclales, was the dominant perchlorate-reducing bacteria (PRB) in the mixed culture.
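    The Michaelis-Menten form mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' fitting code: the function name and the example substrate concentrations are hypothetical, and only the q(max) and K(s) ranges come from the abstract.

```python
def specific_reduction_rate(substrate_mg_per_l, q_max, k_s):
    """Michaelis-Menten specific rate: q = q_max * S / (K_s + S).

    substrate_mg_per_l: perchlorate concentration S (mg/l)
    q_max: maximum specific rate (mg ClO4-/gVSS h)
    k_s: half-saturation constant (mg/l)
    """
    if substrate_mg_per_l < 0:
        raise ValueError("substrate concentration must be non-negative")
    return q_max * substrate_mg_per_l / (k_s + substrate_mg_per_l)


if __name__ == "__main__":
    # Lower and upper ends of the parameter ranges reported in the abstract.
    for q_max, k_s in [(2.521, 5.44), (3.245, 8.23)]:
        # Hypothetical concentrations: well below, equal to, and well above K_s.
        for s in (1.0, k_s, 50.0):
            q = specific_reduction_rate(s, q_max, k_s)
            print(f"q_max={q_max}, K_s={k_s}, S={s}: q={q:.3f} mg ClO4-/gVSS h")
```

    At S = K(s) the rate is half of q(max), which is a quick sanity check for any fitted parameter pair.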

  4. Engine Test Confidence Evaluation System

    Science.gov (United States)

    2007-09-13

    Technology Elements Demonstrator: Silicon Nitride Blade Example Date of Rating: Now Feb 07 High Turbine Compressor Combustor Low Turbine Fan...TFI*STE 6 5.8*6*6*6*9*9 6 6.8 Demonstrator: Silicon Nitride Blade Example Date of Rating: Now Feb 07 High Turbine Compressor Combustor Low Turbine Fan

  5. High-throughput sequencing offers insight into mechanisms of resource partitioning in cryptic bat species

    DEFF Research Database (Denmark)

    Razgour, Orly; Clare, Elizabeth L.; Zeale, Matt R.K.

    2011-01-01

    cryptic bat species that are sympatric in southern England (Plecotus austriacus and P. auritus) (Fig. 1). Using Roche FLX 454 (Roche, Basel, CH) high-throughput sequencing (HTS) and uniquely tagged generic arthropod primers, we identified 142 prey Molecular Operational Taxonomic Units (MOTUs) in the diet...... of the cryptic bats, 60% of which were assigned to a likely species or genus. The findings from the molecular study supported the results of microscopic analyses in showing that the diets of both species were dominated by lepidopterans. However, HTS provided a sufficiently high resolution of prey identification...... to determine fine-scale differences in resource use. Although both bat species appeared to have a generalist diet, eared-moths from the family Noctuidae were the main prey consumed. Interspecific niche overlap was greater than expected by chance (O(jk) = 0.72, P

  6. Defining the alloreactive T cell repertoire using high-throughput sequencing of mixed lymphocyte reaction culture.

    Directory of Open Access Journals (Sweden)

    Ryan O Emerson

    The cellular immune response is the most important mediator of allograft rejection and is a major barrier to transplant tolerance. Delineation of the depth and breadth of the alloreactive T cell repertoire and subsequent application of the technology to the clinic may improve patient outcomes. As a first step toward this, we have used MLR and high-throughput sequencing to characterize the alloreactive T cell repertoire in healthy adults at baseline and 3 months later. Our results demonstrate that thousands of T cell clones proliferate in MLR, and that the alloreactive repertoire is dominated by relatively high-abundance T cell clones. This clonal makeup is consistently reproducible across replicates and across a span of three months. These results indicate that our technology is sensitive and that the alloreactive TCR repertoire is broad and stable over time. We anticipate that application of this approach to track donor-reactive clones may positively impact clinical management of transplant patients.

  7. A high-resolution radiation hybrid map of the human genome draft sequence.

    Science.gov (United States)

    Olivier, M; Aggarwal, A; Allen, J; Almendras, A A; Bajorek, E S; Beasley, E M; Brady, S D; Bushard, J M; Bustos, V I; Chu, A; Chung, T R; De Witte, A; Denys, M E; Dominguez, R; Fang, N Y; Foster, B D; Freudenberg, R W; Hadley, D; Hamilton, L R; Jeffrey, T J; Kelly, L; Lazzeroni, L; Levy, M R; Lewis, S C; Liu, X; Lopez, F J; Louie, B; Marquis, J P; Martinez, R A; Matsuura, M K; Misherghi, N S; Norton, J A; Olshen, A; Perkins, S M; Perou, A J; Piercy, C; Piercy, M; Qin, F; Reif, T; Sheppard, K; Shokoohi, V; Smick, G A; Sun, W L; Stewart, E A; Fernando, J; Tejeda; Tran, N M; Trejo, T; Vo, N T; Yan, S C; Zierten, D L; Zhao, S; Sachidanandam, R; Trask, B J; Myers, R M; Cox, D R

    2001-02-16

    We have constructed a physical map of the human genome by using a panel of 90 whole-genome radiation hybrids (the TNG panel) in conjunction with 40,322 sequence-tagged sites (STSs) derived from random genomic sequences as well as expressed sequences. Of 36,678 STSs on the TNG radiation hybrid map, only 3604 (9.8%) were absent from the unassembled draft sequence of the human genome. Of 20,030 STSs ordered on the TNG map as well as the assembled human genome draft sequence and the Celera assembled human genome sequence, 36% of the STSs had a discrepant order between the working draft sequence and the Celera sequence. The TNG map order was identical to one of the two sequence orders in 60% of these discrepant cases.

  8. Digital PCR provides sensitive and absolute calibration for high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Fan H Christina

    2009-03-01

    Background: Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample (typically micrograms) are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. Results: We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. Conclusion: The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.
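    As a rough illustration of how digital PCR yields an absolute count without a standard curve, the sketch below applies the usual Poisson correction to the fraction of positive partitions. The abstract does not describe the calculation, so this is an assumption about the general technique rather than the authors' procedure; the function names and partition counts are hypothetical.

```python
import math


def mean_copies_per_partition(positive, total):
    """Poisson-corrected mean template copies per partition.

    If templates distribute randomly, the chance a partition is empty is
    exp(-lam), so lam = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        # All-positive panels are saturated and cannot be quantified this way.
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log1p(-positive / total)


def estimated_total_copies(positive, total):
    """Estimated number of template molecules loaded across all partitions."""
    return total * mean_copies_per_partition(positive, total)


if __name__ == "__main__":
    # Hypothetical panel: 300 positive partitions out of 1000.
    print(round(estimated_total_copies(300, 1000), 1))
```

    The correction matters most when many partitions are positive, because a positive partition may hold more than one template molecule.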

  9. Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing.

    Science.gov (United States)

    Conway, Tyrrell; Creecy, James P; Maddox, Scott M; Grissom, Joe E; Conkle, Trevor L; Shadid, Tyler M; Teramoto, Jun; San Miguel, Phillip; Shimada, Tomohiro; Ishihama, Akira; Mori, Hirotada; Wanner, Barry L

    2014-07-08

    We analyzed the transcriptome of Escherichia coli K-12 by strand-specific RNA sequencing at single-nucleotide resolution during steady-state (logarithmic-phase) growth and upon entry into stationary phase in glucose minimal medium. To generate high-resolution transcriptome maps, we developed an organizational schema which showed that in practice only three features are required to define operon architecture: the promoter, terminator, and deep RNA sequence read coverage. We precisely annotated 2,122 promoters and 1,774 terminators, defining 1,510 operons with an average of 1.98 genes per operon. Our analyses revealed an unprecedented view of E. coli operon architecture. A large proportion (36%) of operons are complex with internal promoters or terminators that generate multiple transcription units. For 43% of operons, we observed differential expression of polycistronic genes, despite being in the same operons, indicating that E. coli operon architecture allows fine-tuning of gene expression. We found that 276 of 370 convergent operons terminate inefficiently, generating complementary 3' transcript ends which overlap on average by 286 nucleotides, and 136 of 388 divergent operons have promoters arranged such that their 5' ends overlap on average by 168 nucleotides. We found 89 antisense transcripts of 397-nucleotide average length, 7 unannotated transcripts within intergenic regions, and 18 sense transcripts that completely overlap operons on the opposite strand. Of 519 overlapping transcripts, 75% correspond to sequences that are highly conserved in E. coli (>50 genomes). Our data extend recent studies showing unexpected transcriptome complexity in several bacteria and suggest that antisense RNA regulation is widespread. Importance: We precisely mapped the 5' and 3' ends of RNA transcripts across the E. coli K-12 genome by using a single-nucleotide analytical approach. Our resulting high-resolution transcriptome maps show that ca. one-third of E. coli operons are

  10. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline.

    Directory of Open Access Journals (Sweden)

    Jeffrey C Glaubitz

    Genotyping by sequencing (GBS) is a next generation sequencing based method that takes advantage of reduced representation to enable high throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol is currently being applied in numerous species by a large number of researchers. Herein we describe a bioinformatics pipeline, TASSEL-GBS, designed for the efficient processing of raw GBS sequence data into SNP genotypes. The TASSEL-GBS pipeline successfully fulfills the following key design criteria: (1) Ability to run on the modest computing resources that are typically available to small breeding or ecological research programs, including desktop or laptop machines with only 8-16 GB of RAM, (2) Scalability from small to extremely large studies, where hundreds of thousands or even millions of SNPs can be scored in up to 100,000 individuals (e.g., for large breeding programs or genetic surveys), and (3) Applicability in an accelerated breeding context, requiring rapid turnover from tissue collection to genotypes. Although a reference genome is required, the pipeline can also be run with an unfinished "pseudo-reference" consisting of numerous contigs. We describe the TASSEL-GBS pipeline in detail and benchmark it based upon a large scale, species wide analysis in maize (Zea mays), where the average error rate was reduced to 0.0042 through application of population genetic-based SNP filters. Overall, the GBS assay and the TASSEL-GBS pipeline provide robust tools for studying genomic diversity.

  11. Towards Measurement of Confidence in Safety Cases

    Science.gov (United States)

    Denney, Ewen; Pai, Ganesh J.; Habli, Ibrahim

    2011-01-01

    Arguments in safety cases are predominantly qualitative. This is partly attributed to the lack of sufficient design and operational data necessary to measure the achievement of high-dependability targets, particularly for safety-critical functions implemented in software. The subjective nature of many forms of evidence, such as expert judgment and process maturity, also contributes to the overwhelming dependence on qualitative arguments. However, where data for quantitative measurements is systematically collected, quantitative arguments provide far more benefits than qualitative arguments in assessing confidence in the safety case. In this paper, we propose a basis for developing and evaluating integrated qualitative and quantitative safety arguments based on the Goal Structuring Notation (GSN) and Bayesian Networks (BN). The approach we propose identifies structures within GSN-based arguments where uncertainties can be quantified. BN are then used to provide a means to reason about confidence in a probabilistic way. We illustrate our approach using a fragment of a safety case for an unmanned aerial system and conclude with some preliminary observations.

  12. Linguistic Weighted Aggregation under Confidence Levels

    Directory of Open Access Journals (Sweden)

    Chonghui Zhang

    2015-01-01

    We develop some new linguistic aggregation operators based on confidence levels. Firstly, we introduce the confidence linguistic weighted averaging (CLWA) operator and the confidence linguistic ordered weighted averaging (CLOWA) operator. These two new linguistic aggregation operators are able to consider the confidence level of the aggregated arguments provided by the information providers. We also study some of their properties. Then, based on the generalized means, we introduce the confidence generalized linguistic ordered weighted averaging (CGLOWA) operator. The main advantage of the CGLOWA operator is that it includes a wide range of special cases such as the CLOWA operator, the confidence linguistic ordered weighted quadratic averaging (CLOWQA) operator, and the confidence linguistic ordered weighted geometric (CLOWG) operator. Finally, we develop an application of the new approach in multicriteria decision-making under a linguistic environment and illustrate it with a numerical example.
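    To make the idea of a confidence-weighted linguistic average concrete, here is a small sketch. The abstract does not give the operator's formula, so the form used here is an assumption: each linguistic term is represented by its index on an ordered scale, scaled by the provider's confidence level in [0, 1], and then averaged with weights that sum to 1. The function name and the sample values are hypothetical.

```python
def clwa(term_indices, confidences, weights):
    """Confidence-scaled weighted average of linguistic term indices.

    Assumed form: sum_i w_i * l_i * a_i, where a_i is the index of the
    linguistic term, l_i the confidence level and w_i the weight.
    The result is a (possibly fractional) index on the same scale.
    """
    if not (len(term_indices) == len(confidences) == len(weights)):
        raise ValueError("inputs must have the same length")
    if any(not 0.0 <= l <= 1.0 for l in confidences):
        raise ValueError("confidence levels must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * l * a for a, l, w in zip(term_indices, confidences, weights))


if __name__ == "__main__":
    # Hypothetical scale s_0..s_6; two providers rate an option s_2 and s_4.
    print(clwa([2, 4], [1.0, 1.0], [0.5, 0.5]))  # full confidence in both
    print(clwa([2, 4], [1.0, 0.5], [0.5, 0.5]))  # second provider less sure
```

    Under this assumed form, a lower confidence level pulls an argument's contribution toward the bottom of the scale; with all confidence levels equal to 1 it reduces to an ordinary weighted average of term indices.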

  13. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies...

  14. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

    Science.gov (United States)

    Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

    2016-12-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.

  15. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I.

    2017-02-01

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics.

  16. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I.

    2017-01-01

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics. PMID:28176860

  17. High-throughput DNA Stretching in Continuous Elongational Flow for Genome Sequence Scanning

    Science.gov (United States)

    Meltzer, Robert; Griffis, Joshua; Safranovitch, Mikhail; Malkin, Gene; Cameron, Douglas

    2014-03-01

    Genome Sequence Scanning (GSS) identifies and compares bacterial genomes by stretching long (60 - 300 kb) genomic DNA restriction fragments and scanning for site-selective fluorescent probes. Practical application of GSS requires: 1) high throughput data acquisition, 2) efficient DNA stretching, 3) reproducible DNA elasticity in the presence of intercalating fluorescent dyes. GSS utilizes a pseudo-two-dimensional micron-scale funnel with convergent sheathing flows to stretch one molecule at a time in continuous elongational flow and center the DNA stream over diffraction-limited confocal laser excitation spots. Funnel geometry has been optimized to maximize throughput of DNA within the desired length range (>10 million nucleobases per second). A constant-strain detection channel maximizes stretching efficiency by applying a constant parabolic tension profile to each molecule, minimizing relaxation and flow-induced tumbling. The effect of intercalator on DNA elasticity is experimentally controlled by reacting one molecule of DNA at a time in convergent sheathing flows of the dye. Derivations of accelerating flow and non-linear tension distribution permit alignment of detected fluorescence traces to theoretical templates derived from whole-genome sequence data.

  18. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling.

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I

    2017-02-08

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics.

  19. Molecular characterization and physical localization of highly repetitive DNA sequences from Brazilian Alstroemeria species.

    Science.gov (United States)

    Kuipers, A G J; Kamstra, S A; de Jeu, M J; Visser, R G F

    2002-01-01

    Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragments with sizes varying from 68-127 bp, and constituted a larger HinfI repeat of approximately 400 bp. Southern hybridization showed a similar molecular organization of the tandem repeats in each of the Brazilian Alstroemeria species tested. None of the repeats hybridized with DNA from Chilean Alstroemeria species, which indicates that they are specific for the Brazilian species. In-situ localization studies revealed the tandem repeats to be localized in clusters on the chromosomes of A. inodora and A. psittacina: distal hybridization sites were found on chromosome arms 2PS, 6PL, 7PS, 7PL and 8PL, interstitial sites on chromosome arms 2PL, 3PL, 4PL and 5PL. The applicability of the tandem repeats for cytogenetic analysis of interspecific hybrids and their role in heterochromatin organization are discussed.

  20. Aerobic granulation strategy for bioaugmentation of a sequencing batch reactor (SBR) treating high strength pyridine wastewater

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Xiaodong; Chen, Yan [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Zhang, Xin [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Suzhou Institute of Architectural Design Co., Ltd, Suzhou 215021, Jiangsu Province (China); Jiang, Xinbai; Wu, Shijing [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Shen, Jinyou, E-mail: shenjinyou@mail.njust.edu.cn [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Sun, Xiuyun; Li, Jiansheng; Lu, Lude [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China); Wang, Lianjun, E-mail: wanglj@mail.njust.edu.cn [Jiangsu Key Laboratory for Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu Province (China)

    2015-09-15

    Abstract: Aerobic granules were successfully cultivated in a sequencing batch reactor (SBR), using a single bacterial strain Rhizobium sp. NJUST18 as the inoculum. NJUST18 presented as both a good pyridine degrader and an efficient autoaggregator. Stable granules with a diameter of 0.5–1 mm, a sludge volume index of 25.6 ± 3.6 mL g⁻¹ and a settling velocity of 37.2 ± 2.7 m h⁻¹ were formed in the SBR following 120-day cultivation. These granules exhibited excellent pyridine degradation performance, with the maximum volumetric degradation rate (Vmax) varying between 1164.5 mg L⁻¹ h⁻¹ and 1867.4 mg L⁻¹ h⁻¹. High-throughput sequencing analysis revealed a large shift in microbial community structure, since the SBR was operated under open conditions. Paracoccus and Comamonas were found to be the most predominant species in the aerobic granule system after the system had stabilized. The initially inoculated Rhizobium sp. lost its dominance during aerobic granulation. However, the inoculation of Rhizobium sp. played a key role in the start-up process of this bioaugmentation system. This study demonstrated that, in addition to the hydraulic selection pressure during settling and effluent discharge, the selection of aggregating bacterial inocula is equally important for the formation of the aerobic granule.

  1. High Throughput Sequencing of T Cell Antigen Receptors Reveals a Conserved TCR Repertoire

    Science.gov (United States)

    Hou, Xianliang; Lu, Chong; Chen, Sisi; Xie, Qian; Cui, Guangying; Chen, Jianing; Chen, Zhi; Wu, Zhongwen; Ding, Yulong; Ye, Ping; Dai, Yong; Diao, Hongyan

    2016-01-01

    The T-cell receptor (TCR) repertoire is a mirror of the human immune system that reflects processes caused by infections, cancer, autoimmunity, and aging. Next-generation sequencing has become a powerful tool for deep TCR profiling. Herein, we used this technology to study the repertoire features of the TCR beta chain in the blood of healthy individuals. Peripheral blood samples were collected from 10 healthy donors. T cells were isolated with anti-human CD3 magnetic beads according to the manufacturer's protocol. We then combined multiplex-PCR, Illumina sequencing, and IMGT/High V-QUEST to analyze the characteristics and polymorphisms of the TCR. Most of the individual T cell clones were present at very low frequencies, suggesting that they had not undergone clonal expansion. The usage frequencies of the TCR beta variable, beta joining, and beta diversity gene segments were similar among T cells from different individuals. Notably, the usage frequency of individual nucleotides and amino acids within complementarity-determining region 3 (CDR3) intervals was remarkably consistent between individuals. Moreover, our data show that terminal deoxynucleotidyl transferase activity was biased toward the insertion of G (31.92%) and C (27.14%) over A (21.82%) and T (19.12%) nucleotides. Some conserved features could be observed in the composition of CDR3, which may inform future studies of human TCR gene recombination. PMID:26962778

  2. Bacterioplankton community analysis in tilapia ponds by Illumina high-throughput sequencing.

    Science.gov (United States)

    Fan, Li Min; Barry, Kamira; Hu, Geng Dong; Meng, Shun long; Song, Chao; Wu, Wei; Chen, Jia Zhang; Xu, Pao

    2016-01-01

    Changes in the microbial community of aquaculture systems in response to stocking density and seasonality were investigated in tilapia ponds. Total DNA was extracted from the water samples, the 16S rRNA gene was amplified, and the bacterial community was analyzed by Illumina high-throughput sequencing, yielding 3486 OTUs from a total of 715,842 sequence reads. Based on analysis of bacterial composition, richness, diversity, 16S rRNA gene abundance, water sample comparisons and the presence of specific bacterial taxa in three fish ponds over a 4-month period, the study observed that the dominant phyla were similar in all water samples: Proteobacteria, Cyanobacteria, Bacteroidetes, Actinobacteria, Planctomycetes and Chlorobi, distributed in different proportions across months and ponds. Seasonal changes had a more pronounced effect on the bacterioplankton community than stocking density; however, some differences between the ponds were more likely caused by feed coefficient than by stocking density. At the same time, most bacterial communities were affected by nutrient input, except the phylum Cyanobacteria, which was also affected by the feed control of tilapia.

  3. Performance and microbial ecology of a nitritation sequencing batch reactor treating high-strength ammonia wastewater

    Science.gov (United States)

    Chen, Wenjing; Dai, Xiaohu; Cao, Dawen; Wang, Sha; Hu, Xiaona; Liu, Wenru; Yang, Dianhai

    2016-01-01

    The partial nitrification (PN) performance and the microbial community variations were evaluated in a sequencing batch reactor (SBR) for 172 days, with stepwise elevation of the ammonium concentration. Free ammonia (FA) and low dissolved oxygen inhibition of nitrite-oxidizing bacteria (NOB) were used to achieve nitritation in the SBR. During the 172 days of operation, the nitrogen loading rate of the SBR was finally raised to 3.6 kg N/m³/d, corresponding to an influent ammonium concentration of 1500 mg/L, with an ammonium removal efficiency and nitrite accumulation rate of 94.12% and 83.54%, respectively, indicating that the syntrophic inhibition of FA and low dissolved oxygen contributed substantially to the stable nitrite accumulation. The results of 16S rRNA high-throughput sequencing revealed that Nitrospira, the only nitrite-oxidizing bacteria in the system, were successively inhibited and eliminated, and the SBR was finally dominated by Nitrosomonas, the ammonium-oxidizing bacteria, which had a relative abundance of 83%, indicating that Nitrosomonas played the primary role in establishing and maintaining nitritation. After Nitrosomonas, Anaerolineae (7.02%) and Saprospira (1.86%) were the other main genera in the biomass. PMID:27762325

  4. Optimized Protocol for Simple Extraction of High-Quality Genomic DNA from Clostridium difficile for Whole-Genome Sequencing.

    Science.gov (United States)

    Sim, James Heng Chiak; Anikst, Victoria; Lohith, Akshar; Pourmand, Nader; Banaei, Niaz

    2015-07-01

    Successful sequencing of the Clostridium difficile genome requires high-quality genomic DNA (gDNA) as the starting material. gDNA extraction using conventional methods is laborious. We describe here an optimized method for the simple extraction of C. difficile gDNA using the QIAamp DNA minikit, which yielded high-quality sequence reads on the Illumina MiSeq platform. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  5. The gut microbiotassay – a high-throughput real-time PCR chip combined with next generation sequencing

    DEFF Research Database (Denmark)

    Hermann-Bank, Marie Louise; Skovgaard, Kerstin; Mølbak, Lars

    this assay with the high-throughput real-time PCR chip “Access Array 48.48” from Fluidigm. The chip executes 2304 individual reactions in parallel and afterwards it is possible to harvest the amplicons for next-generation sequencing. This approach gives a taxonomical overview of the gut microbiota, hence...... generation sequencing both provides a quantitative measure in terms of Cq-values achieved from the real-time PCR, as well as the deeper information obtained from next-generation sequencing of the amplicons. It is quick to perform and offers a high-throughput at a relatively low cost. These features make...

  6. High-Quality Genome Sequence of the Highly Resistant Bacterium Staphylococcus haemolyticus, Isolated from a Neonatal Bloodstream Infection.

    Science.gov (United States)

    Hosseinkhani, Farideh; Emaneini, Mohammad; van Leeuwen, Willem

    2017-07-20

    Using Illumina HiSeq and PacBio technologies, we sequenced the genome of the multidrug-resistant bacterium Staphylococcus haemolyticus, originating from a bloodstream infection in a neonate. The sequence data can be used as an accurate reference sequence. Copyright © 2017 Hosseinkhani et al.

  7. Embracing the World With Confidence

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    President Hu Jintao’s Asian tour thrusts post-Olympic China into global spotlight. As Chinese President Hu Jintao embarked on his first overseas trip after the Beijing Olympics, hopes were high that this tour would set the tone for post-Olympic Chinese diplomacy.

  8. Study on multi-scheme analysis and evaluation method for concrete sequence placement of high arch dam

    Institute of Scientific and Technical Information of China (English)

    2011-01-01

    A complete scheme was presented for solving the key scientific problems of how to make the concrete sequence placement scheme of a high arch dam reasonable and feasible and how to meet the needs of the construction process. First, based on a coupling analysis of the concrete sequence placement system of a high arch dam, a mathematical model considering complex construction constraints was established. Second, a multi-scheme computational analysis method for concrete sequence placement of a high arch dam was proposed based on dynamic simulation. Third, a multi-scheme evaluation method for concrete sequence placement was put forward based on the analytic hierarchy process. Fourth, feedback guidance for progress control and management in the high arch dam construction process was proposed. Finally, these methods were applied to a practical project to show that they can effectively analyze and evaluate multiple schemes for concrete sequence placement of a high arch dam, optimize the process of dam concrete sequence placement, and recommend engineering measures. These methods provide new theoretical principles and technical measures for real-time progress control in high arch dam construction.

  9. High-Quality de Novo Genome Assembly of the Dekkera bruxellensis Yeast Isolate Using Nanopore MinION Sequencing.

    Science.gov (United States)

    Fournier, Téo; Gounot, Jean-Sébastien; Freel, Kelle; Cruaud, Corinne; Lemainque, Arnaud; Aury, Jean-Marc; Wincker, Patrick; Schacherer, Joseph; Friedrich, Anne

    2017-08-09

    Genetic variation in natural populations represents the raw material for phenotypic diversity. Species-wide characterization of genetic variants is crucial to gain deeper insight into the genotype-phenotype relationship. With the advent of new sequencing strategies and more recently the release of long-read sequencing platforms, it is now possible to explore the genetic diversity of any non-model organism, representing a fundamental resource for biological research. In the frame of population genomic surveys, a first step is to obtain the complete sequence and high-quality assembly of a reference genome. Here, we sequenced and assembled a reference genome of the non-conventional Dekkera bruxellensis yeast. While this species is a major cause of wine spoilage, it paradoxically contributes to the specific flavor profile of some Belgian beers. In addition, extreme karyotype variability is observed across natural isolates, highlighting that the D. bruxellensis genome is very dynamic. The whole genome of the D. bruxellensis UMY321 isolate was sequenced using a combination of Nanopore long-read and Illumina short-read sequencing data. We generated the most complete and contiguous de novo assembly of D. bruxellensis to date and obtained a first glimpse into the genomic variability within this species by comparing the sequences of several isolates. This genome sequence is therefore of high value for population genomic surveys and represents a reference to study genome dynamics in this yeast species. Copyright © 2017, G3: Genes, Genomes, Genetics.

  10. The High Throughput Sequence Annotation Service (HT-SAS) – the shortcut from sequence to true Medline words

    Directory of Open Access Journals (Sweden)

    Siedlecki Pawel

    2009-05-01

    Background: Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step which allows facts to become knowledge. Currently there are limited ways to automatically annotate such data, especially utilizing information deposited in published literature. Results: To aid researchers in describing results from high-throughput experiments we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein, a pool of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using a binomial statistics approximation. We tested our automatic approach with a protein test set from SGD to determine its accuracy and usefulness. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium berghei expressed exclusively during the blood stage. Conclusion: Using HT-SAS we created new annotations, or enriched already established ones, for over 20% of the proteins from Plasmodium berghei expressed in the blood stage and deposited in PlasmoDB. Our tests show that this approach to information extraction provides highly specific keywords, often even when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources, and for researchers working with protein datasets, especially from poorly characterized organisms.
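
    The HT-SAS abstract above states that overrepresented words are detected with a binomial statistics approximation but gives no formula. As a rough illustration only, the sketch below swaps in an exact one-sided binomial tail instead of an approximation. The function names, the 0.01 threshold and the toy background frequencies are all hypothetical and are not taken from HT-SAS.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_words(word_counts, n_abstracts, background_freq, alpha=0.01):
    """Return {word: p_value} for words seen more often than expected.

    word_counts:     abstracts (out of n_abstracts) that contain each word
    background_freq: assumed fraction of all Medline abstracts containing
                     each word (hypothetical input, not an HT-SAS table)
    """
    hits = {}
    for word, k in word_counts.items():
        p = background_freq.get(word)
        if p is None:
            continue  # no background estimate, so the word cannot be tested
        p_value = binomial_tail(k, n_abstracts, p)
        if p_value < alpha:
            hits[word] = p_value
    return hits


if __name__ == "__main__":
    counts = {"kinase": 8, "the": 9}
    background = {"kinase": 0.05, "the": 0.9}
    print(sorted(overrepresented_words(counts, 10, background)))
```

    A word such as "the" occurs in most abstracts anyway, so a high count for it is unsurprising, whereas a rare word appearing in most of a protein's abstracts yields a very small tail probability.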

  11. High quality RNA extraction from Maqui berry for its application in next-generation sequencing.

    Science.gov (United States)

    Sánchez, Carolina; Villacreses, Javier; Blanc, Noelle; Espinoza, Loreto; Martinez, Camila; Pastor, Gabriela; Manque, Patricio; Undurraga, Soledad F; Polanco, Victor

    2016-01-01

    Maqui berry (Aristotelia chilensis) is a native Chilean species that produces berries that are exceptionally rich in anthocyanins and natural antioxidants. These natural compounds provide an array of health benefits for humans, making them very desirable in a fruit. At the same time, these substances also interfere with nucleic acid preparations, making RNA extraction from Maqui berry a major challenge. Our group established a method for extracting high-quality RNA (good purity, good integrity and higher yield) from Maqui berry. This procedure is based on an adapted CTAB method using high concentrations of PVP (4 %) and β-mercaptoethanol (4 %) and spermidine in the extraction buffer. These reagents help to remove contaminants such as polysaccharides, proteins and phenols, and also prevent the oxidation of phenolic compounds. The high quality of the RNA isolated by this method allowed its successful use in molecular applications for this endemic Chilean fruit, such as differential expression analysis of RNA-Seq data using next generation sequencing (NGS). Furthermore, we consider that our method could potentially be used for other plant species with extremely high levels of antioxidants and anthocyanins.

  12. High-throughput sequencing of nematode communities from total soil DNA extractions

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    nematodes without the need for enrichment was developed. Using this strategy on DNA templates from a set of 22 agricultural soils, we obtained 64.4% sequences of nematode origin in total, whereas the remaining sequences were almost entirely from other metazoans. The nematode sequences were derived from...

  13. Market entry decisions: effects of absolute and relative confidence.

    Science.gov (United States)

    Bolger, Fergus; Pulford, Briony D; Colman, Andrew M

    2008-01-01

    In a market entry game, the number of entrants usually approaches game-theoretic equilibrium quickly, but in real-world markets business start-ups typically exceed market capacity, resulting in chronically high failure rates and suboptimal industry profits. Excessive entry has been attributed to overconfidence arising when expected payoffs depend partly on skill. In an experimental test of this hypothesis, 96 participants played 24 rounds of a market entry game, with expected payoffs dependent partly on skill on half the rounds, after their confidence was manipulated and measured. The results provide direct support for the hypothesis that high levels of confidence are largely responsible for excessive entry, and they suggest that absolute confidence, independent of interpersonal comparison, rather than confidence about one's abilities relative to others, drives excessive entry decisions when skill is involved.

  14. Identification and characterization of miRNA transcriptome in potato by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Runxuan Zhang

    Micro RNAs (miRNAs) represent a class of short, non-coding, endogenous RNAs which play important roles in post-transcriptional regulation of gene expression. While the diverse functions of miRNAs in model plants have been well studied, the impact of miRNAs in crop plant biology is poorly understood. Here we used high-throughput sequencing and bioinformatics analysis to analyze miRNAs in the tuber bearing crop potato (Solanum tuberosum). Small RNAs were analysed from leaf and stolon tissues. 28 conserved miRNA families were found and potato-specific miRNAs were identified and validated by RNA gel blot hybridization. The size, origin and predicted targets of conserved and potato specific miRNAs are described. The large number of miRNAs and complex population of small RNAs in potato suggest important roles for these non-coding RNAs in diverse physiological and metabolic pathways.

  15. Identification and characterization of miRNA transcriptome in potato by high-throughput sequencing.

    Science.gov (United States)

    Zhang, Runxuan; Marshall, David; Bryan, Glenn J; Hornyik, Csaba

    2013-01-01

    Micro RNAs (miRNAs) represent a class of short, non-coding, endogenous RNAs which play important roles in post-transcriptional regulation of gene expression. While the diverse functions of miRNAs in model plants have been well studied, the impact of miRNAs in crop plant biology is poorly understood. Here we used high-throughput sequencing and bioinformatics analysis to analyze miRNAs in the tuber bearing crop potato (Solanum tuberosum). Small RNAs were analysed from leaf and stolon tissues. 28 conserved miRNA families were found and potato-specific miRNAs were identified and validated by RNA gel blot hybridization. The size, origin and predicted targets of conserved and potato specific miRNAs are described. The large number of miRNAs and complex population of small RNAs in potato suggest important roles for these non-coding RNAs in diverse physiological and metabolic pathways.

  16. High resolution MR angiography with rephasing and dephasing sequences for selective vascular imaging of arteries

    Energy Technology Data Exchange (ETDEWEB)

    Seiderer, M.; Laub, G.; Staebler, A.; Yousry, P.; Lauterjung, L.

    1988-03-01

    With rephasing and dephasing sequences the vascular system is imaged with high or low signal intensity whereas stationary tissue is imaged with identical signal intensity. With images recorded in systole and diastole followed by image subtraction separate imaging of arteries or veins without background superposition is possible. 13 patients with vascular lesions of the lower extremities and 7 volunteers were examined. Vascular stenosis, aneurysm, dilatation, occlusion and collateral vessels could be imaged similar to digital subtraction angiography. Vessels with a diameter down to 1 mm could be imaged. The large slice thickness up to 80 mm results in projection type images where the vascular tree is imaged over the whole field of view and without partial volume effects.

  17. Transcriptome sequencing in a Tibetan barley landrace with high resistance to powdery mildew.

    Science.gov (United States)

    Zeng, Xing-Quan; Luo, Xiao-Mei; Wang, Yu-Lin; Xu, Qi-Jun; Bai, Li-Jun; Yuan, Hong-Jun; Tashi, Nyima

    2014-01-01

    Hulless barley is an important cereal crop worldwide, especially in Tibet of China. However, this crop is usually susceptible to powdery mildew caused by Blumeria graminis f. sp. hordei. In this study, we aimed to understand the functions and pathways of genes involved in the disease resistance by transcriptome sequencing of a Tibetan barley landrace with high resistance to powdery mildew. A total of 831 significant differentially expressed genes were found in the infected seedlings, covering 19 functions. Either "cell," "cell part," and "extracellular region" in the cellular component category or "binding" and "catalytic" in the category of molecular function as well as "metabolic process" and "cellular process" in the biological process category together demonstrated that these functions may be involved in the resistance to powdery mildew of the hulless barley. In addition, 330 KEGG pathways were found using BLASTx with an E-value cut-off of powdery mildew infection.

  18. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences.

    Science.gov (United States)

    Gao, Song; Sung, Wing-Kin; Nagarajan, Niranjan

    2011-11-01

    Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explored the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes (Availability: http://sourceforge.net/projects/operasf/ ).

  19. Identification of microsatellites from an extinct moa species using high-throughput (454) sequence data

    DEFF Research Database (Denmark)

    Allentoft, Morten Erik; Schuster, Stephan C.; Holdaway, Richard N.

    2009-01-01

    Genetic variation in microsatellites is rarely examined in the field of ancient DNA (aDNA) due to the low quantity of nuclear DNA in the fossil record together with the lack of characterized nuclear markers in extinct species. 454 sequencing platforms provide a new high-throughput technology...... capable of generating up to 1 gigabase per run as short (200-400 bp) reads. 454 data were generated from the fossil bone of an extinct New Zealand moa (Aves: Dinornithiformes). We identified numerous short tandem repeat (STR) motifs, and here present the successful isolation and characterization...... of one polymorphic microsatellite (Moa_MS2). Primers designed to flank this locus amplified all three moa species tested here. The presented method proved to be a fast and efficient way of identifying microsatellite markers in ancient DNA templates and, depending on biomolecule preservation, has...

  20. Barcoding the food chain: from Sanger to high-throughput sequencing.

    Science.gov (United States)

    Littlefair, Joanne E; Clare, Elizabeth L

    2016-11-01

    Society faces the complex challenge of supporting biodiversity and ecosystem functioning, while ensuring food security by providing safe traceable food through an ever-more-complex global food chain. The increase in human mobility brings the added threat of pests, parasites, and invaders that further complicate our agro-industrial efforts. DNA barcoding technologies allow researchers to identify both individual species, and, when combined with universal primers and high-throughput sequencing techniques, the diversity within mixed samples (metabarcoding). These tools are already being employed to detect market substitutions, trace pests through the forensic evaluation of trace "environmental DNA", and to track parasitic infections in livestock. The potential of DNA barcoding to contribute to increased security of the food chain is clear, but challenges remain in regulation and the need for validation of experimental analysis. Here, we present an overview of the current uses and challenges of applied DNA barcoding in agriculture, from agro-ecosystems within farmland to the kitchen table.

  1. SNP calling using genotype model selection on high-throughput sequencing data

    KAUST Repository

    You, Na

    2012-01-16

    Motivation: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) platform data reveals that most rely mainly on base-calling and mapping qualities as sources of error when calling SNPs. Thus, errors not involved in base-calling or alignment, such as those in genomic sample preparation, are not accounted for. Results: A novel method of consensus and SNP calling, Genotype Model Selection (GeMS), is given which accounts for the errors that occur during the preparation of the genomic sample. Simulations and real data analyses indicate that GeMS has the best performance balance of sensitivity and positive predictive value among the tested SNP callers. © The Author 2012. Published by Oxford University Press. All rights reserved.

  2. The main challenges that remain in applying high-throughput sequencing to clinical diagnostics.

    Science.gov (United States)

    Loeffelholz, Michael; Fofanov, Yuriy

    2015-01-01

    Over the last 10 years, the quality, price and availability of high-throughput sequencing instruments have improved to the point that this technology may be close to becoming a routine tool in the diagnostic microbiology laboratory. Two groups of challenges, however, have to be resolved in order to move this powerful research technology into routine use in the clinical microbiology laboratory. The computational/bioinformatics challenges include data storage cost and privacy concerns, requiring analysis to be performed without access to cloud storage or expensive computational infrastructure. The logistical challenges include interpretation of complex results and acceptance and understanding of the advantages and limitations of this technology by the medical community. This article focuses on the approaches to address these challenges, such as file formats, algorithms, data collection, reporting and good laboratory practices.

  3. Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data.

    Science.gov (United States)

    Gloor, Gregory B; Reid, Gregor

    2016-08-01

    A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for the analysis of microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focussed on explaining the rationale behind the use of compositional data analysis, and a demonstration of these methods for the examination of 2 microbiome data sets. A clear understanding of bioinformatics methodologies and the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.
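
    The abstract above refers to compositional data analysis without naming a specific transform. One widely used example is the centered log-ratio (CLR) transform, sketched below. Whether the workshop used exactly this transform is an assumption, and the pseudocount of 0.5 for zero counts is an illustrative choice rather than a recommendation from the authors.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    Each value becomes log(count) minus the mean of the logs, i.e. the log
    of the count relative to the sample's geometric mean. The pseudocount
    (an assumption here) keeps zero counts from producing log(0).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Hypothetical read counts for three taxa in one sample.
    print([round(v, 3) for v in clr([10, 20, 70])])
```

    Because CLR values depend only on ratios between taxa, rescaling a sample's sequencing depth leaves them unchanged (exactly so when no pseudocount is added), which is the property that makes the approach suitable for relative-abundance data.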

  4. Food skills confidence and household gatekeepers' dietary practices.

    Science.gov (United States)

    Burton, Melissa; Reid, Mike; Worsley, Anthony; Mavondo, Felix

    2017-01-01

    Household food gatekeepers have the potential to influence the food attitudes and behaviours of family members, as they are mainly responsible for food-related tasks in the home. The aim of this study was to determine the role of gatekeepers' confidence in food-related skills and nutrition knowledge on food practices in the home. An online survey was completed by 1059 Australian dietary gatekeepers selected from the Global Market Insite (GMI) research database. Participants responded to questions about food acquisition and preparation behaviours, the home eating environment, perceptions and attitudes towards food, and demographics. Two-step cluster analysis was used to identify groups based on confidence regarding food skills and nutrition knowledge. Chi-square tests and one-way ANOVAs were used to compare the groups on the dependent variables. Three groups were identified: low confidence, moderate confidence and high confidence. Gatekeepers in the highest confidence group were significantly more likely to report lower body mass index (BMI), and indicate higher importance of fresh food products, vegetable prominence in meals, product information use, meal planning, perceived behavioural control and overall diet satisfaction. Gatekeepers in the lowest confidence group were significantly more likely to indicate more perceived barriers to healthy eating, report more time constraints and more impulse purchasing practices, and higher convenience ingredient use. Other smaller associations were also found. Household food gatekeepers with high food skills confidence were more likely to engage in several healthy food practices, while those with low food skills confidence were more likely to engage in unhealthy food practices. Food education strategies aimed at building food-skills and nutrition knowledge will enable current and future gatekeepers to make healthier food decisions for themselves and for their families. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. A high-throughput sequencing test for diagnosing inherited bleeding, thrombotic, and platelet disorders

    Science.gov (United States)

    Simeoni, Ilenia; Stephens, Jonathan C.; Hu, Fengyuan; Deevi, Sri V. V.; Megy, Karyn; Bariana, Tadbir K.; Lentaigne, Claire; Schulman, Sol; Sivapalaratnam, Suthesh; Vries, Minka J. A.; Westbury, Sarah K.; Greene, Daniel; Papadia, Sofia; Alessi, Marie-Christine; Attwood, Antony P.; Ballmaier, Matthias; Baynam, Gareth; Bermejo, Emilse; Bertoli, Marta; Bray, Paul F.; Bury, Loredana; Cattaneo, Marco; Collins, Peter; Daugherty, Louise C.; Favier, Rémi; French, Deborah L.; Furie, Bruce; Gattens, Michael; Germeshausen, Manuela; Ghevaert, Cedric; Goodeve, Anne C.; Guerrero, Jose A.; Hampshire, Daniel J.; Hart, Daniel P.; Heemskerk, Johan W. M.; Henskens, Yvonne M. C.; Hill, Marian; Hogg, Nancy; Jolley, Jennifer D.; Kahr, Walter H.; Kelly, Anne M.; Kerr, Ron; Kostadima, Myrto; Kunishima, Shinji; Lambert, Michele P.; Liesner, Ri; López, José A.; Mapeta, Rutendo P.; Mathias, Mary; Millar, Carolyn M.; Nathwani, Amit; Neerman-Arbez, Marguerite; Nurden, Alan T.; Nurden, Paquita; Othman, Maha; Peerlinck, Kathelijne; Perry, David J.; Poudel, Pawan; Reitsma, Pieter; Rondina, Matthew T.; Smethurst, Peter A.; Stevenson, William; Szkotak, Artur; Tuna, Salih; van Geet, Christel; Whitehorn, Deborah; Wilcox, David A.; Zhang, Bin; Revel-Vilk, Shoshana; Gresele, Paolo; Bellissimo, Daniel B.; Penkett, Christopher J.; Laffan, Michael A.; Mumford, Andrew D.; Rendon, Augusto; Freson, Kathleen; Ouwehand, Willem H.; Turro, Ernest

    2016-01-01

    Inherited bleeding, thrombotic, and platelet disorders (BPDs) are diseases that affect ∼300 individuals per million births. With the exception of hemophilia and von Willebrand disease patients, a molecular analysis for patients with a BPD is often unavailable. Many specialized tests are usually required to reach a putative diagnosis and they are typically performed in a step-wise manner to control costs. This approach causes delays and a conclusive molecular diagnosis is often never reached, which can compromise treatment and impede rapid identification of affected relatives. To address this unmet diagnostic need, we designed a high-throughput sequencing platform targeting 63 genes relevant for BPDs. The platform can call single nucleotide variants, short insertions/deletions, and large copy number variants (though not inversions) which are subjected to automated filtering for diagnostic prioritization, resulting in an average of 5.34 candidate variants per individual. We sequenced 159 and 137 samples, respectively, from cases with and without previously known causal variants. Among the latter group, 61 cases had clinical and laboratory phenotypes indicative of a particular molecular etiology, whereas the remainder had an a priori highly uncertain etiology. All previously detected variants were recapitulated and, when the etiology was suspected but unknown or uncertain, a molecular diagnosis was reached in 56 of 61 and only 8 of 76 cases, respectively. The latter category highlights the need for further research into novel causes of BPDs. The ThromboGenomics platform thus provides an affordable DNA-based test to diagnose patients suspected of having a known inherited BPD. PMID:27084890

  6. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost, or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  7. Nonspecific PCR amplification by high-fidelity polymerases: implications for next-generation sequencing of AFLP markers.

    Science.gov (United States)

    Brelsford, Alan; Collin, Hélène; Perrin, Nicolas; Fumagalli, Luca

    2012-01-01

    High-fidelity 'proofreading' polymerases are often used in library construction for next-generation sequencing projects, in an effort to minimize errors in the resulting sequence data. The increased template fidelity of these polymerases can come at the cost of reduced template specificity, and library preparation methods based on the AFLP technique may be particularly susceptible. Here, we compare AFLP profiles generated with standard Taq and two versions of a high-fidelity polymerase. We find that Taq produces fewer and brighter peaks than high-fidelity polymerase, suggesting that Taq performs better at selectively amplifying templates that exactly match the primer sequences. Because the higher accuracy of proofreading polymerases remains important for sequencing applications, we suggest that it may be more effective to use alternative library preparation methods. © 2011 Blackwell Publishing Ltd.

  8. High resolution measurement of DUF1220 domain copy number from whole genome sequence data.

    Science.gov (United States)

    Astling, David P; Heft, Ilea E; Jones, Kenneth L; Sikela, James M

    2017-08-14

    DUF1220 protein domains found primarily in Neuroblastoma BreakPoint Family (NBPF) genes show the greatest human lineage-specific increase in copy number of any coding region in the genome. There are 302 haploid copies of DUF1220 in hg38 (~160 of which are human-specific) and the majority of these can be divided into 6 different subtypes (referred to as clades). Copy number changes of specific DUF1220 clades have been associated in a dose-dependent manner with brain size variation (both evolutionarily and within the human population), cognitive aptitude, autism severity, and schizophrenia severity. However, no published methods can directly measure copies of DUF1220 with high accuracy and no method can distinguish between domains within a clade. Here we describe a novel method for measuring copies of DUF1220 domains and the NBPF genes in which they are found from whole genome sequence data. We have characterized the effect that various sequencing and alignment parameters and strategies have on the accuracy and precision of the method and defined the parameters that lead to optimal DUF1220 copy number measurement and resolution. We show that copy number estimates obtained using our read depth approach are highly correlated with those generated by ddPCR for three representative DUF1220 clades. By simulation, we demonstrate that our method provides sufficient resolution to analyze DUF1220 copy number variation at three levels: (1) DUF1220 clade copy number within individual genes and groups of genes (gene-specific clade groups) (2) genome wide DUF1220 clade copies and (3) gene copy number for DUF1220-encoding genes. To our knowledge, this is the first method to accurately measure copies of all six DUF1220 clades and the first method to provide gene specific resolution of these clades. This allows one to discriminate among the ~300 haploid human DUF1220 copies to an extent not possible with any other method. The result is a greatly enhanced capability to analyze the

  9. Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey

    Directory of Open Access Journals (Sweden)

    den Dunnen Johan T

    2009-10-01

    Background. The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set up an analysis pipeline based on a short read de novo sequence assembler and a program designed to identify variation within short reads. To illustrate the potential of this technique, we present the results obtained with a randomly sheared, enzymatically generated, 2-3 kbp genome fraction of six pooled Meleagris gallopavo (turkey) individuals. Results. A total of 100 million 36 bp reads were generated, representing approximately 5-6% (~62 Mbp) of the turkey genome, with an estimated sequence depth of 58. Reads consisting of bases called with less than 1% error probability were selected and assembled into contigs. Subsequently, high throughput discovery of nucleotide variation was performed using sequences with more than 90% reliability by using the assembled contigs that were 50 bp or longer as the reference sequence. We identified more than 7,500 SNPs with a high probability of representing true nucleotide variation in turkeys. Increasing the reference genome by adding publicly available turkey BAC-end sequences increased the number of SNPs to over 11,000. A comparison with the sequenced chicken genome indicated that the assembled turkey contigs were distributed uniformly across the turkey genome. Genotyping of a representative sample of 340 SNPs resulted in a SNP conversion rate of 95%. The correlation of the minor allele count (MAC) and observed minor allele frequency (MAF) for the validated SNPs was 0.69. Conclusion. We provide an efficient and cost-effective approach for the identification of thousands of high quality SNPs in species currently lacking a sequenced genome and applied this to turkey. The methodology addresses a random fraction of the genome, resulting in an even

  10. Use of the melting curve assay as a means for high-throughput quantification of Illumina sequencing libraries

    Directory of Open Access Journals (Sweden)

    Hiroshi Shinozuka

    2016-08-01

    Background. Multiplexed sequencing is commonly performed on massively parallel short-read sequencing platforms such as Illumina, and the efficiency of library normalisation can affect the quality of the output dataset. Although several library normalisation approaches have been established, none are ideal for highly multiplexed sequencing due to issues of cost and/or processing time. Methods. An inexpensive and high-throughput library quantification method has been developed, based on an adaptation of the melting curve assay. Sequencing libraries were subjected to the assay using the Bio-Rad Laboratories CFX Connect™ Real-Time PCR Detection System. The library quantity was calculated through summation of reduction of relative fluorescence units between 86 and 95 °C. Results. PCR-enriched sequencing libraries are suitable for this quantification without pre-purification of DNA. Short DNA molecules, which ideally should be eliminated from the library for subsequent processing, were differentiated from the target DNA in a mixture on the basis of differences in melting temperature. Quantification results for long sequences targeted using the melting curve assay were correlated with those from existing methods (R² > 0.77), and with that observed from MiSeq sequencing (R² = 0.82). Discussion. The results of multiplexed sequencing suggested that the normalisation performance of the described method is equivalent to that of another recently reported high-throughput bead-based method, BeNUS. However, costs for the melting curve assay are considerably lower and processing times shorter than those of other existing methods, suggesting greater suitability for highly multiplexed sequencing applications.
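
    The quantification step described above (summing the reduction in relative fluorescence units between 86 and 95 °C) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the input layout (paired temperature and RFU readings) and the choice to ignore any interval in which fluorescence rises are all hypothetical.

```python
def library_quantity(temps_c, rfu, t_min=86.0, t_max=95.0):
    """Sum the drops in RFU between consecutive readings inside [t_min, t_max].

    Assumption: readings are ordered by increasing temperature, and an
    interval counts only if both of its end points lie inside the window.
    """
    points = list(zip(temps_c, rfu))
    total = 0.0
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 >= t_min and t1 <= t_max:
            # Assumption: intervals where fluorescence increases contribute nothing.
            total += max(f0 - f1, 0.0)
    return total


# Made-up melting curve, for illustration only.
temps = [84.0, 86.0, 88.0, 90.0, 92.0, 94.0, 96.0]
signal = [1000.0, 950.0, 800.0, 500.0, 300.0, 250.0, 240.0]
quantity = library_quantity(temps, signal)
```

    In practice the resulting value would still have to be calibrated against libraries of known concentration before being used for normalisation.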

  11. Next-generation sequencing of human mitochondrial reference genomes uncovers high heteroplasmy frequency.

    Directory of Open Access Journals (Sweden)

    Maria Ximena Sosa

    We describe methods for rapid sequencing of the entire human mitochondrial genome (mtgenome), which involve long-range PCR for specific amplification of the mtgenome, pyrosequencing, quantitative mapping of sequence reads to identify sequence variants and heteroplasmy, as well as de novo sequence assembly. These methods have been used to study 40 publicly available HapMap samples of European (CEU) and African (YRI) ancestry to demonstrate a sequencing error rate <5.63×10⁻⁴, nucleotide diversity of 1.6×10⁻³ for CEU and 3.7×10⁻³ for YRI, patterns of sequence variation consistent with earlier studies, but a higher rate of heteroplasmy varying between 10% and 50%. These results demonstrate that next-generation sequencing technologies allow interrogation of the mitochondrial genome in greater depth than previously possible, which may be of value in biology and medicine.

  12. Sample-Align-D: A High Performance Multiple Sequence Alignment System using Phylogenetic Sampling and Domain Decomposition

    CERN Document Server

    Saeed, Fahad

    2009-01-01

    Multiple Sequence Alignment (MSA) is one of the most computationally intensive tasks in Computational Biology. The best known existing solutions for multiple sequence alignment take several hours (in some cases days) of computation time to align, for example, 2000 homologous sequences of average length 300. Inspired by the Sample Sort approach in parallel processing, in this paper we propose a highly scalable multiprocessor solution for the MSA problem in phylogenetically diverse sequences. Our method employs an intelligent scheme to partition the set of sequences into smaller subsets using a k-mer count-based similarity index, referred to as k-mer rank. Each subset is then independently aligned in parallel using any sequential approach. Further fine tuning of the local alignments is achieved using constraints derived from a global ancestor of the entire set. The proposed Sample-Align-D Algorithm has been implemented on a cluster of workstations using the MPI message passing library. The accuracy of the proposed solutio...

  13. Quantifying braided river morphodynamics through a sequence of high-flow events

    Science.gov (United States)

    Williams, R. D.; Brasington, J.; Vericat, D.; Hicks, D. M.

    2012-04-01

    Quantifying braided river morphology and morphological change is a key task for understanding braided river behaviour. In the last decade, developments in geomatics technologies and associated data processing toolboxes have transformed the potential for producing precise, reach-scale topographic datasets. Moreover, since fast data acquisition rates enable surveys to be undertaken at frequencies that are commensurate with individual flood events, it is now possible to map morphological change for sequences of high-flow events over considerable spatial extents. The application of high-resolution remote sensing technologies to monitor braided river dynamics thus has the potential to provide considerable insight into the relationships between forcing discharges, sediment transport and morphological evolution. In this paper we present a set of Digital Elevation Models (DEMs) that have been produced by monitoring the evolution of a 2.5 x 0.7 km braided study area of the Rees River, New Zealand, through a sequence of ten high-flow events over an eight-month period. We then use the morphological approach to produce a sediment budget for the study area. The morphological evolution of the Rees River braided study area was monitored after each storm event using a combination of two remote sensing methodologies. First, dry areas of the braidplain were surveyed using a Terrestrial Laser Scanner (TLS) mounted on an Argo Amphibious All Terrain Vehicle. Second, since the TLS was not water penetrating, bathymetry was mapped using an empirically calibrated optical method, based on non-metric vertical aerial photos acquired from a helicopter and an acoustic depth survey along primary anabranches. The resulting data were fused together to produce high quality DEMs, with sub-cm and sub-decimetre vertical standard deviations of error for the TLS and optical-empirical bathymetric components respectively. 
    The resulting set of DEMs enabled the quantification of morphological change through

  14. Sequence-dependent DNA separation by anion-exchange high-performance liquid chromatography

    Energy Technology Data Exchange (ETDEWEB)

    Yamakawa, Hisashi; Higashino, Ken-ich; Ohara, Osamu [Kazusa DNA Research Inst., Chiba (Japan)]

    1996-09-05

    A high-performance liquid chromatography (HPLC) system with a new nonporous anion-exchange resin, DNA-NPR, made it possible to rapidly separate DNA fragments up to 20 kbp with high resolution. In order to further characterize this chromatographic DNA separation system, we prepared mixtures of double-stranded DNAs of constant length carrying a fully degenerated 50-bp region and analyzed their chromatographic behavior on the DNA-NPR column. The results indicated that the separation of DNA fragments on the anion-exchange HPLC was governed not only by size, but also by nucleotide sequence: even DNA fragments with the same size and the same base content could be separated on this column. Taking advantage of this characteristic feature of the anion-exchange HPLC, we could readily fractionate human cDNAs with practically acceptable recovery and high resolution. Furthermore, the combination of HPLC and gel electrophoresis realized separation of a mixture of DNA fragments in a two-dimensional pattern. 22 refs., 5 figs., 1 tab.

  15. Nitrate removal from high strength nitrate-bearing wastes in granular sludge sequencing batch reactors.

    Science.gov (United States)

    Krishna Mohan, Tulasi Venkata; Renu, Kadali; Nancharaiah, Yarlagadda Venkata; Satya Sai, Pedapati Murali; Venugopalan, Vayalam Purath

    2016-02-01

    A 6-L sequencing batch reactor (SBR) was operated for development of granular sludge capable of denitrification of high strength nitrates. Complete and stable denitrification of up to 5420 mg L⁻¹ nitrate-N (2710 mg L⁻¹ nitrate-N in reactor) was achieved by feeding simulated nitrate waste at a C/N ratio of 3. Compact and dense denitrifying granular sludge with relatively stable microbial community was developed during reactor operation. Accumulation of large amounts of nitrite due to incomplete denitrification occurred when the SBR was fed with 5420 mg L⁻¹ NO3-N at a C/N ratio of 2. Complete denitrification could not be achieved at this C/N ratio, even after one week of reactor operation as the nitrite levels continued to accumulate. In order to improve denitrification performance, the reactor was fed with nitrate concentrations of 1354 mg L⁻¹, while keeping C/N ratio at 2. Subsequently, nitrate concentration in the feed was increased in a step-wise manner to establish complete denitrification of 5420 mg L⁻¹ NO3-N at a C/N ratio of 2. The results show that substrate concentration plays an important role in denitrification of high strength nitrate by influencing nitrite accumulation. Complete denitrification of high strength nitrates can be achieved at lower substrate concentrations, by an appropriate acclimatization strategy.

  16. High-quality RNA extraction from copepods for Next Generation Sequencing: A comparative study.

    Science.gov (United States)

    Asai, Sneha; Ianora, Adrianna; Lauritano, Chiara; Lindeque, Penelope K; Carotenuto, Ylenia

    2015-12-01

    Despite the ecological importance of copepods, few Next Generation Sequencing (NGS) studies have been performed on small crustaceans, and a standard method for RNA extraction is lacking. In this study, we compared three commonly used methods (TRIzol®, Aurum Total RNA Mini Kit and Qiagen RNeasy Micro Kit), in combination with the preservation reagents TRIzol® or RNAlater®, for obtaining RNA of high quality and quantity from copepods for NGS. Total RNA was extracted from the copepods Calanus helgolandicus, Centropages typicus and Temora stylifera and its quantity and quality were evaluated using NanoDrop, agarose gel electrophoresis and Agilent Bioanalyzer. Our results demonstrate that preservation of copepods in RNAlater® followed by extraction with the Qiagen RNeasy Micro Kit was the optimal isolation method for obtaining RNA of high quality and quantity for NGS studies of C. helgolandicus. Intriguingly, C. helgolandicus 28S rRNA is formed by two subunits that separate after heat-denaturation and migrate along with 18S rRNA. This unique property of protostome RNA has never been reported in copepods. Overall, our comparative study on RNA extraction protocols will help increase gene expression studies on copepods using high-throughput applications, such as RNA-Seq and microarrays. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Intelligence, Self-confidence and Entrepreneurship

    OpenAIRE

    Asoni, Andrea

    2011-01-01

    I investigate the effect of human capital on entrepreneurship using the National Longitudinal Survey of Youth - 1979. I find that individuals with higher measured intelligence and self-confidence are more likely to be entrepreneurs. Furthermore I present evidence suggesting that intelligence and self-confidence affect business ownership through two different channels: intelligence increases business survival while self-confidence increases business creation. Finally, once we control for intel...

  18. ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    Directory of Open Access Journals (Sweden)

    Kim Taeho

    2010-09-01

    Background. There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. Results. We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computation efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions. ClustalXeed can now compute a large volume of biological sequence data sets, which were not tractable in any other parallel or single MSA program. The main developments include: (1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities; (2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and (3) support for both single PC and distributed cluster systems.

  19. Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts

    Directory of Open Access Journals (Sweden)

    Ouyang Shu

    2005-09-01

    Background. The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. Results. All available ESTs and Expressed Transcripts (ETs; 449,224 sequences) for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana) were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. Conclusion. Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.

  20. Confidence Intervals from One Observation

    CERN Document Server

    Rodriguez, Carlos C

    2008-01-01

    Robert Machol's surprising result, that from a single observation it is possible to have finite length confidence intervals for the parameters of location-scale models, is reproduced and extended. Two previously unpublished modifications are included. First, Herbert Robbins' nonparametric confidence interval is obtained. Second, I introduce a technique for obtaining confidence intervals for the scale parameter of finite length in the logarithmic metric. Keywords: Theory/Foundations, Estimation, Prior Distributions, Non-parametrics & Semi-parametrics, Geometry of Inference, Confidence Intervals, Location-Scale models

  1. A model for developing disability confidence.

    Science.gov (United States)

    Lindsay, Sally; Cancelliere, Sara

    2017-05-15

    Many clinicians, educators, and employers lack disability confidence, which can affect their interactions with, and inclusion of, people with disabilities. Our objective was to explore how disability confidence developed among youth who volunteered with children who have a disability. We conducted 30 in-depth interviews (16 without a disability, 14 with disabilities), with youth aged 15-25. We analyzed our data using an interpretive, qualitative, thematic approach. We identified four main themes that led to the progression of disability confidence including: (1) "disability discomfort," referring to lacking knowledge about disability and experiencing unease around people with disabilities; (2) "reaching beyond comfort zone" where participants increased their understanding of disability and became sensitized to difference; (3) "broadened perspectives" where youth gained exposure to people with disabilities and challenged common misperceptions and stereotypes; and (4) "disability confidence" which includes having knowledge of people with disabilities and inclusive, positive attitudes towards them. Volunteering is one way that can help to develop disability confidence. Youth with and without disabilities both reported a similar process of developing disability confidence; however, there were nuances between the two groups. Implications for Rehabilitation: The development of disability confidence is important for enhancing the social inclusion of people with disabilities. Volunteering with people who have a disability, or a disability different from their own, can help to develop disability confidence which involves positive attitudes, empathy, and appropriate communication skills. Clinicians, educators, and employers should consider promoting working with disabled people through such avenues as volunteering or service learning to gain disability confidence.

  2. Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis.

    Science.gov (United States)

    Wang, Linhai; Yu, Sheng; Tong, Chaobo; Zhao, Yingzhong; Liu, Yan; Song, Chi; Zhang, Yanxin; Zhang, Xudong; Wang, Ying; Hua, Wei; Li, Donghua; Li, Dan; Li, Fang; Yu, Jingyin; Xu, Chunyan; Han, Xuelian; Huang, Shunmou; Tai, Shuaishuai; Wang, Junyi; Xu, Xun; Li, Yingrui; Liu, Shengyi; Varshney, Rajeev K; Wang, Jun; Zhang, Xiurong

    2014-02-27

    Sesame, Sesamum indicum L., is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. However, the molecular biology of sesame is largely unexplored. Here, we report a high-quality genome sequence of sesame assembled de novo with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb, containing an estimated 27,148 genes. The results reveal novel, independent whole genome duplication and the absence of the Toll/interleukin-1 receptor domain in resistance genes. Candidate genes and oil biosynthetic pathways contributing to high oil content were discovered by comparative genomic and transcriptomic analyses. These revealed the expansion of type 1 lipid transfer genes by tandem duplication, the contraction of lipid degradation genes, and the differential expression of essential genes in the triacylglycerol biosynthesis pathway, particularly in the early stage of seed development. Resequencing data in 29 sesame accessions from 12 countries suggested that the high genetic diversity of lipid-related genes might be associated with the wide variation in oil content. Additionally, the results shed light on the pivotal stage of seed development, oil accumulation and potential key genes for sesamin production, an important pharmacological constituent of sesame. As an important species from the order Lamiales and a high oil crop, the sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame.
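
    For readers unfamiliar with the contig and scaffold N50 statistics quoted above, the following sketch shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the example lengths are illustrative and are not taken from the sesame assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy assembly of eight contigs with a total length of 32.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

    Because N50 depends only on the length distribution, it says nothing about assembly correctness, so it is usually reported alongside other quality measures.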

  3. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
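
    The third step above, finding simple microsatellites that conform to user-specified parameters, might look roughly like the sketch below. It is not SSR_pipeline's actual search algorithm, which the abstract does not describe. The function name, parameter names and defaults are hypothetical, and compound repeats are not handled.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return sorted (start, motif, copies) tuples for simple tandem repeats.

    Assumption: a hit is a motif of min_motif..max_motif bases repeated at
    least min_repeats times in a row, searched with a back-referencing regex.
    """
    hits = []
    for k in range(min_motif, max_motif + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "ATAT" (already reported as "AT") or homopolymers.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy composite read: (AT)5 starting at position 3 and (CAG)4 at position 17.
read = "GGC" + "AT" * 5 + "CCGT" + "CAG" * 4 + "TT"
repeats = find_microsatellites(read)
```

    A regex scan per motif length is easy to read but would likely be too slow for motifs up to 25 bp across millions of reads, so a production tool would presumably use a more specialised search.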

  4. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA.

    Directory of Open Access Journals (Sweden)

    Jesper Buchhave Poulsen

    Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject we analysed a neonatal DBS sample and corresponding adult whole-blood (WB) reference sample. Different DNA sample types were prepared for each of the subjects. Pilot 1: wgaDNA of 2x3.2 mm neonatal DBSs (DBS_2x3.2) and raw DNA extract of the WB reference sample (WB_ref). Pilot 2: DBS_2x3.2, WB_ref and a WB_ref replica sharing DNA extract with the WB_ref sample. Pilot 3: DBS_2x3.2, WB_ref, wgaDNA of 2x1.6 mm neonatal DBSs and wgaDNA of the WB reference sample. Following sequencing and data analysis, we compared pairwise variant calls to obtain a measure of similarity, the concordance rate. Concordance rates were slightly lower when comparing DBS vs WB sample types than for any two WB sample types of the same subject before filtering of the variant calls. The overall concordance rates were dependent on the variant type, with SNPs performing best. Post-filtering, the comparisons of DBS vs WB and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. The wgaDNA performed similarly to matched high-quality reference whole-blood DNA, based on concordance rates calculated from variant calls. No differences were observed substituting 2x3.2 with 2x1.6 mm discs, allowing for additional reduction of sample material in future projects.
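    The abstract does not give the formula behind its concordance rate. One common choice, assumed here purely for illustration, is the share of variant calls found in both samples out of all calls found in either sample. The function name `concordance_rate` and the toy variant tuples below are hypothetical:

```python
def concordance_rate(calls_a, calls_b):
    """Pairwise concordance between two variant call sets.

    Each call is a hashable key such as (chrom, pos, ref, alt).
    Assumed definition: shared calls divided by the union of calls.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        raise ValueError("both call sets are empty")
    return len(set_a & set_b) / len(union)


# Toy call sets for one subject: a DBS sample and its WB reference.
dbs_calls = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
wb_calls = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 75, "T", "C")}
print(concordance_rate(dbs_calls, wb_calls))
```

    Computing the rate separately per variant type (for example SNPs versus indels) only requires filtering the call sets before passing them in.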

  5. Expressed sequence tags analysis of a liver tissue cDNA library from a highly inbred minipig line

    Institute of Scientific and Technical Information of China (English)

    CHEN You-nan; TAN Wei-dong; LU Yan-rong; QIN Sheng-fang; LI Sheng-fu; ZENG Yang-zhi; BU Hong; LI You-ping; CHENG Jing-qiu

    2007-01-01

    Background: Porcine liver performing efficient physiological functions in the human body is a prerequisite for successful liver xenotransplantation. However, the protein differences between pig and human remain largely unexplored. Therefore, we investigated the liver expression profile of a highly inbred minipig line. Methods: A cDNA library was constructed from liver tissue of an inbred Banna minipig. Two hundred randomly selected clones were sequenced then analysed by BLAST programme. Results: Alignments of the sequences showed 44% encoded previously known porcine genes. Among the 56% unknown genes, sequences of 72 clones had high similarities with known genes of other species and the similarities to human were mostly above 0.80. The other 40 clones showing no similarity to genes in National Centre for Biotechnology Information are newly discovered, expressed sequence tags specific to liver of inbred Banna minipig. Twenty-two of the 200 clones had full length encoding regions, 38 complete 5' terminal sequences and 140 complete 3' terminal sequences. Conclusion: These newly discovered expression sequences may be an important resource for research involving physiological characteristics and medical usage of inbred pigs and contribute to matching studies in xenotransplantation.

  6. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    Science.gov (United States)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity, and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approaches and clone library analysis, the present result suggests that Illumina sequencing has dramatically accelerated the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor fungal communities different from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  7. Whole-exome sequencing and high throughput genotyping identified KCNJ11 as the thirteenth MODY gene.

    Directory of Open Access Journals (Sweden)

    Amélie Bonnefond

    BACKGROUND: Maturity-onset diabetes of the young (MODY) is a clinically heterogeneous form of diabetes characterized by an autosomal-dominant mode of inheritance, an onset before the age of 25 years, and a primary defect in the pancreatic beta-cell function. Approximately 30% of MODY families remain genetically unexplained (MODY-X). Here, we aimed to use whole-exome sequencing (WES) in a four-generation MODY-X family to identify a new susceptibility gene for MODY. METHODOLOGY: WES (Agilent-SureSelect capture/Illumina-GAIIx sequencing) was performed in three affected and one non-affected relatives in the MODY-X family. We then performed a high-throughput multiplex genotyping (Illumina-GoldenGate assay) of the putative causal mutations in the whole family and in 406 controls. A linkage analysis was also carried out. PRINCIPAL FINDINGS: By focusing on variants of interest (i.e. gains of stop codon, frameshift, non-synonymous and splice-site variants not reported in dbSNP130) present in the three affected relatives and not present in the control, we found 69 mutations. However, as WES was not uniform between samples, a total of 324 mutations had to be assessed in the whole family and in controls. Only one mutation (p.Glu227Lys) in KCNJ11 co-segregated with diabetes in the family (with a LOD-score of 3.68). No KCNJ11 mutation was found in 25 other MODY-X unrelated subjects. CONCLUSIONS/SIGNIFICANCE: Beyond neonatal diabetes mellitus (NDM), KCNJ11 is also a MODY gene ('MODY13'), confirming the wide spectrum of diabetes related phenotypes due to mutations in NDM genes (i.e. KCNJ11, ABCC8 and INS). Therefore, the molecular diagnosis of MODY should include KCNJ11 as affected carriers can be ideally treated with oral sulfonylureas.

  8. Preliminary results of high resolution magneto-biostratigraphy of continental sequences in Chapala Basin, Southwestern Mexico

    Science.gov (United States)

    Mendez Cardenas, D. L.; Benammi, M.

    2007-05-01

    Chapala Lake lies south of Guadalajara, Jalisco State (southwestern Mexico), and belongs to a series of Pliocene lakes along the Mexican Volcanic Belt. It is located in the Chapala rift, and the entire area is controlled by the tectonic setting of the Colima, Tepic and Chapala rifts, which constitute a rift-rift-rift triple junction. The deposits studied belong to volcanosedimentary sequences composed of lacustrine and fluvial associations alternating with units of ash and pumice. The reported fauna comprises at least 27 mammal species, and the sediments containing them had to be worked with special attention to recover rodents by handpicking. These rodents will probably be the key to correlating the deposits. Core demagnetization shows that the magnetic carriers are low-coercivity minerals such as magnetite or Ti-magnetite. It was verified that the characteristic magnetization corresponds to MNRp, and the inversion test gave a good result. Rodents are represented by Geomyinae, Sigmodontinae and Sciurinae. The Geomyinae are the most common, and the faunal association indicates a Blancan age. This also allows a correlation with the polarity pattern in the GSS between 3.6 and 2.6 Ma. Studies of this kind in continental sequences, supported by the paleontological record of vertebrates, are known to give a more precise calibration of the age of such deposits, allowing a better understanding of the evolution of these mammals and their path through the geological record. This work presents the preliminary results of rodent palaeontology and high-resolution magnetostratigraphy in the units of the Chapala Basin.

  9. Expression of a new chimeric protein with a highly repeated sequence in tobacco cells.

    Science.gov (United States)

    Saumonneau, Amélie; Rottier, Karine; Conrad, Udo; Popineau, Yves; Guéguen, Jacques; Francin-Allami, Mathilde

    2011-07-01

    In wheat, the high-molecular weight (HMW) glutenin subunits are known to contribute to gluten viscoelasticity, and show some similarities to elastomeric animal proteins such as elastin. Combining the sequence of a glutenin with that of elastin is a way to create new chimeric functional proteins, which could be expressed in plants. The sequence of a glutenin subunit was modified by the insertion of several hydrophobic and elastic motifs derived from elastin (elastin-like peptide, ELP) into the hydrophilic repetitive domain of the glutenin subunit to create a triblock protein, the objective being to improve the mechanical (elastomeric) properties of this wheat storage protein. In this study, we investigated an expression model system to analyze the expression and trafficking of the wild-type HMW glutenin subunit (GS(W)) and an HMW glutenin subunit mutated by the insertion of elastin motifs (GS(M)-ELP). For this purpose, a series of constructs was made to express wild-type subunits and subunits mutated by insertion of elastin motifs in fusion with green fluorescent protein (GFP) in tobacco BY-2 cells. Our results showed for the first time the expression of HMW glutenin fused with GFP in tobacco protoplasts. We also expressed and localized the chimeric protein composed of plant glutenin and animal elastin-like peptides (ELP) in BY-2 protoplasts, and demonstrated its presence in protein body-like structures in the endoplasmic reticulum. This work, therefore, provides a basis for heterologous production of the glutenin-ELP triblock protein to characterize its mechanical properties.

  10. Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing.

    Science.gov (United States)

    Logue, Kyle; Keven, John Bosco; Cannon, Matthew V; Reimer, Lisa; Siba, Peter; Walker, Edward D; Zimmerman, Peter A; Serre, David

    2016-03-01

    Understanding mosquito host choice is important for assessing vector competence or identifying disease reservoirs. Unfortunately, the availability of an unbiased method for comprehensively evaluating the composition of insect blood meals is very limited, as most current molecular assays only test for the presence of a few pre-selected species. These approaches also have limited ability to identify the presence of multiple mammalian hosts in a single blood meal. Here, we describe a novel high-throughput sequencing method that enables analysis of 96 mosquitoes simultaneously and provides a comprehensive and quantitative perspective on the composition of each blood meal. We validated in silico that universal primers targeting the mammalian mitochondrial 16S ribosomal RNA genes (16S rRNA) should amplify more than 95% of the mammalian 16S rRNA sequences present in the NCBI nucleotide database. We applied this method to 442 female Anopheles punctulatus s. l. mosquitoes collected in Papua New Guinea (PNG). While human (52.9%), dog (15.8%) and pig (29.2%) were the most common hosts identified in our study, we also detected DNA from mice, one marsupial species and two bat species. Our analyses also revealed that 16.3% of the mosquitoes fed on more than one host. Analysis of the human mitochondrial hypervariable region I in 102 human blood meals showed that 5 (4.9%) of the mosquitoes unambiguously fed on more than one person. Overall, analysis of PNG mosquitoes illustrates the potential of this approach to identify unsuspected hosts and characterize mixed blood meals, and shows how this approach can be adapted to evaluate inter-individual variations among human blood meals. Furthermore, this approach can be applied to any disease-transmitting arthropod and can be easily customized to investigate non-mammalian host sources.

  11. Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Kyle Logue

    2016-03-01

    Understanding mosquito host choice is important for assessing vector competence or identifying disease reservoirs. Unfortunately, the availability of an unbiased method for comprehensively evaluating the composition of insect blood meals is very limited, as most current molecular assays only test for the presence of a few pre-selected species. These approaches also have limited ability to identify the presence of multiple mammalian hosts in a single blood meal. Here, we describe a novel high-throughput sequencing method that enables analysis of 96 mosquitoes simultaneously and provides a comprehensive and quantitative perspective on the composition of each blood meal. We validated in silico that universal primers targeting the mammalian mitochondrial 16S ribosomal RNA genes (16S rRNA) should amplify more than 95% of the mammalian 16S rRNA sequences present in the NCBI nucleotide database. We applied this method to 442 female Anopheles punctulatus s. l. mosquitoes collected in Papua New Guinea (PNG). While human (52.9%), dog (15.8%) and pig (29.2%) were the most common hosts identified in our study, we also detected DNA from mice, one marsupial species and two bat species. Our analyses also revealed that 16.3% of the mosquitoes fed on more than one host. Analysis of the human mitochondrial hypervariable region I in 102 human blood meals showed that 5 (4.9%) of the mosquitoes unambiguously fed on more than one person. Overall, analysis of PNG mosquitoes illustrates the potential of this approach to identify unsuspected hosts and characterize mixed blood meals, and shows how this approach can be adapted to evaluate inter-individual variations among human blood meals. Furthermore, this approach can be applied to any disease-transmitting arthropod and can be easily customized to investigate non-mammalian host sources.

  12. Self-confidence, gender and academic achievement of undergraduate nursing students.

    Science.gov (United States)

    Kukulu, K; Korukcu, O; Ozdemir, Y; Bezci, A; Calik, C

    2013-04-01

    The aim of this study was to determine the self-confidence levels of nursing students and the factors related to such self-confidence. Data were obtained via a questionnaire for socio-demographic characteristics and a 'Self-Confidence Scale' prepared by the researchers. High self-confidence levels were noted in 78.6% of female students and 92.3% of male students. While 84.5% of second-year students had high self-confidence levels, this rate was 76% in fourth-year students. Female nursing students were significantly less self-confident than male students. Self-confidence should be nurtured in a caring nursing curriculum; however, there is a lack of clarity as to what confidence means, how it is perceived by students and what educators can do to instil self-confidence in nursing students.

  13. Genotyping by Sequencing Using Specific Allelic Capture to Build a High-Density Genetic Map of Durum Wheat.

    Directory of Open Access Journals (Sweden)

    Yan Holtz

    Targeted sequence capture is a promising technology which helps reduce costs for sequencing and genotyping numerous genomic regions in large sets of individuals. Bait sequences are designed to capture specific alleles previously discovered in parents or reference populations. We studied a set of 135 RILs originating from a cross between an emmer cultivar (Dic2) and a recent durum elite cultivar (Silur). Six thousand sequence baits were designed to target Dic2 vs. Silur polymorphisms discovered in a previous RNAseq study. These baits were exposed to genomic DNA of the RIL population. Eighty percent of the targeted SNPs were recovered, 65% of which were of high quality and coverage. The final high density genetic map consisted of more than 3,000 markers, whose genetic and physical mapping were consistent with those obtained with large arrays.

  14. Genotyping by Sequencing Using Specific Allelic Capture to Build a High-Density Genetic Map of Durum Wheat.

    Science.gov (United States)

    Holtz, Yan; Ardisson, Morgane; Ranwez, Vincent; Besnard, Alban; Leroy, Philippe; Poux, Gérard; Roumet, Pierre; Viader, Véronique; Santoni, Sylvain; David, Jacques

    2016-01-01

    Targeted sequence capture is a promising technology which helps reduce costs for sequencing and genotyping numerous genomic regions in large sets of individuals. Bait sequences are designed to capture specific alleles previously discovered in parents or reference populations. We studied a set of 135 RILs originating from a cross between an emmer cultivar (Dic2) and a recent durum elite cultivar (Silur). Six thousand sequence baits were designed to target Dic2 vs. Silur polymorphisms discovered in a previous RNAseq study. These baits were exposed to genomic DNA of the RIL population. Eighty percent of the targeted SNPs were recovered, 65% of which were of high quality and coverage. The final high density genetic map consisted of more than 3,000 markers, whose genetic and physical mapping were consistent with those obtained with large arrays.

  15. Chicken skin virome analyzed by high-throughput sequencing shows a composition highly different from human skin.

    Science.gov (United States)

    Denesvre, Caroline; Dumarest, Marine; Rémy, Sylvie; Gourichon, David; Eloit, Marc

    2015-10-01

    Recent studies show that human skin at homeostasis is a complex ecosystem whose virome includes circular DNA viruses, especially papillomaviruses and polyomaviruses. To compare the chicken skin virome with the human skin virome, a pooled swab sample from fifteen healthy indoor chickens of five genetic backgrounds was examined for the presence of DNA viruses by high-throughput sequencing (HTS). The results indicate a predominance of herpesviruses from the Mardivirus genus, originating either from vaccination or from presumably asymptomatic infection. Despite the high sensitivity of the HTS method used herein to detect small circular DNA viruses, we did not detect any papillomaviruses, polyomaviruses, or circoviruses, indicating that these viruses may not be residents of chicken skin. The results suggest that the turkey herpesvirus is a resident of chicken skin in vaccinated chickens. This study indicates major differences between the skin viromes of chickens and humans. The origin of this difference remains to be studied further in relation to skin physiology, environment, or virus population dynamics.

  16. Measurement of tag confidence in user generated contents retrieval

    Science.gov (United States)

    Lee, Sihyoung; Min, Hyun-Seok; Lee, Young Bok; Ro, Yong Man

    2009-01-01

    As online image sharing services are becoming popular, the importance of correctly annotated tags is being emphasized for precise search and retrieval. Tags created by users along with user-generated contents (UGC) are often ambiguous because some tags are highly subjective and visually unrelated to the image. They produce unwanted results for users when image search engines rely on tags. In this paper, we propose a method of measuring tag confidence so that one can differentiate confident tags from noisy tags. The proposed tag confidence is measured from the visual semantics of the image. To verify the usefulness of the proposed method, experiments were performed with a UGC database from social network sites. Experimental results showed that image retrieval performance improved when confident tags were used.

  17. Biodegradation and kinetics of aerobic granules under high organic loading rates in sequencing batch reactor.

    Science.gov (United States)

    Chen, Yao; Jiang, Wenju; Liang, David Tee; Tay, Joo Hwa

    2008-05-01

    Biodegradation, kinetics, and microbial diversity of aerobic granules were investigated under a high range of organic loading rates, 6.0 to 12.0 kg chemical oxygen demand (COD) m⁻³ day⁻¹, in a sequencing batch reactor. The selection and enrichment of different bacterial species under different organic loading rates had an important effect on the characteristics and performance of the mature aerobic granules and caused differences in granular biodegradation and kinetic behaviors. Good granular characteristics and performance were observed at steady state under the various organic loading rates. Larger and denser aerobic granules developed and stabilized at relatively higher organic loading rates, with decreased bioactivity in terms of specific oxygen utilization rate and specific growth rate (μoverall) or solid retention time. The decrease in bioactivity helped to maintain granule stability under high organic loading rates and improved reactor operation. The corresponding biokinetic coefficients of endogenous decay rate (kd), observed yield (Yobs), and theoretical yield (Y) were measured and calculated in this study. As the organic loading rate increased, net sludge production (Yobs) decreased in association with an increased solid retention time, while kd and Y changed insignificantly and can be regarded as constants under different organic loading rates.

  18. A low volumetric exchange ratio allows high autotrophic nitrogen removal in a sequencing batch reactor.

    Science.gov (United States)

    De Clippeleir, Haydée; Vlaeminck, Siegfried E; Carballa, Marta; Verstraete, Willy

    2009-11-01

    Sequencing batch reactors (SBRs) have several advantages, such as a smaller footprint and a higher flexibility, compared to biofilm based reactors, such as rotating biological contactors. However, the critical parameters for a fast start-up of the nitrogen removal by oxygen-limited autotrophic nitrification/denitrification (OLAND) in an SBR are not available. In this study, a low critical minimum settling velocity (0.7 m h⁻¹) and a low volumetric exchange ratio (25%) were found to be essential to ensure a fast start-up, in contrast to a high critical minimum settling velocity (2 m h⁻¹) and a high volumetric exchange ratio (40%) which yielded no successful start-up. To prevent nitrite accumulation, two effective actions were found to restore the microbial activity balance between aerobic and anoxic ammonium-oxidizing bacteria (AerAOB and AnAOB). A daily biomass washout at a critical minimum settling velocity of 5 m h⁻¹ removed small aggregates rich in AerAOB activity, and the inclusion of an anoxic phase enhanced the AnAOB to convert the excess nitrite. This study showed that stable physicochemical conditions were needed to obtain a competitive nitrogen removal rate of 1.1 g N L⁻¹ d⁻¹.

  19. Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

    OpenAIRE

    Parker Andrew G; Parker Nicolas J

    2008-01-01

    Background: The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to search quickly a set of reads for near exact text matches. Methods: A set of tools is provided to search a large data set of pyrophosphate...

  20. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    OpenAIRE

    Minoche, André E.; Dohm, Juliane C.; Himmelbauer, Heinz

    2011-01-01

    Background: The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results: We provide quantifications and evidence for GC bias, er...

  1. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid.

    Science.gov (United States)

    Tan, Siyuan; Meng, Yonghong; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-05-26

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. Copyright © 2016 Tan et al.

  2. Strategies for achieving high sequencing accuracy for low diversity samples and avoiding sample bleeding using illumina platform.

    Directory of Open Access Journals (Sweden)

    Abhishek Mitra

    Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer's, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. Alternatively...

  3. 75 FR 81037 - Waste Confidence Decision Update

    Science.gov (United States)

    2010-12-23

    ... COMMISSION 10 CFR Part 51 Waste Confidence Decision Update AGENCY: Nuclear Regulatory Commission. ACTION: Update and final revision of Waste Confidence Decision. SUMMARY: The U.S. Nuclear Regulatory Commission... update to the Decision were products of rulemaking proceedings designed to assess the degree of...

  4. Self-Confidence in the Hospitality Industry

    OpenAIRE

    Michael Oshins

    2014-01-01

    Few industries rely on self-confidence to the extent that the hospitality industry does because guests must feel welcome and that they are in capable hands. This article examines the results of hundreds of student interviews with industry professionals at all levels to determine where the majority of the hospitality industry gets their self-confidence.

  5. Self-Confidence in the Hospitality Industry

    Directory of Open Access Journals (Sweden)

    Michael Oshins

    2014-02-01

    Few industries rely on self-confidence to the extent that the hospitality industry does because guests must feel welcome and that they are in capable hands. This article examines the results of hundreds of student interviews with industry professionals at all levels to determine where the majority of the hospitality industry gets their self-confidence.

  6. Nonparametric confidence intervals for monotone functions

    NARCIS (Netherlands)

    Groeneboom, P.; Jongbloed, G.

    2015-01-01

    We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea

  7. Building Scientific Confidence in the Development and ...

    Science.gov (United States)

    Slide presentation at the GlobalChem conference and workshop in Washington, DC, on a case study on building scientific confidence in the development and evaluation of read-across using Tox21 approaches.

  8. Nonparametric confidence intervals for monotone functions

    NARCIS (Netherlands)

    Groeneboom, P.; Jongbloed, G.

    2015-01-01

    We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the

  9. Examining Response Confidence in Multiple Text Tasks

    Science.gov (United States)

    List, Alexandra; Alexander, Patricia A.

    2015-01-01

    Students' confidence in their responses to a multiple text-processing task and their justifications for those confidence ratings were investigated. Specifically, 215 undergraduates responded to two academic questions, differing by type (i.e., discrete and open-ended) and by domain (i.e., developmental psychology and astrophysics), using a digital…

  10. Confidence and Competence with Mathematical Procedures

    Science.gov (United States)

    Foster, Colin

    2016-01-01

    Confidence assessment (CA), in which students state alongside each of their answers a confidence level expressing how certain they are, has been employed successfully within higher education. However, it has not been widely explored with school pupils. This study examined how school mathematics pupils (N = 345) in five different secondary schools…

  13. Lower confidence limits for structure reliability

    Institute of Scientific and Technical Information of China (English)

    CHEN Jiading; LI Ji

    2006-01-01

    For a class of data often arising in engineering, we have developed an approach to compute the lower confidence limit for structure reliability with a given confidence level. Especially, in a case with no failure and a case with only one failure, the concrete computational methods are presented.

  14. Financial Literacy, Confidence and Financial Advice Seeking

    NARCIS (Netherlands)

    Kramer, Marc M.

    2016-01-01

    We find that people with higher confidence in their own financial literacy are less likely to seek financial advice, but no relation between objective measures of literacy and advice seeking. The negative association between confidence and advice seeking is more pronounced among wealthy households.

  15. Confidence intervals for similarity values determined for cloned SSU rRNA genes from environmental samples

    Energy Technology Data Exchange (ETDEWEB)

    Fields, M.W.; Schryver, J.C.; Brandt, C.C.; Yan, T.; Zhou, J.Z.; Palumbo, A.V.

    2007-04-02

    The goal of this research was to investigate the influence of the error rate of sequence determination on the differentiation of cloned SSU rRNA gene sequences for assessment of community structure. SSU rRNA cloned sequences from groundwater samples that represent different bacterial divisions were sequenced multiple times with the same sequencing primer. From comparison of sequence alignments with unedited data, confidence intervals were obtained from both a "double binomial" model of sequence comparison and by non-parametric methods. The results indicated that similarity values below 0.9946 are likely derived from dissimilar sequences at a confidence level of 0.95, and not sequencing errors. The results confirmed that screening by direct sequence determination could be reliably used to differentiate at the species level. However, given sequencing errors comparable to those seen in this study, sequences with similarities above 0.9946 should be treated as the same sequence if a 95 percent confidence is desired.

  16. Confidence assessment. Site-descriptive modelling SDM-Site Laxemar

    Energy Technology Data Exchange (ETDEWEB)

    2008-12-15

    The objective of this report is to assess the confidence that can be placed in the Laxemar site descriptive model, based on the information available at the conclusion of the surface-based investigations (SDM-Site Laxemar). In this exploration, an overriding question is whether remaining uncertainties are significant for repository engineering design or long-term safety assessment and could successfully be further reduced by more surface-based investigations or more usefully by explorations underground made during construction of the repository. Procedures for this assessment have been progressively refined during the course of the site descriptive modelling, and applied to all previous versions of the Forsmark and Laxemar site descriptive models. They include assessment of whether all relevant data have been considered and understood, identification of the main uncertainties and their causes, possible alternative models and their handling, and consistency between disciplines. The assessment then forms the basis for an overall confidence statement. The confidence in the Laxemar site descriptive model, based on the data available at the conclusion of the surface-based site investigations, has been assessed by exploring: confidence in the site characterization database; remaining issues and their handling; handling of alternatives; consistency between disciplines; and main reasons for confidence and lack of confidence in the model. Generally, the site investigation database is of high quality, as assured by the quality procedures applied. It is judged that the Laxemar site descriptive model has an overall high level of confidence. Because of the relatively robust geological model that describes the site, the overall confidence in the Laxemar Site Descriptive model is judged to be high, even though details of the spatial variability remain unknown. The overall reason for this confidence is the wide spatial distribution of the data and the consistency between

  17. High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling.

    Science.gov (United States)

    Irigoyen, Nerea; Firth, Andrew E; Jones, Joshua D; Chung, Betty Y-W; Siddell, Stuart G; Brierley, Ian

    2016-02-01

    Members of the family Coronaviridae have the largest genomes of all RNA viruses, typically in the region of 30 kilobases. Several coronaviruses, such as Severe acute respiratory syndrome-related coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV), are of medical importance, with high mortality rates and, in the case of SARS-CoV, significant pandemic potential. Other coronaviruses, such as Porcine epidemic diarrhea virus and Avian coronavirus, are important livestock pathogens. Ribosome profiling is a technique which exploits the capacity of the translating ribosome to protect around 30 nucleotides of mRNA from ribonuclease digestion. Ribosome-protected mRNA fragments are purified, subjected to deep sequencing and mapped back to the transcriptome to give a global "snap-shot" of translation. Parallel RNA sequencing allows normalization by transcript abundance. Here we apply ribosome profiling to cells infected with Murine coronavirus, mouse hepatitis virus, strain A59 (MHV-A59), a model coronavirus in the same genus as SARS-CoV and MERS-CoV. The data obtained allowed us to study the kinetics of virus transcription and translation with exquisite precision. We studied the timecourse of positive and negative-sense genomic and subgenomic viral RNA production and the relative translation efficiencies of the different virus ORFs. Virus mRNAs were not found to be translated more efficiently than host mRNAs; rather, virus translation dominates host translation at later time points due to high levels of virus transcripts. Triplet phasing of the profiling data allowed precise determination of translated reading frames and revealed several translated short open reading frames upstream of, or embedded within, known virus protein-coding regions. Ribosome pause sites were identified in the virus replicase polyprotein pp1a ORF and investigated experimentally. Contrary to expectations, ribosomes were not found to pause at the ribosomal

  18. High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling.

    Directory of Open Access Journals (Sweden)

    Nerea Irigoyen

    2016-02-01

    Members of the family Coronaviridae have the largest genomes of all RNA viruses, typically in the region of 30 kilobases. Several coronaviruses, such as Severe acute respiratory syndrome-related coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV), are of medical importance, with high mortality rates and, in the case of SARS-CoV, significant pandemic potential. Other coronaviruses, such as Porcine epidemic diarrhea virus and Avian coronavirus, are important livestock pathogens. Ribosome profiling is a technique which exploits the capacity of the translating ribosome to protect around 30 nucleotides of mRNA from ribonuclease digestion. Ribosome-protected mRNA fragments are purified, subjected to deep sequencing and mapped back to the transcriptome to give a global "snap-shot" of translation. Parallel RNA sequencing allows normalization by transcript abundance. Here we apply ribosome profiling to cells infected with Murine coronavirus, mouse hepatitis virus, strain A59 (MHV-A59), a model coronavirus in the same genus as SARS-CoV and MERS-CoV. The data obtained allowed us to study the kinetics of virus transcription and translation with exquisite precision. We studied the timecourse of positive and negative-sense genomic and subgenomic viral RNA production and the relative translation efficiencies of the different virus ORFs. Virus mRNAs were not found to be translated more efficiently than host mRNAs; rather, virus translation dominates host translation at later time points due to high levels of virus transcripts. Triplet phasing of the profiling data allowed precise determination of translated reading frames and revealed several translated short open reading frames upstream of, or embedded within, known virus protein-coding regions. Ribosome pause sites were identified in the virus replicase polyprotein pp1a ORF and investigated experimentally. Contrary to expectations, ribosomes were not found to pause at the

  19. Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks

    Directory of Open Access Journals (Sweden)

    Courdy Samir J

    2008-12-01

    Background: High throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation. Results: Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Use of control input data and a normalized difference score were found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR within 2–3 fold and are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR validated NRSF sites and the presence of NRSF binding motifs for setting thresholds. Conclusion: The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from http://useq.sourceforge.net/.

  20. Adaptation of Shift Sequence Based Method for High Number in Shifts Rostering Problem for Health Care Workers

    Directory of Open Access Journals (Sweden)

    Mindaugas Liogys

    2011-08-01

    Purpose: to investigate the efficiency of a shift sequence-based approach when the problem consists of a high number of shifts. Research objectives: (1) solve the health care workers rostering problem using a shift sequence based method; (2) measure its efficiency when the number of shifts increases. Design/methodology/approach: Usually rostering problems are highly constrained. Constraints are classified into soft and hard constraints. Soft and hard constraints of the problem are additionally classified into sequence constraints, schedule constraints and roster constraints. Sequence constraints are considered when constructing shift sequences. Schedule constraints are considered when constructing a schedule. Roster constraints are applied when constructing the overall solution, i.e. combining all schedules. The shift sequence based approach consists of two stages: shift sequence construction and schedule construction. In the shift sequence construction stage, the shift sequences are constructed for each set of health care workers of different skill, considering sequence constraints. Shift sequences are ranked by their penalties for easier retrieval in the later stage. In the schedule construction stage, schedules for each health care worker are constructed iteratively, using the shift sequences produced in stage 1. The shift sequence based method is an adaptive iterative method where health care workers who received the highest schedule penalties in the last iteration are scheduled first at the current iteration. During the roster construction, and after a schedule has been generated for the current health care worker, an improvement method based on an efficient greedy local search is carried out on the partial roster. It simply swaps any pair of shifts between two health care workers in the (partial) roster, as long as the swaps satisfy hard constraints and decrease the roster penalty. Findings: Using shift sequence method for solving health care workers rostering

  1. Self-confidence and metacognitive processes

    Directory of Open Access Journals (Sweden)

    Kleitman Sabina

    2005-01-01

    This paper examines the status of the Self-confidence trait. Two studies strongly suggest that Self-confidence is a component of metacognition. In the first study, participants (N=132) were administered measures of Self-concept, a newly devised Memory and Reasoning Competence Inventory (MARCI), and a Verbal Reasoning Test (VRT). The results indicate a significant relationship between confidence ratings on the VRT and the Reasoning component of MARCI. The second study (N=296) employed an extensive battery of cognitive tests and several metacognitive measures. Results indicate the presence of robust Self-confidence and Metacognitive Awareness factors, and a significant correlation between them. Self-confidence taps not only processes linked to performance on items that have correct answers, but also beliefs about events that may never occur.

  2. High-throughput sequencing enhanced phage display enables the identification of patient-specific epitope motifs in serum

    DEFF Research Database (Denmark)

    Christiansen, Anders; Kringelum, Jens Vindahl; Hansen, Christian Skjødt

    2015-01-01

    of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage...

  3. Heterozygous mapping strategy (HetMapps) for high resolution genotyping-by-sequencing markers: a case study in grapevine

    Science.gov (United States)

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low per-sample genotyping cost, but missing data and under-calling of heterozygotes complicate the creation of GBS linkage maps for highly heterozygous species. To overcome these issues, we developed ...

  4. Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Ali Idris

    2014-03-01

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions.

  5. A Web-based High-Throughput Tool for Next-Generation Sequence Annotation

    Science.gov (United States)

    2011-06-01

    annotation of a newly sequenced complete genome, can help devise new strategies in diagnostics and forensics. Moreover, these annotations, coupled... References 1. Hall, N., “Advanced sequencing technologies and their wider impact in microbiology”, The Journal of Experimental Biology, 210(9), pp. 1518–1525

  6. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution

    NARCIS (Netherlands)

    Falconer, Ester; Hills, Mark; Naumann, Ulrike; Poon, Steven S. S.; Chavez, Elizabeth A.; Sanders, Ashley D.; Zhao, Yongjun; Hirst, Martin; Lansdorp, Peter M.

    2012-01-01

    DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it po

  7. Viral metagenomics: Analysis of begomoviruses by illumina high-throughput sequencing

    KAUST Repository

    Idris, Ali

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions. 2014 by the authors; licensee MDPI, Basel, Switzerland.

  8. Viral metagenomics: analysis of begomoviruses by illumina high-throughput sequencing.

    Science.gov (United States)

    Idris, Ali; Al-Saleh, Mohammed; Piatek, Marek J; Al-Shahwan, Ibrahim; Ali, Shahjahan; Brown, Judith K

    2014-03-12

    Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, make possible deep sequence coverage at a lower cost; the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally-infected with begomoviruses under field conditions.

  9. High Depth, Whole-Genome Sequencing of Cholera Isolates from Haiti and the Dominican Republic

    Science.gov (United States)

    2012-09-11

    cholerae [21] and is a homolog of TagA, which has mucinase function [22]. Sequencing of additional isolates from this outbreak over time is likely to... eliminate paralogs, we required the next best hit to be less than 0.8 times as similar as the best hit. We constructed a multiple sequence alignment for

  10. High-Quality Draft Genome Sequence of Bacillus amyloliquefaciens Strain 629, an Endophyte from Theobroma cacao.

    Science.gov (United States)

    SantAnna, Brena M M; Marbach, Phellippe P A; Rojas-Herrera, Marcelo; De Souza, Jorge T; Roque, Milton R A; Queiroz, Artur T L

    2015-11-19

    Bacillus amyloliquefaciens strain 629 is an endophyte isolated from Theobroma cacao L. Here, we report the draft genome sequence (3.9 Mb) of B. amyloliquefaciens strain 629 containing 16 contigs (3,903,367 bp), 3,912 coding sequences, and an average 46.5% G+C content. Copyright © 2015 SantAnna et al.

  11. High diversity of picornaviruses in rats from different continents revealed by deep sequencing

    DEFF Research Database (Denmark)

    Arn Hansen, Thomas; Mollerup, Sarah; Nguyen, Nam-Phuong;

    2016-01-01

    ) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus...

  12. The Accelerated Build-up of the Red Sequence in High Redshift Galaxy Clusters

    CERN Document Server

    Cerulo, P; Lidman, C; Demarco, R; Huertas-Company, M; Mei, S; Sánchez-Janssen, R; Barrientos, L F; Muñoz, R P

    2016-01-01

    We analyse the evolution of the red sequence in a sample of galaxy clusters at redshifts $0.8 11.5$) red sequence galaxies in the WINGS clusters, which do not include only the brightest cluster galaxies and which are not present in the HCS clusters, suggesting that they formed at epochs later than $z=0.8$. The comparison with the luminosity distribution of a sample of passive red sequence galaxies drawn from the COSMOS/UltraVISTA field in the photometric redshift range $0.8sequence in clusters is more developed at the faint end, suggesting that halo mass plays an important role in setting the time-scales for the build-up of the red sequence.

  13. Prenatal MRI Findings of Fetuses with Congenital High Airway Obstruction Sequence

    Energy Technology Data Exchange (ETDEWEB)

    Guimaraes, Carolina V. A.; Linam, Leann E.; Kline-Fath, Beth M. [Cincinnati Children's Hospital Medical Center, Cincinnati (United States)] (and others)

    2009-04-15

    To define the MRI findings of congenital high airway obstruction sequence (CHAOS) in a series of fetuses. Prenatal fetal MR images were reviewed in seven fetuses with CHAOS at 21 to 27 weeks of gestation. The MRI findings were reviewed. The MRI parameters evaluated included the appearance of the lungs and diaphragm, presence or absence of hydrops, amount of amniotic fluid, airway appearance, predicted level of airway obstruction, and any additional findings or suspected genetic syndromes. All the fetuses viewed (7 of 7) demonstrated the following MRI findings: dilated airway below the level of obstruction, increased lung signal, markedly increased lung volumes with flattened or inverted hemidiaphragms, massive ascites, centrally positioned and compressed heart, as well as placentomegaly. Other frequent findings were anasarca (6 of 7) and polyhydramnios (3 of 7). MRI identified the level of obstruction as laryngeal in five cases and tracheal in two cases. In four of the patients, surgery or autopsy confirmed the MRI predicted level of obstruction. Associated abnormalities were found in 4 of 7 (genetic syndromes in 2). Postnatal radiography (n = 3) showed markedly hyperinflated lungs with inverted or flattened hemidiaphragms, strandy perihilar opacities, pneumothoraces and tracheotomy. Two fetuses were terminated and one fetus demised in utero. Four fetuses were delivered via ex utero intrapartum treatment procedure. MRI shows a consistent pattern of abnormalities in fetuses with CHAOS, accurately identifies the level of airway obstruction, and helps differentiate from other lung abnormalities such as bilateral congenital pulmonary airway malformation by demonstrating an abnormally dilated airway distal to the obstruction.

  14. Conjugation with Acridines Turns Nuclear Localization Sequence into Highly Active Antimicrobial Peptide

    Directory of Open Access Journals (Sweden)

    Zhang Wei

    2015-12-01

    The emergence of multidrug-resistant bacteria creates an urgent need for alternative antibiotics with new mechanisms of action. In this study, we synthesized a novel type of antimicrobial agent, Acr3-NLS, by conjugating hydrophobic acridines to the N-terminus of a nuclear localization sequence (NLS), a short cationic peptide. To further improve the antimicrobial activity of our agent, dimeric (Acr3-NLS)2 was simultaneously synthesized by joining two monomeric Acr3-NLS together via a disulfide linker. Our results show that Acr3-NLS and especially (Acr3-NLS)2 display significant antimicrobial activity against gram-negative and gram-positive bacteria compared to that of the NLS. Subsequently, the results derived from the study on the mechanism of action demonstrate that Acr3-NLS and (Acr3-NLS)2 can kill bacteria by membrane disruption and DNA binding. The double targets, cell membrane and intracellular DNA, will reduce the risk of bacteria developing resistance to Acr3-NLS and (Acr3-NLS)2. Overall, this study provides a novel strategy to design highly effective antimicrobial agents with a dual mode of action for infection treatment.

  15. Microbial Characterization of Denitrifying Sulfide Removal Sludge Using High-Throughput Amplicon Sequencing Method

    Institute of Scientific and Technical Information of China (English)

    Ma Wenjuan; Liu Chunshuang; Zhao Dongfeng; Guo Yadong; Wang Aijie; Jia Kuili

    2015-01-01

    The denitrifying sulfide removal (DSR) process has recently been studied extensively from an engineering perspective. However, the importance of microbial communities of this process was generally underestimated. In this study, the microbial community structure of a lab-scale DSR reactor was characterized in order to provide a comprehensive insight into the key microbial groups in DSR system. Results from high-throughput sequencing analysis revealed that the fraction of autotrophic denitrifiers increased from 2.34% to 10.93% and 44.51% in the DSR system when the influent NaCl increased from 0 g/L to 4 g/L and 30 g/L, respectively. On the contrary, the fraction of heterotrophic denitrifiers decreased from 61.74% to 39.57% and 24.12%, respectively. Azoarcus and Thiobacillus were the main autotrophic denitrifiers, and Thauera was the main heterotrophic denitrifier during the whole process. This study could be useful for better understanding the interaction between autotrophs and heterotrophs in DSR system.

  16. Rapid Detection and Identification of Infectious Pathogens Based on High-throughput Sequencing

    Institute of Scientific and Technical Information of China (English)

    Pei-Xiang Ni; Xin Ding; Yin-Xin Zhang; Xue Yao; Rui-Xue Sun; Peng Wang; Yan-Ping Gong

    2015-01-01

    Background: The dilemma of pathogens identification in patients with unidentified clinical symptoms such as fever of unknown origin exists, which not only poses a challenge to both the diagnostic and therapeutic process by itself, but also to expert physicians. Methods: In this report, we have attempted to increase the awareness of unidentified pathogens by developing a method to investigate hitherto unidentified infectious pathogens based on unbiased high-throughput sequencing. Results: Our observations show that this method supplements current diagnostic technology that predominantly relies on information derived five cases from the intensive care unit. This methodological approach detects viruses and corrects the incidence of false positive detection rates of pathogens in a much shorter period. Through our method is followed by polymerase chain reaction validation, we could identify infection with Epstein-Barr virus, and in another case, we could identify infection with Streptococcus viridians based on the culture, which was false positive. Conclusions: This technology is a promising approach to revolutionize rapid diagnosis of infectious pathogens and to guide therapy that might result in the improvement of personalized medicine.

  17. Evaluation of the microbial diversity in amyotrophic lateral sclerosis using high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Xin Fang

    2016-09-01

    Increasing evidence indicates that diseases of the central nervous system (CNS) are seriously affected by faecal microbes. However, little work has been done to explore the interaction between amyotrophic lateral sclerosis (ALS) and faecal microbes. In the present study, a high-throughput sequencing method was used to compare the intestinal microbial diversity of healthy people and ALS patients. The principal coordinate analysis (PCoA), Venn and unweighted pair-group method using arithmetic averages (UPGMA) showed obvious microbial changes between healthy people (group H) and ALS patients (group A), and the average ratios of Bacteroides, Faecalibacterium, Anaerostipes, Prevotella, Escherichia and Lachnospira at genus level between ALS patients and healthy people were 0.78, 2.18, 3.41, 0.35, 0.79 and 13.07. Furthermore, the decreased Firmicutes/Bacteroidetes ratio at phylum level using LEfSE (LDA >4.0), together with the significantly increased genus Dorea (harmful microorganisms) and significantly reduced genus Oscillibacter, Anaerostipes, Lachnospiraceae (beneficial microorganisms) in ALS patients, indicated that the imbalance in intestinal microflora constitution had a strong association with the pathogenesis of ALS.

  18. High-throughput sequencing, characterization and detection of new and conserved cucumber miRNAs.

    Directory of Open Access Journals (Sweden)

    Germán Martínez

    Micro RNAs (miRNAs) are a class of endogenous small non-coding RNAs involved in the post-transcriptional regulation of gene expression. In plants, a great number of conserved and specific miRNAs, mainly arising from model species, have been identified to date. However, less is known about the diversity of these regulatory RNAs in vegetal species with agricultural and/or horticultural importance. Here we report a combined approach of bioinformatics prediction, high-throughput sequencing data and molecular methods to analyze miRNA populations in cucumber (Cucumis sativus) plants. A set of 19 conserved and 6 known but non-conserved miRNA families were found in our cucumber small RNA dataset. We also identified 7 (3 with their miRNA* strand) not previously described miRNAs, candidates to be cucumber-specific. To validate their description, these new C. sativus miRNAs were detected by northern blot hybridization. Additionally, potential targets for most conserved and new miRNAs were identified in the cucumber genome. In summary, in this study we have identified, for the first time, conserved, known non-conserved and new miRNAs arising from an agronomically important species such as C. sativus. The detection of this complex population of regulatory small RNAs suggests that, similarly to that observed in other plant species, cucumber miRNAs may play an important role in diverse biological and metabolic processes.

  19. High diagnostic yield of clinical exome sequencing in Middle Eastern patients with Mendelian disorders.

    Science.gov (United States)

    Yavarna, Tarunashree; Al-Dewik, Nader; Al-Mureikhi, Mariam; Ali, Rehab; Al-Mesaifri, Fatma; Mahmoud, Laila; Shahbeck, Noora; Lakhani, Shenela; AlMulla, Mariam; Nawaz, Zafar; Vitazka, Patrik; Alkuraya, Fowzan S; Ben-Omran, Tawfeg

    2015-09-01

    Clinical exome sequencing (CES) has become an increasingly popular diagnostic tool in patients with heterogeneous genetic disorders, especially in those with neurocognitive phenotypes. Utility of CES in consanguineous populations has not yet been determined on a large scale. A clinical cohort of 149 probands from Qatar with suspected Mendelian, mainly neurocognitive phenotypes, underwent CES from July 2012 to June 2014. Intellectual disability and global developmental delay were the most common clinical presentations but our cohort displayed other phenotypes, such as epilepsy, dysmorphism, microcephaly and other structural brain anomalies and autism. A pathogenic or likely pathogenic mutation, including pathogenic CNVs, was identified in 89 probands for a diagnostic yield of 60%. Consanguinity and positive family history predicted a higher diagnostic yield. In 5% (7/149) of cases, CES implicated novel candidate disease genes (MANF, GJA9, GLG1, COL15A1, SLC35F5, MAGE4, NEUROG1). CES uncovered two coexisting genetic disorders in 4% (6/149) and actionable incidental findings in 2% (3/149) of cases. Average time to diagnosis was reduced from 27 to 5 months. CES, which already has the highest diagnostic yield among all available diagnostic tools in the setting of Mendelian disorders, appears to be particularly helpful diagnostically in the highly consanguineous Middle Eastern population.

  20. A method for high-throughput production of sequence-verified DNA libraries and strain collections.

    Science.gov (United States)

    Smith, Justin D; Schlecht, Ulrich; Xu, Weihong; Suresh, Sundari; Horecka, Joe; Proctor, Michael J; Aiyar, Raeka S; Bennett, Richard A O; Chu, Angela; Li, Yong Fuga; Roy, Kevin; Davis, Ronald W; Steinmetz, Lars M; Hyman, Richard W; Levy, Sasha F; St Onge, Robert P

    2017-02-13

    The low costs of array-synthesized oligonucleotide libraries are empowering rapid advances in quantitative and synthetic biology. However, high synthesis error rates, uneven representation, and lack of access to individual oligonucleotides limit the true potential of these libraries. We have developed a cost-effective method called Recombinase Directed Indexing (REDI), which involves integration of a complex library into yeast, site-specific recombination to index library DNA, and next-generation sequencing to identify desired clones. We used REDI to generate a library of ~3,300 DNA probes that exhibited > 96% purity and remarkable uniformity (> 95% of probes within twofold of the median abundance). Additionally, we created a collection of ~9,000 individually accessible CRISPR interference yeast strains for > 99% of genes required for either fermentative or respiratory growth, demonstrating the utility of REDI for rapid and cost-effective creation of strain collections from oligonucleotide pools. Our approach is adaptable to any complex DNA library, and fundamentally changes how these libraries can be parsed, maintained, propagated, and characterized.
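
    The uniformity figure quoted above (> 95% of probes within twofold of the median abundance) can be illustrated with a minimal sketch. The function name and the sample read counts are hypothetical, and reading "within twofold" as the closed interval [median/2, 2 × median] is an assumption; the abstract does not state the exact definition used.

```python
from statistics import median


def fraction_within_twofold(abundances):
    """Return the fraction of values lying within twofold of the median.

    Assumption: "within twofold" means median/2 <= value <= 2*median.
    """
    med = median(abundances)
    lower, upper = med / 2, med * 2
    within = [a for a in abundances if lower <= a <= upper]
    return len(within) / len(abundances)


# Hypothetical per-probe read counts, for illustration only.
example_counts = [10, 12, 9, 11, 50]
uniformity = fraction_within_twofold(example_counts)
```

    With real data, the input would be one abundance value per library member, for example read counts per probe after sequencing.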

  1. Evaluation of the Microbial Diversity in Amyotrophic Lateral Sclerosis Using High-Throughput Sequencing

    Science.gov (United States)

    Fang, Xin; Wang, Xin; Yang, Shaoguo; Meng, Fanjing; Wang, Xiaolei; Wei, Hua; Chen, Tingtao

    2016-01-01

    More and more evidences indicate that diseases of the central nervous system have been seriously affected by fecal microbes. However, little work is done to explore interaction between amyotrophic lateral sclerosis (ALS) and fecal microbes. In the present study, high-throughput sequencing method was used to compare the intestinal microbial diversity of healthy people and ALS patients. The principal coordinate analysis, Venn and unweighted pair-group method using arithmetic averages (UPGMA) showed an obvious microbial changes between healthy people (group H) and ALS patients (group A), and the average ratios of Bacteroides, Faecalibacterium, Anaerostipes, Prevotella, Escherichia, and Lachnospira at genus level between ALS patients and healthy people were 0.78, 2.18, 3.41, 0.35, 0.79, and 13.07. Furthermore, the decreased Firmicutes/Bacteroidetes ratio at phylum level using LEfSE (LDA > 4.0), together with the significant increased genus Dorea (harmful microorganisms) and significant reduced genus Oscillibacter, Anaerostipes, Lachnospiraceae (beneficial microorganisms) in ALS patients, indicated that the imbalance in intestinal microflora constitution had a strong association with the pathogenesis of ALS. PMID:27703453

  2. High-throughput sequencing of plasma microRNA in chronic fatigue syndrome/myalgic encephalomyelitis.

    Directory of Open Access Journals (Sweden)

    Ekua W Brenu

    BACKGROUND: MicroRNAs (miRNAs) are known to regulate many biological processes and their dysregulation has been associated with a variety of diseases including Chronic Fatigue Syndrome/Myalgic Encephalomyelitis (CFS/ME). The recent discovery of stable and reproducible miRNA in plasma has raised the possibility that circulating miRNAs may serve as novel diagnostic markers. The objective of this study was to determine the role of plasma miRNA in CFS/ME. RESULTS: Using Illumina high-throughput sequencing we identified 19 miRNAs that were differentially expressed in the plasma of CFS/ME patients in comparison to non-fatigued controls. Following RT-qPCR analysis, we were able to confirm the significant up-regulation of three miRNAs (hsa-miR-127-3p, hsa-miR-142-5p and hsa-miR-143-3p) in the CFS/ME patients. CONCLUSION: Our study is the first to identify circulating miRNAs from CFS/ME patients and also to confirm three differentially expressed circulating miRNAs in CFS/ME patients, providing a basis for further study to find useful CFS/ME biomarkers.

  3. Rapid Detection and Identification of Infectious Pathogens Based on High-throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Pei-Xiang Ni

    2015-01-01

    Background: The dilemma of pathogens identification in patients with unidentified clinical symptoms such as fever of unknown origin exists, which not only poses a challenge to both the diagnostic and therapeutic process by itself, but also to expert physicians. Methods: In this report, we have attempted to increase the awareness of unidentified pathogens by developing a method to investigate hitherto unidentified infectious pathogens based on unbiased high-throughput sequencing. Results: Our observations show that this method supplements current diagnostic technology that predominantly relies on information derived five cases from the intensive care unit. This methodological approach detects viruses and corrects the incidence of false positive detection rates of pathogens in a much shorter period. Through our method, followed by polymerase chain reaction validation, we could identify infection with Epstein-Barr virus, and in another case, we could identify infection with Streptococcus viridians based on the culture, which was false positive. Conclusions: This technology is a promising approach to revolutionize rapid diagnosis of infectious pathogens and to guide therapy that might result in the improvement of personalized medicine.

  4. Unbiased K-mer Analysis Reveals Changes in Copy Number of Highly Repetitive Sequences During Maize Domestication and Improvement

    Science.gov (United States)

    Liu, Sanzhen; Zheng, Jun; Migeon, Pierre; Ren, Jie; Hu, Ying; He, Cheng; Liu, Hongjun; Fu, Junjie; White, Frank F.; Toomajian, Christopher; Wang, Guoying

    2017-01-01

    The major component of complex genomes is repetitive elements, which remain recalcitrant to characterization. Using maize as a model system, we analyzed whole genome shotgun (WGS) sequences for the two maize inbred lines B73 and Mo17 using k-mer analysis to quantify the differences between the two genomes. Significant differences were identified in highly repetitive sequences, including centromere, 45S ribosomal DNA (rDNA), knob, and telomere repeats. Genotype specific 45S rDNA sequences were discovered. The B73 and Mo17 polymorphic k-mers were used to examine allele-specific expression of 45S rDNA in the hybrids. Although Mo17 contains higher copy number than B73, equivalent levels of overall 45S rDNA expression indicates that transcriptional or post-transcriptional regulation mechanisms operate for the 45S rDNA in the hybrids. Using WGS sequences of B73xMo17 doubled haploids, genomic locations showing differential repetitive contents were genetically mapped, which displayed different organization of highly repetitive sequences in the two genomes. In an analysis of WGS sequences of HapMap2 lines, including maize wild progenitor, landraces, and improved lines, decreases and increases in abundance of additional sets of k-mers associated with centromere, 45S rDNA, knob, and retrotransposons were found among groups, revealing global evolutionary trends of genomic repeats during maize domestication and improvement. PMID:28186206
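
    The k-mer comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the choice of k, the twofold threshold and the pseudocount are all assumptions, and reverse complements are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in a collection of reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def normalized_abundance(counts):
    """Convert raw k-mer counts to fractions of all counted k-mers."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def differential_kmers(counts_a, counts_b, min_fold=2.0, pseudo=1e-9):
    """Return k-mers whose normalized abundance differs by >= min_fold.

    The value stored for each k-mer is abundance_a / abundance_b.
    The pseudocount (an assumption) avoids division by zero for
    k-mers that are absent from one sample.
    """
    freq_a = normalized_abundance(counts_a)
    freq_b = normalized_abundance(counts_b)
    result = {}
    for kmer in set(freq_a) | set(freq_b):
        fold = (freq_a.get(kmer, 0.0) + pseudo) / (freq_b.get(kmer, 0.0) + pseudo)
        if fold >= min_fold or fold <= 1 / min_fold:
            result[kmer] = fold
    return result
```

    Normalizing by the total k-mer count is one simple way to make two samples sequenced to different depths comparable; other normalizations are possible.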

  5. Differential effects of high-temperature stress on nuclear topology and transcription of repetitive noncoding and coding rye sequences.

    Science.gov (United States)

    Tomás, D; Brazão, J; Viegas, W; Silva, M

    2013-01-01

    The plant stress response has been extensively characterized at the biochemical and physiological levels. However, knowledge concerning repetitive sequence genome fraction modulation during extreme temperature conditions is scarce. We studied high-temperature effects on subtelomeric repetitive sequences (pSc200) and 45S rDNA in rye seedlings submitted to 40°C during 4 h. Chromatin organization patterns were evaluated through fluorescent in situ hybridization and transcription levels were assessed using quantitative real-time PCR. Additionally, the nucleolar dynamics were evaluated through fibrillarin immunodetection in interphase nuclei. The results obtained clearly demonstrated that the pSc200 sequence organization is not affected by high-temperature stress (HTS) and proved for the first time that this noncoding subtelomeric sequence is stably transcribed. Conversely, it was demonstrated that HTS treatment induces marked rDNA chromatin decondensation along with nucleolar enlargement and a significant increase in ribosomal gene transcription. The role of noncoding and coding repetitive rye sequences in the plant stress response that are suggested by their clearly distinct behaviors is discussed. While the heterochromatic conformation of pSc200 sequences seems to be involved in the stabilization of the interphase chromatin architecture under stress conditions, the dynamic modulation of nucleolar and rDNA topology and transcription suggest their role in plant stress response pathways.

  6. Test Anxiety Reduction and Confidence Training: A Replication

    Science.gov (United States)

    Bowman, Noah; Driscoll, Richard

    2013-01-01

    This study was undertaken to replicate prior research in which a brief counter-conditioning and confidence training program was found to reduce anxiety and raise test scores. First-semester college students were screened with the Westside Test Anxiety Scale, and the 25 identified as having high or moderately-high anxiety were randomly divided…

  7. Efficient generation of cavitation bubbles and reactive oxygen species using triggered high-intensity focused ultrasound sequence for sonodynamic treatment

    Science.gov (United States)

    Yasuda, Jun; Yoshizawa, Shin; Umemura, Shin-ichiro

    2016-07-01

    Sonodynamic treatment is a method of treating cancer using reactive oxygen species (ROS) generated by cavitation bubbles in collaboration with a sonosensitizer at a target tissue. In this treatment method, both localized ROS generation and ROS generation with high efficiency are important. In this study, a triggered high-intensity focused ultrasound (HIFU) sequence, which consists of a short, extremely high intensity pulse immediately followed by a long, moderate-intensity burst, was employed for the efficient generation of ROS. In experiments, a solution sealed in a chamber was exposed to a triggered HIFU sequence. Then, the distribution of generated ROS was observed by the luminol reaction, and the amount of generated ROS was quantified using KI method. As a result, the localized ROS generation was demonstrated by light emission from the luminol reaction. Moreover, it was demonstrated that the triggered HIFU sequence has higher efficiency of ROS generation by both the KI method and the luminol reaction emission.

  8. Phylogenetic and Functional Analysis of Metagenome Sequence from High-Temperature Archaeal Habitats Demonstrate Linkages between Metabolic Potential and Geochemistry

    DEFF Research Database (Denmark)

    Inskeep, William P; Jay, Zackary J; Herrgard, Markus

    2013-01-01

    from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport, and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.......4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (∼40-45 Mb Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and....../or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G + C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed...

  9. Exploring the sources of bacterial spoilers in beefsteaks by culture-independent high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Francesca De Filippis

    Microbial growth on meat to unacceptable levels contributes significantly to change meat structure, color and flavor and to cause meat spoilage. The types of microorganisms initially present in meat depend on several factors and multiple sources of contamination can be identified. The aims of this study were to evaluate the microbial diversity in beefsteaks before and after aerobic storage at 4°C and to investigate the sources of microbial contamination by examining the microbiota of carcasses wherefrom the steaks originated and of the processing environment where the beef was handled. Carcass, environmental (processing plant) and meat samples were analyzed by culture-independent high-throughput sequencing of 16S rRNA gene amplicons. The microbiota of carcass swabs was very complex, including more than 600 operational taxonomic units (OTUs) belonging to 15 different phyla. A significant association was found between beef microbiota and specific beef cuts (P<0.01) indicating that different cuts of the same carcass can influence the microbial contamination of beef. Despite the initially high complexity of the carcass microbiota, the steaks after aerobic storage at 4°C showed a dramatic decrease in microbial complexity. Pseudomonas sp. and Brochothrix thermosphacta were the main contaminants, and Acinetobacter, Psychrobacter and Enterobacteriaceae were also found. Comparing the relative abundance of OTUs in the different samples it was shown that abundant OTUs in beefsteaks after storage occurred in the corresponding carcass. However, the abundance of these same OTUs clearly increased in environmental samples taken in the processing plant suggesting that spoilage-associated microbial species originate from carcasses, they are carried to the processing environment where the meat is handled and there they become a resident microbiota. Such microbiota is then further spread on meat when it is handled and it represents the starting microbial association

  10. Prediction of new high pressure structural sequence in thorium carbide: A first principles study

    Energy Technology Data Exchange (ETDEWEB)

    Sahoo, B. D., E-mail: bdsahoo@barc.gov.in; Joshi, K. D.; Gupta, Satish C. [Applied Physics Division, Bhabha Atomic Research Centre, Mumbai 400085 (India)

    2015-05-14

    In the present work, we report the detailed electronic band structure calculations on thorium monocarbide. The comparison of enthalpies, derived for various phases using evolutionary structure search method in conjunction with first principles total energy calculations at several hydrostatic compressions, yielded a high pressure structural sequence of NaCl type (B1) → Pnma → Cmcm → CsCl type (B2) at hydrostatic pressures of ∼19 GPa, 36 GPa, and 200 GPa, respectively. However, the two high pressure experimental studies by Gerward et al. [J. Appl. Crystallogr. 19, 308 (1986); J. Less-Common Met. 161, L11 (1990)] one up to 36 GPa and other up to 50 GPa, on substoichiometric thorium carbide samples with carbon deficiency of ∼20%, do not report any structural transition. The discrepancy between theory and experiment could be due to the non-stoichiometry of thorium carbide samples used in the experiment. Further, in order to substantiate the results of our static lattice calculations, we have determined the phonon dispersion relations for these structures from lattice dynamic calculations. The theoretically calculated phonon spectrum reveals that the B1 phase fails dynamically at ∼33.8 GPa whereas the Pnma phase appears as dynamically stable structure around the B1 to Pnma transition pressure. Similarly, the Cmcm structure also displays dynamic stability in the regime of its structural stability. The B2 phase becomes dynamically stable much below the Cmcm to B2 transition pressure. Additionally, we have derived various thermophysical properties such as zero pressure equilibrium volume, bulk modulus, its pressure derivative, Debye temperature, thermal expansion coefficient and Gruneisen parameter at 300 K and compared these with available experimental data. Further, the behavior of zero pressure bulk modulus, heat capacity and Helmholtz free energy has been examined as a function of temperature and compared with the experimental data of Danan [J

  11. Prediction of new high pressure structural sequence in thorium carbide: A first principles study

    Science.gov (United States)

    Sahoo, B. D.; Joshi, K. D.; Gupta, Satish C.

    2015-05-01

    In the present work, we report the detailed electronic band structure calculations on thorium monocarbide. The comparison of enthalpies, derived for various phases using evolutionary structure search method in conjunction with first principles total energy calculations at several hydrostatic compressions, yielded a high pressure structural sequence of NaCl type (B1) → Pnma → Cmcm → CsCl type (B2) at hydrostatic pressures of ∼19 GPa, 36 GPa, and 200 GPa, respectively. However, the two high pressure experimental studies by Gerward et al. [J. Appl. Crystallogr. 19, 308 (1986); J. Less-Common Met. 161, L11 (1990)] one up to 36 GPa and other up to 50 GPa, on substoichiometric thorium carbide samples with carbon deficiency of ∼20%, do not report any structural transition. The discrepancy between theory and experiment could be due to the non-stoichiometry of thorium carbide samples used in the experiment. Further, in order to substantiate the results of our static lattice calculations, we have determined the phonon dispersion relations for these structures from lattice dynamic calculations. The theoretically calculated phonon spectrum reveals that the B1 phase fails dynamically at ∼33.8 GPa whereas the Pnma phase appears as dynamically stable structure around the B1 to Pnma transition pressure. Similarly, the Cmcm structure also displays dynamic stability in the regime of its structural stability. The B2 phase becomes dynamically stable much below the Cmcm to B2 transition pressure. Additionally, we have derived various thermophysical properties such as zero pressure equilibrium volume, bulk modulus, its pressure derivative, Debye temperature, thermal expansion coefficient and Gruneisen parameter at 300 K and compared these with available experimental data. Further, the behavior of zero pressure bulk modulus, heat capacity and Helmholtz free energy has been examined as a function of temperature and compared with the experimental data of Danan [J. Nucl. Mater. 57, 280

  12. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA.

    Science.gov (United States)

    Marine, Rachel; Polson, Shawn W; Ravel, Jacques; Hatfull, Graham; Russell, Daniel; Sullivan, Matthew; Syed, Fraz; Dumas, Michael; Wommack, K Eric

    2011-11-01

    Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.

  13. Molecular cytogenetic mapping of Cucumis sativus and C. melo using highly repetitive DNA sequences.

    Science.gov (United States)

    Koo, Dal-Hoe; Nam, Young-Woo; Choi, Doil; Bang, Jae-Wook; de Jong, Hans; Hur, Yoonkang

    2010-04-01

    Chromosomes often serve as one of the most important molecular aspects of studying the evolution of species. Indeed, most of the crucial mutations that led to differentiation of species during the evolution have occurred at the chromosomal level. Furthermore, the analysis of pachytene chromosomes appears to be an invaluable tool for the study of evolution due to its effectiveness in chromosome identification and precise physical gene mapping. By applying fluorescence in situ hybridization of 45S rDNA and CsCent1 probes to cucumber pachytene chromosomes, here, we demonstrate that cucumber chromosomes 1 and 2 may have evolved from fusions of ancestral karyotype with chromosome number n = 12. This conclusion is further supported by the centromeric sequence similarity between cucumber and melon, which suggests that these sequences evolved from a common ancestor. It may be after or during speciation that these sequences were specifically amplified, after which they diverged and specific sequence variants were homogenized. Additionally, a structural change on the centromeric region of cucumber chromosome 4 was revealed by fiber-FISH using the mitochondrial-related repetitive sequences, BAC-E38 and CsCent1. These showed the former sequences being integrated into the latter in multiple regions. The data presented here are useful resources for comparative genomics and cytogenetics of Cucumis and, in particular, the ongoing genome sequencing project of cucumber.

  14. Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

    Directory of Open Access Journals (Sweden)

    Parker Andrew G

    2008-04-01

    Background: The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to search quickly a set of reads for near exact text matches. Methods: A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads. Results: Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension. Conclusion: The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.
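
    The idea of searching a set of reads for near-exact text matches can be sketched in a few lines. This is not the toolset described in the record: the function names are hypothetical, and treating "near exact" as a Hamming distance of at most `max_mismatches` (substitutions only, no insertions or deletions) is an assumption.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(reads, query, max_mismatches=1):
    """Scan every read for windows that match `query` with few mismatches.

    Returns a list of (read_index, position, mismatches) tuples.
    """
    hits = []
    q = len(query)
    for idx, read in enumerate(reads):
        for pos in range(len(read) - q + 1):
            d = hamming(read[pos:pos + q], query)
            if d <= max_mismatches:
                hits.append((idx, pos, d))
    return hits


# Hypothetical reads, for illustration only.
example_hits = find_near_matches(["ACGTTGCA", "TTACGATT"], "ACGT")
```

    A brute-force scan like this is quadratic in practice and only suitable for small examples; homopolymer-length errors typical of pyrophosphate sequencing would need an alignment that allows insertions and deletions.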

  15. High diversity of picornaviruses in rats from different continents revealed by deep sequencing

    DEFF Research Database (Denmark)

    Arn Hansen, Thomas; Mollerup, Sarah; Nguyen, Nam-Phuong

    2016-01-01

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus......) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus...

  16. Confidence rating of marine eutrophication assessments

    DEFF Research Database (Denmark)

    Murray, Ciarán; Andersen, Jesper Harbo; Kaartokallio, Hermanni

    2011-01-01

    This report presents the development of a methodology for assessing confidence in eutrophication status classifications. The method can be considered as a secondary assessment, supporting the primary assessment of eutrophication status. The confidence assessment is based on a transparent scoring...... of the 'value' of the indicators on which the primary assessment is made. Such secondary assessment of confidence represents a first step towards linking status classification with information regarding their accuracy and precision and ultimately a tool for improving or targeting actions to improve the health...

  17. Building Scientific Confidence in Read-Across: Progress in ...

    Science.gov (United States)

    Presentation at the 41st Annual Winter Meeting of The Toxicology Forum - From Assay to Assessment: Incorporating High Throughput Strategies into Health and Safety Evaluations on Building Scientific Confidence in Read-Across: Progress in using HT Data to inform Read-Across Performance

  18. A Protocol for Functional Assessment of Whole-Protein Saturation Mutagenesis Libraries Utilizing High-Throughput Sequencing.

    Science.gov (United States)

    Stiffler, Michael A; Subramanian, Subu K; Salinas, Victor H; Ranganathan, Rama

    2016-07-03

    Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively-parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest utilizing high-throughput sequencing technology, using the 263 amino acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria, and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods which maximize sequencing read depth and permit the simultaneous selection of the entire mutation library, by mixing adjacent positions into groups of length accommodated by high-throughput sequencing read length and utilizing orthogonal primers to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.
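
    Determining a fitness effect from sequencing counts before and after selection can be illustrated with a minimal sketch. The abstract does not give the formula; the log10 enrichment relative to wild type used here is a common convention and is an assumption, as are the function name and the pseudocount.

```python
import math


def relative_fitness(mut_before, mut_after, wt_before, wt_after, pseudocount=0.5):
    """Log10 enrichment of a mutant relative to wild type.

    Each argument is a read count from deep sequencing of the library
    before or after selection. The pseudocount (an assumption) keeps the
    ratio finite when a variant has zero reads.
    """
    mut_ratio = (mut_after + pseudocount) / (mut_before + pseudocount)
    wt_ratio = (wt_after + pseudocount) / (wt_before + pseudocount)
    return math.log10(mut_ratio / wt_ratio)


# Hypothetical counts: the mutant drops tenfold while wild type is unchanged.
example_score = relative_fitness(100, 10, 1000, 1000, pseudocount=0)
```

    Under this convention a score near 0 indicates a neutral mutation and a negative score indicates depletion under selection.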

  19. Phylogenetic and functional analysis of metagenome sequence from high-temperature archaeal habitats demonstrate linkages between metabolic potential and geochemistry

    Directory of Open Access Journals (Sweden)

    William P. Inskeep

    2013-05-01

    Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (~40-45 Mbase Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G+C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP.

  20. Phylogenetic and Functional Analysis of Metagenome Sequence from High-Temperature Archaeal Habitats Demonstrate Linkages between Metabolic Potential and Geochemistry.

    Science.gov (United States)

    Inskeep, William P; Jay, Zackary J; Herrgard, Markus J; Kozubal, Mark A; Rusch, Douglas B; Tringe, Susannah G; Macur, Richard E; Jennings, Ryan deM; Boyd, Eric S; Spear, John R; Roberto, Francisco F

    2013-01-01

    Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze, and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (∼40-45 Mb Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G + C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport, and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP.