WorldWideScience

Sample records for sanger-derived sequencing reads

  1. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  2. Transcriptome sequencing, and rapid development and application of SNP markers for the legume pod borer Maruca vitrata (Lepidoptera: Crambidae)

    Science.gov (United States)

    The legume pod borer, Maruca vitrata (Lepidoptera: Crambidae), is an insect pest species that is destructive to crops grown by subsistence farmers in tropical regions of West Africa. We present the de novo assembly of 3729 contigs from 454- and Sanger-derived sequencing reads for midgut, salivary, ...

  3. Unlocking short read sequencing for metagenomics.

    Directory of Open Access Journals (Sweden)

    Sébastien Rodrigue

    Full Text Available BACKGROUND: Different high-throughput nucleic acid sequencing platforms are currently available but a trade-off currently exists between the cost and number of reads that can be generated versus the read length that can be achieved. METHODOLOGY/PRINCIPAL FINDINGS: We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read. CONCLUSIONS/SIGNIFICANCE: This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing.
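
    The idea of joining overlapping mates into a longer composite read can be illustrated with a minimal sketch. This is not the SHERA algorithm: it assumes error-free reads and an exact overlap, and the function names and the `min_overlap` parameter are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, min_overlap: int = 10):
    """Join a forward read and its mate into one composite read.

    The mate is reverse-complemented so both reads are on the same strand,
    then the longest exact suffix/prefix overlap of at least `min_overlap`
    bases is used to stitch them. Returns None if no such overlap exists.
    """
    mate = reverse_complement(rev)
    for k in range(min(len(fwd), len(mate)), min_overlap - 1, -1):
        if fwd[-k:] == mate[:k]:
            return fwd + mate[k:]
    return None
```

    A real tool would score mismatches using base qualities instead of requiring an exact match; the exact-match rule here only keeps the sketch short.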

  4. Flexible taxonomic assignment of ambiguous sequencing reads

    Directory of Open Access Journals (Sweden)

    Jansson Jesper

    2011-01-01

    Full Text Available Abstract Background: To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match them. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results: We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions: The assignment strategy of sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, which in turn depends on the value of the q parameter. Validation of our results in an artificial dataset confirms that a combination of values of q produces the most accurate results.
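
    The baseline that this abstract describes as typical, assigning an ambiguous read to the lowest common ancestor (LCA) of its matching species, can be sketched as follows. This is not the authors' penalty-score method, and the walk-to-root LCA here is a simple illustration, not a constant-time query; the toy taxonomy and function names are assumptions.

```python
def path_to_root(node, parent):
    """List the node and all its ancestors, from the node up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes, parent):
    """Return the deepest taxon that is an ancestor of (or equal to) every node."""
    nodes = list(nodes)
    common = path_to_root(nodes[0], parent)
    for node in nodes[1:]:
        ancestors = set(path_to_root(node, parent))
        # Keep leaf-to-root order so the first survivor is the deepest one.
        common = [taxon for taxon in common if taxon in ancestors]
    return common[0]


# Hypothetical toy taxonomy, stored as child -> parent.
TAXONOMY = {
    "E. coli": "Escherichia",
    "E. fergusonii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
}
```

    A read matching both Escherichia species is placed at the genus, while a read that also matches Salmonella enterica is pushed up to the family, which implicitly covers every species below it. That loss of precision is the false-positive effect the abstract describes.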

  5. Inference of Isoforms from Short Sequence Reads

    Science.gov (United States)

    Feng, Jianxing; Li, Wei; Jiang, Tao

    Due to alternative splicing events in eukaryotic species, the identification of mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time consuming and cost ineffective. The emerging RNA-Seq technology provides a possible effective method to address this problem. Although the advantages of RNA-Seq over traditional methods in transcriptome analysis have been confirmed by many studies, the inference of isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) has remained computationally challenging. In this work, we propose a method to calculate the expression levels of isoforms and infer isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS) and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately if all the isoforms are known and infer novel isoforms from scratch. Our experimental tests on known mouse isoforms with both simulated expression levels and reads demonstrate that IsoInfer is able to calculate the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method and a 60 times faster speed. Moreover, our tests on both simulated and real reads show that it achieves a good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS and PAS information, especially for isoforms whose expression levels are significantly high.

  6. Reading biological processes from nucleotide sequences

    Science.gov (United States)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
    Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical...

  7. Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System

    Institute of Scientific and Technical Information of China (English)

    Hui Dong; Yangyi Chen; Yan Shen; Shengyue Wang; Guoping Zhao; Weirong Jin

    2011-01-01

    The 454 Genome Sequencer (GS) FLX System is one of the next-generation sequencing systems, characterized by long reads, high accuracy, and ultra-high throughput. Based on the mechanism of emulsion PCR, a unique DNA template would generate only one unique sequence read after being amplified and sequenced on GS FLX. However, biased amplification of DNA templates might occur in the process of emulsion PCR, which results in the production of artificial duplicate reads. Under the condition that all DNA templates are unique, 3.49%-18.14% of total reads in GS FLX sequencing data were found to be artificial duplicate reads. These duplicate reads may lead to misinterpretation of sequencing data, and special attention should be paid to the potential biases they introduce.

  8. Atropos: specific, sensitive, and speedy trimming of sequencing reads

    Directory of Open Access Journals (Sweden)

    John P. Didion

    2017-08-01

    Full Text Available A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.
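
    What 3' adapter trimming does can be shown with a minimal sketch. It is not how Atropos works internally: it uses exact string matching only, with no error tolerance or quality trimming, and the function name, the `min_overlap` parameter, and the example adapter are assumptions.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching.

    First look for the complete adapter anywhere in the read; if it is
    absent, look for a partial adapter (a prefix of the adapter) at the
    very end of the read, requiring at least `min_overlap` bases.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter sequence, for illustration
    print(trim_adapter("ACGTACGT" + adapter, adapter))
    print(trim_adapter("ACGTACGTAGATC", adapter))
```

    The `min_overlap` threshold trades sensitivity against specificity: a low value removes more true adapter remnants but also trims genuine bases that match the adapter prefix by chance.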

  9. Comparison of sequence reads obtained from three next-generation sequencing platforms.

    Directory of Open Access Journals (Sweden)

    Shingo Suzuki

    Full Text Available Next-generation sequencing technologies enable the rapid cost-effective production of sequence data. To evaluate the performance of these sequencing technologies, investigation of the quality of sequence reads obtained from these methods is important. In this study, we analyzed the quality of sequence reads and SNP detection performance using three commercially available next-generation sequencers, i.e., Roche Genome Sequencer FLX System (FLX), Illumina Genome Analyzer (GA), and Applied Biosystems SOLiD system (SOLiD). A common genomic DNA sample obtained from Escherichia coli strain DH1 was applied to these sequencers. The obtained sequence reads were aligned to the complete genome sequence of E. coli DH1 to evaluate the accuracy and sequence bias of these sequencing methods. We found that the fraction of "junk" data, which could not be aligned to the reference genome, was largest in the SOLiD data set, in which about half of the reads could not be aligned. Among data sets after alignment to the reference, sequence accuracy was poorest in GA data sets, suggesting relatively low fidelity of the elongation reaction in the GA method. Furthermore, by aligning the sequence reads to the E. coli strain W3110, we screened sequence differences between two E. coli strains using data sets of three different next-generation platforms. The results revealed that the detected sequence differences were similar among these three methods, while the sequence coverage required for the detection was significantly smaller in the FLX data set. These results provided valuable information on the quality of short sequence reads and the performance of SNP detection in three next-generation sequencing platforms.

  10. Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing.

    Directory of Open Access Journals (Sweden)

    James A Stapleton

    Full Text Available Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise.

  11. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome level, effectively producing a sequential genome annotation as output. The model can be used to obtain the most probable genome annotation based on a combination of (i) a gene finder score of each gene candidate and (ii) the sequence of the reading frames of gene candidates through a genome. The model, as well as a higher-order variant, is developed and tested... and are evaluated by the effect on prediction performance. Since bacterial gene finding is to a large extent a solved problem, it forms an ideal proving ground for evaluating the explicit modeling of larger-scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...

  12. Reading sequence-directed computational nucleosome maps.

    Science.gov (United States)

    Nibhani, Reshma; Trifonov, Edward N

    2015-01-01

    The recently developed latest version of sequence-directed single-base resolution nucleosome mapping reveals the existence of strong nucleosomes and chromatin columnar structures (columns). Broad application of this simple technique for further studies of chromatin and chromosome structure requires some basic understanding of how it works and what information it affords. The paper provides such an introduction to the method. The oscillating maps of singular nucleosomes and of short and long oligonucleosome columns are explained, as well as maps of chromatin on satellite DNA and occurrences of counter-phase (antiparallel) nucleosome neighbors.

  13. A Teaching-Learning Sequence about Weather Map Reading

    Science.gov (United States)

    Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

    2017-01-01

    In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a…

  14. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads.

    Directory of Open Access Journals (Sweden)

    Christopher A Miller

    Full Text Available Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/.

  15. Oculus: faster sequence alignment by streaming read compression

    Science.gov (United States)

    2012-01-01

    Background Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves. Results Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (> 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases. Conclusions Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online, at http://code.google.com/p/oculus-bio. PMID:23148484

  16. REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.

    Directory of Open Access Journals (Sweden)

    Chong Chu

    Full Text Available Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.

  17. Reading polymers: sequencing of natural and synthetic macromolecules.

    Science.gov (United States)

    Mutlu, Hatice; Lutz, Jean-François

    2014-11-24

    The sequencing of biopolymers such as proteins and DNA is among the most significant scientific achievements of the 20th century. Indeed, modern chemical methods for sequence analysis allow reading and understanding the codes of life. Thus, sequencing methods currently play a major role in applications as diverse as genomics, gene therapy, biotechnology, and data storage. However, in terms of fundamental science, sequencing is not really a question of molecular biology but rather a more general topic in macromolecular chemistry. Broadly speaking, it can be defined as the analysis of comonomer sequences in copolymers. However, relatively different approaches have been used in the past to study monomer sequences in biological and manmade polymers. Yet, these "cultural" differences are slowly fading away with the recent development of synthetic sequence-controlled polymers. In this context, the aim of this Minireview is to present an overview of the tools that are currently available for sequence analysis in macromolecular science. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Decoding long nanopore sequencing reads of natural DNA.

    Science.gov (United States)

    Laszlo, Andrew H; Derrington, Ian M; Ross, Brian C; Brinkerhoff, Henry; Adey, Andrew; Nova, Ian C; Craig, Jonathan M; Langford, Kyle W; Samson, Jenny Mae; Daza, Riza; Doering, Kenji; Shendure, Jay; Gundlach, Jens H

    2014-08-01

    Nanopore sequencing of DNA is a single-molecule technique that may achieve long reads, low cost and high speed with minimal sample preparation and instrumentation. Here, we build on recent progress with respect to nanopore resolution and DNA control to interpret the procession of ion current levels observed during the translocation of DNA through the pore MspA. As approximately four nucleotides affect the ion current of each level, we measured the ion current corresponding to all 256 four-nucleotide combinations (quadromers). This quadromer map is highly predictive of ion current levels of previously unmeasured sequences derived from the bacteriophage phi X 174 genome. Furthermore, we show nanopore sequencing reads of phi X 174 up to 4,500 bases in length, which can be unambiguously aligned to the phi X 174 reference genome, and demonstrate proof-of-concept utility with respect to hybrid genome assembly and polymorphism detection. This work provides a foundation for nanopore sequencing of long, natural DNA strands.
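
    The figure of 256 quadromers follows from four possible bases at each of four positions, 4^4 = 256. The short sketch below enumerates them and lists the overlapping quadromers along a sequence; the function names are hypothetical, and the sketch says nothing about how the authors measured or modeled ion currents.

```python
from itertools import product


def all_kmers(k: int, alphabet: str = "ACGT") -> list:
    """Enumerate every possible k-mer over the alphabet (4**k for DNA)."""
    return ["".join(bases) for bases in product(alphabet, repeat=k)]


def kmers_in(seq: str, k: int = 4) -> list:
    """List the overlapping k-mers of a sequence, one per start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    print(len(all_kmers(4)))   # number of quadromers
    print(kmers_in("ACGTAC"))  # successive quadromers along a short sequence
```

    A sequence of length n contains n - 3 overlapping quadromers, so a table lookup over the 256 entries yields one predicted level per step of the strand through the pore.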

  19. A teaching-learning sequence about weather map reading

    Science.gov (United States)

    Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

    2017-07-01

    In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a weather forecast. The capabilities and difficulties of sixty PET in understanding weather maps were investigated, using inquiry-based learning activities. The results show that most PET became more capable of reading weather maps and assigning wind direction and speed on them. Our results also show that PET could be guided to understand meteorology concepts useful in everyday life and in teaching their future students.

  20. QTrim: a novel tool for the quality trimming of sequence reads generated using the Roche/454 sequencing platform

    OpenAIRE

    Shrestha, Ram; Lubinsky, Baruch; Bansode, Vijay B; Moinz, Mónica B. J.; McCormack, Grace P.; Travers, Simon A

    2014-01-01

    Background: Many high-throughput sequencing (HTS) approaches, such as the Roche/454 platform, produce sequences in which the quality of the sequence (as measured by Phred-like quality scores) decreases linearly across a sequence read. Undertaking quality trimming of this data is essential to enable confidence in the results of subsequent downstream analysis. Here, we have developed a novel, highly sensitive and accurate approach (QTrim) for the quality trimming of sequence reads generated...

  1. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  2. SRComp: short read sequence compression using burstsort and Elias omega coding.

    Directory of Open Access Journals (Sweden)

    Jeremy John Selva

    Full Text Available Next-generation sequencing (NGS) technologies permit the rapid production of vast amounts of data at low cost. Economical data storage and transmission hence becomes an increasingly important challenge for NGS experiments. In this paper, we introduce a new non-reference-based read sequence compression tool called SRComp. It works by first employing a fast string-sorting algorithm called burstsort to sort read sequences in lexicographical order and then Elias omega-based integer coding to encode the sorted read sequences. SRComp has been benchmarked on four large NGS datasets, where experimental results show that it can run 5-35 times faster than current state-of-the-art read sequence compression tools such as BEETL and SCALCE, while retaining comparable compression efficiency for large collections of short read sequences. SRComp is a read sequence compression tool that is particularly valuable in certain applications where compression time is of major concern.
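
    Elias omega coding, the integer code named in this abstract, can be sketched in a few lines. The sketch covers only the standard code for positive integers; how SRComp turns sorted reads into integers is not described here, and the function names are hypothetical.

```python
def elias_omega_encode(n: int) -> str:
    """Encode a positive integer as an Elias omega bit string."""
    if n < 1:
        raise ValueError("Elias omega coding is defined for integers >= 1")
    code = "0"  # terminating bit
    while n > 1:
        group = bin(n)[2:]   # binary representation of n, leading bit is 1
        code = group + code  # prepend the group
        n = len(group) - 1   # next, encode the group length minus one
    return code


def elias_omega_decode(bits: str) -> list:
    """Decode a concatenation of Elias omega codes into a list of integers."""
    values = []
    i = 0
    while i < len(bits):
        n = 1
        while bits[i] == "1":        # a leading 1 starts a group of n + 1 bits
            width = n + 1
            n = int(bits[i:i + width], 2)
            i += width
        i += 1                       # consume the terminating 0
        values.append(n)
    return values
```

    Small integers get short codes (1 is a single bit), so the code pays off when the values to be stored are mostly small, as differences between neighboring entries of a sorted collection often are.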

  3. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory

    Directory of Open Access Journals (Sweden)

    Chaisson Mark J

    2012-09-01

    Full Text Available Abstract Background: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing. Results: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective. Conclusions: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mappability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.

  4. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules.

    Science.gov (United States)

    Mayjonade, Baptiste; Gouzy, Jérôme; Donnadieu, Cécile; Pouilly, Nicolas; Marande, William; Callot, Caroline; Langlade, Nicolas; Muños, Stéphane

    2016-10-01

    De novo sequencing of complex genomes is one of the main challenges for researchers seeking high-quality reference sequences. Many de novo assemblies are based on short reads, producing fragmented genome sequences. Third-generation sequencing, with read lengths >10 kb, will improve the assembly of complex genomes, but these techniques require high-molecular-weight genomic DNA (gDNA), and gDNA extraction protocols used for obtaining smaller fragments for short-read sequencing are not suitable for this purpose. Methods of preparing gDNA for bacterial artificial chromosome (BAC) libraries could be adapted, but these approaches are time-consuming, and commercial kits for these methods are expensive. Here, we present a protocol for rapid, inexpensive extraction of high-molecular-weight gDNA from bacteria, plants, and animals. Our technique was validated using sunflower leaf samples, producing a mean read length of 12.6 kb and a maximum read length of 80 kb.

  5. De novo assembly of human genomes with massively parallel short read sequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue

    2010-01-01

    genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities...... for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way....
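
    The N50 statistic quoted for contigs and scaffolds is the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal sketch, with a hypothetical function name and made-up example lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("N50 is undefined for an empty set of contigs")


if __name__ == "__main__":
    # Total is 54 bases; the three longest contigs (10 + 9 + 8) reach 27.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

    Because N50 depends only on the length distribution, it measures contiguity and not correctness: a misassembled contig raises N50 just as a correct one does.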

  6. Development and transferability of black and red raspberry microsatellite markers from short-read sequences

    Science.gov (United States)

    The advent of next-generation sequencing technologies has been a boon to the cost-effective development of molecular markers, particularly in non-model species. Here, we demonstrate the efficiency of microsatellite or simple sequence repeat (SSR) marker development from short-read sequences using th...

  7. MBMC: An Effective Markov Chain Approach for Binning Metagenomic Reads from Environmental Shotgun Sequencing Projects.

    Science.gov (United States)

    Wang, Ying; Hu, Haiyan; Li, Xiaoman

    2016-08-01

    Metagenomics is a next-generation omics field currently impacting postgenomic life sciences and medicine. Binning metagenomic reads is essential for the understanding of microbial function, compositions, and interactions in given environments. Despite the existence of dozens of computational methods for metagenomic read binning, it is still very challenging to bin reads. This is especially true for reads from unknown species, from species with similar abundance, and/or from low-abundance species in environmental samples. In this study, we developed a novel taxonomy-dependent and alignment-free approach called MBMC (Metagenomic Binning by Markov Chains). Different from all existing methods, MBMC bins reads by measuring the similarity of reads to the trained Markov chains for different taxa instead of directly comparing reads with known genomic sequences. By testing on more than 24 simulated and experimental datasets with species of similar abundance, species of low abundance, and/or unknown species, we report here that MBMC reliably grouped reads from different species into separate bins. Compared with four existing approaches, we demonstrated that the performance of MBMC was comparable with existing approaches when binning reads from sequenced species, and superior to existing approaches when binning reads from unknown species. MBMC is a pivotal tool for binning metagenomic reads in the current era of Big Data and postgenomic integrative biology. The MBMC software can be freely downloaded at http://hulab.ucf.edu/research/projects/metagenomics/MBMC.html .

  8. Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads.

    Science.gov (United States)

    Faber-Hammond, Joshua J; Brown, Kim H

    2016-07-01

    Completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads failing to map within the HGR. Sequences failing to map generally represent 2-5% of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. 30,879 contigs are represented in multiple individuals with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low coverage (~10-20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine.

  9. Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome.

    Science.gov (United States)

    Kuleshov, Volodymyr; Jiang, Chao; Zhou, Wenyu; Jahanbani, Fereshteh; Batzoglou, Serafim; Snyder, Michael

    2016-01-01

    Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.

  10. Synthetic long read sequencing reveals the composition and intraspecies diversity of the human microbiome

    Science.gov (United States)

    Kuleshov, Volodymyr; Jiang, Chao; Zhou, Wenyu; Jahanbani, Fereshteh; Batzoglou, Serafim; Snyder, Michael

    2016-01-01

    Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with new computational tools for metagenomic long-read assembly, variant-calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using short sequence reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Extensive intraspecies variation among microbial strains in the form of haplotypes that span up to hundreds of Kbp can be observed using our approach. Our method incorporates synthetic long-read sequencing technology with standard shotgun approaches to move towards rapid, precise and comprehensive analyses of metagenome and microbiome samples. PMID:26655498

  11. Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences

    Directory of Open Access Journals (Sweden)

    R. Henrik Nilsson

    2012-09-01

    Full Text Available Molecular data form an important research tool in most branches of mycology. A non-trivial proportion of the public fungal DNA sequences are, however, compromised in terms of quality and reliability, contributing noise and bias to sequence-borne inferences such as phylogenetic analysis, diversity assessment, and barcoding. In this paper we discuss various aspects and pitfalls of sequence quality assessment. Based on our observations, we provide a set of guidelines to assist in manual quality management of newly generated, near-full-length (Sanger-derived) fungal ITS sequences and to some extent also sequences of shorter read lengths, other genes or markers, and groups of organisms. The guidelines are intentionally non-technical and do not require substantial bioinformatics skills or significant computational power. Despite their simple nature, we feel they would have caught the vast majority of the severely compromised ITS sequences in the public corpus. Our guidelines are nevertheless not infallible, and common sense and intuition remain important elements in the pursuit of compromised sequence data. The guidelines focus on basic sequence authenticity and reliability of the newly generated sequences, and the user may want to consider additional resources and steps to accomplish the best possible quality control. A discussion on the technical resources for further sequence quality management is therefore provided in the supplementary material.

  12. Curriculum Sequencing and the Acquisition of Clock-Reading Skills among Chinese and Flemish Children

    Science.gov (United States)

    Burny, Elise; Valcke, Martin; Desoete, Annemie; Van Luit, Johannes E. Hans

    2013-01-01

    The present study addresses the impact of the curriculum on primary school children's acquisition of clock-reading knowledge from analog and digital clocks. Focusing on Chinese and Flemish children's clock-reading knowledge, the study is about whether the differences in sequencing of learning and instruction opportunities--as defined by the…

  13. Comparison of microarray-predicted closest genomes to sequencing for poliovirus vaccine strain similarity and influenza A phylogeny.

    Science.gov (United States)

    Maurer-Stroh, Sebastian; Lee, Charlie W H; Patel, Champa; Lucero, Marilla; Nohynek, Hanna; Sung, Wing-Kin; Murad, Chrysanti; Ma, Jianmin; Hibberd, Martin L; Wong, Christopher W; Simões, Eric A F

    2016-03-01

    We evaluate sequence data from the PathChip high-density hybridization array for epidemiological interpretation of detected pathogens. For influenza A, we derive similar relative outbreak clustering in phylogenetic trees from PathChip-derived compared to classical Sanger-derived sequences. For a positive polio detection, recent infection could be excluded based on vaccine strain similarity.

  14. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.
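
    As a rough illustration of the question the abstract poses (which repeats stay unresolved at a given read length), the sketch below lists exact substrings of length k that occur more than once in a sequence. It is not the authors' repeat model: treating k as a stand-in for the read length, looking only at the forward strand, and the function name `repeated_kmers` are simplifying assumptions.

```python
from collections import defaultdict

def repeated_kmers(genome, k):
    """Return {k-mer: [start positions]} for k-mers seen at least twice (forward strand only)."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}

# Hypothetical toy sequence containing the 4-base repeat ACGT at two places.
toy = "ACGTTTGACGTAAC"
for k in (3, 4, 5):
    print(k, repeated_kmers(toy, k))
```

    On the toy sequence the repeat disappears once k exceeds its length, which mirrors the intuition that longer reads leave fewer repeat-induced gaps.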

  15. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    Full Text Available BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

  16. IMSA: integrated metagenomic sequence analysis for identification of exogenous reads in a host genomic background.

    Directory of Open Access Journals (Sweden)

    Michelle T Dimon

    Full Text Available Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. The identification of microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has allowed the identification of novel viruses in diseases such as Merkel cell carcinoma. There are few publicly available tools for metagenomic high throughput sequence analysis. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequence from high throughput datasets and uses a user-defined host database to filter out host sequence. IMSA then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency, and can output this as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific sequence reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18.
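
    The step of scoring each taxonomy node by read frequency can be sketched as a count that is propagated from the assigned taxon up to the root. This is an assumed simplification rather than IMSA's actual scoring rule: the one-hit-per-read input, the `parent` map, and the function name are hypothetical.

```python
from collections import Counter

def score_taxonomy(read_hits, parent):
    """Add one to the assigned taxon and to every ancestor for each read."""
    scores = Counter()
    for taxon in read_hits.values():
        node = taxon
        while node is not None:
            scores[node] += 1
            node = parent.get(node)  # None once the root has been passed
    return scores

# Hypothetical miniature taxonomy (child -> parent) and read assignments.
parent = {"HPV16": "Papillomaviridae", "HPV18": "Papillomaviridae", "Papillomaviridae": "Viruses"}
read_hits = {"read1": "HPV16", "read2": "HPV16", "read3": "HPV18"}
scores = score_taxonomy(read_hits, parent)
print(scores.most_common())
```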

  17. Low-bandwidth and non-compute intensive remote identification of microbes from raw sequencing reads.

    Directory of Open Access Journals (Sweden)

    Laurent Gautier

    Full Text Available Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data where a reference genome does not have to be specified. Using a distributed architecture we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data. Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment. To demonstrate this approach we have implemented a web server, indexing tens of thousands of publicly available genomes and genomic regions from various organisms and returning lists of matching hits from query sequencing reads. We have also implemented two clients: one running in a web browser, and one as a python script. Both are able to handle a large number of sequencing reads, even from portable devices (the browser-based client running on a tablet), perform their task within seconds, and consume an amount of bandwidth compatible with mobile broadband networks. Such client-server approaches could develop in the future, allowing a fully automated processing of sequencing data and routine instant quality check of sequencing runs from desktop sequencers. A web access is available at http://tapir.cbs.dtu.dk. The source code for a python command-line client, a server, and supplementary data are available at http://bit.ly/1aURxkc.

  18. FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads.

    Science.gov (United States)

    Zhang, Gong; Fedyunin, Ivan; Kirchner, Sebastian; Xiao, Chuanle; Valleriani, Angelo; Ignatova, Zoya

    2012-06-01

    The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping and high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of highly possible matches and implements modified Smith-Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs which is important for quantitative processing of RNA-Seq datasets.

  19. Long-read sequence assembly of the gorilla genome

    Science.gov (United States)

    Gordon, David; Huddleston, John; Chaisson, Mark J. P.; Hill, Christopher M.; Kronenberg, Zev N.; Munson, Katherine M.; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W.; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K.; Haussler, David; Chin, Chen-Shan; Eichler, Evan E.

    2016-01-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. PMID:27034376

  20. ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

    Directory of Open Access Journals (Sweden)

    Cañizares Joaquin

    2011-06-01

    Full Text Available Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.

  1. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    Science.gov (United States)

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

  2. Long-read sequencing and de novo assembly of a Chinese genome.

    Science.gov (United States)

    Shi, Lingling; Guo, Yunfei; Dong, Chengliang; Huddleston, John; Yang, Hui; Han, Xiaolu; Fu, Aisi; Li, Quan; Li, Na; Gong, Siyi; Lintner, Katherine E; Ding, Qiong; Wang, Zou; Hu, Jiang; Wang, Depeng; Wang, Feng; Wang, Lin; Lyon, Gholson J; Guan, Yongtao; Shen, Yufeng; Evgrafov, Oleg V; Knowles, James A; Thibaud-Nissen, Francoise; Schneider, Valerie; Yu, Chack-Yung; Zhou, Libing; Eichler, Evan E; So, Kwok-Fai; Wang, Kai

    2016-06-30

    Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.

  3. BarraCUDA - a fast short read sequence aligner using graphics processing units

    LENUS (Irish Health Repository)

    Klus, Petr

    2012-01-13

    Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net

  4. BarraCUDA - a fast short read sequence aligner using graphics processing units

    Directory of Open Access Journals (Sweden)

    Klus Petr

    2012-01-01

    Full Text Available Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net

  5. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    DEFF Research Database (Denmark)

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse;

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs...... to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads...

  6. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

    DEFF Research Database (Denmark)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo

    2012-01-01

    to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) =694 kb, including contigs with N(50)=20.1 kb. The contig....... A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K...... these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species....

  7. How Hypertext Reading Sequences Affect Understanding of Causal and Temporal Relations in Story Comprehension

    Science.gov (United States)

    Urakami, Jacqueline; Krems, Josef F.

    2012-01-01

    The goal of this study is to examine the comprehension of global causal and temporal relations between events that are represented in single hypertext documents. In two experiments we examined how reading sequences of hypertext nodes affects the establishment of event relations and how this process can be supported by advanced organizers that…

  8. Mapping reads on a genomic sequence: an algorithmic overview and a practical comparative analysis.

    Science.gov (United States)

    Schbath, Sophie; Martin, Véronique; Zytnicki, Matthias; Fayolle, Julien; Loux, Valentin; Gibrat, Jean-François

    2012-06-01

    Mapping short reads against a reference genome is classically the first step of many next-generation sequencing data analyses, and it should be as accurate as possible. Because of the large number of reads to handle, numerous sophisticated algorithms have been developed in the last 3 years to tackle this problem. In this article, we first review the underlying algorithms used in most of the existing mapping tools, and then we compare the performance of nine of these tools on a well-controlled benchmark built for this purpose. We built a set of reads that exist in single or multiple copies in a reference genome and for which there is no mismatch, and a set of reads with three mismatches. We considered as reference genome both the human genome and a concatenation of all complete bacterial genomes. On each dataset, we quantified the capacity of the different tools to retrieve all the occurrences of the reads in the reference genome. Special attention was paid to reads uniquely reported and to reads with multiple hits.
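
    The benchmark criterion, whether a tool retrieves all occurrences of a read within a mismatch budget, can be made concrete with a brute-force ground truth. The sketch below is only an illustration under stated assumptions: substitutions only (no indels), forward strand only, and hypothetical function names and toy data. It is not one of the nine tools compared and would be far too slow for a real genome.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_occurrences(read, reference, max_mismatches=0):
    """All start positions where the read matches with at most `max_mismatches` substitutions."""
    n = len(read)
    return [i for i in range(len(reference) - n + 1)
            if hamming(read, reference[i:i + n]) <= max_mismatches]

def recall(reported, truth):
    """Fraction of true occurrences that a (hypothetical) mapper reported."""
    truth = set(truth)
    return len(truth & set(reported)) / len(truth) if truth else 1.0

reference = "ACGTACGTTTACGA"  # toy reference
truth = find_occurrences("ACGT", reference)       # exact matches only
relaxed = find_occurrences("ACGT", reference, 1)  # allow one substitution
print(truth, relaxed, recall([0], truth))
```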

  9. PrimeIndel: four-prime-number genetic code for indel decryption and sequence read alignment.

    Science.gov (United States)

    Lam, Ching-Wan

    2014-09-25

    To decrypt a doubly heterozygous sequence (DHS) in order to define the indel mutation for mutation reporting, an algorithm recursively searching the overlapped nucleotide using an offset of nucleotide positions can decrypt the indel without using a reference sequence. However, as genetic code is letter-based, special computer programs are required to run the decryption algorithm. The previous text-based algorithm was converted to a number-based algorithm by expressing DNA sequence from a 4-letter genetic code to a 4-prime-number genetic code, i.e., converting A, C, G, T to 2, 3, 5, and 7. This algorithm based on prime-number genetic code is called PrimeIndel and is executable by spreadsheet. Using prime number coded DNA sequence, the overlapped nucleotide between any 2 positions of the DHS is represented by the greatest common divisor (GCD) of the multiplication product of 2 prime numbers. This algorithm can also be used for aligning multiple overlapping sequence reads by in-silico DHS formation. The indel size of the in-silico formed DHS indicates the positions in the paired sequences for correct alignment. DHSs were successfully decrypted by the prime number-based algorithm and sequence reads were aligned correctly. DNA sequence expressed in prime numbers can be used for the decryption of DHS and the alignment of sequence reads using a well-known mathematical function GCD of a spreadsheet program. PrimeIndel is a useful tool for mutation reporting in clinical laboratories. The software is downloadable from http://www.patho.hku.hk/staff/list/cwlam.htm.
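
    The arithmetic behind the prime-number code can be shown with a small sketch. It uses the A, C, G, T to 2, 3, 5, 7 mapping stated in the abstract and encodes each overlapped position as the product of the two allele primes, so the GCD of two positions exposes the base they share. The function names, the toy allele, and the fixed offset of 1 are assumptions for illustration; this is not the published spreadsheet implementation.

```python
from math import gcd

PRIME_CODE = {"A": 2, "C": 3, "G": 5, "T": 7}  # mapping given in the abstract

def form_dhs(allele1, allele2):
    """Encode each overlapped position as the product of the two allele primes."""
    return [PRIME_CODE[a] * PRIME_CODE[b] for a, b in zip(allele1, allele2)]

def shared_bases(p, q):
    """Bases common to two DHS positions, read off the GCD of their products."""
    g = gcd(p, q)
    return [base for base, prime in PRIME_CODE.items() if g % prime == 0]

# Toy case: the second allele carries a 1-base deletion, so it is shifted by one position.
allele = "ACGTAC"
dhs = form_dhs(allele, allele[1:])
decoded = [shared_bases(dhs[i], dhs[i + 1]) for i in range(len(dhs) - 1)]
print(dhs)      # products of the two overlapped bases at each position
print(decoded)  # base shared between neighbouring positions at offset 1
```

    When two positions share both bases, the GCD is itself a product of two primes and the shared base is ambiguous, which is why the sketch returns a list rather than a single letter.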

  10. Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing

    Science.gov (United States)

    Dilernia, Dario A.; Chien, Jung-Ting; Monaco, Daniela C.; Brown, Michael P.S.; Ende, Zachary; Deymier, Martin J.; Yue, Ling; Paxinos, Ellen E.; Allen, Susan; Tirado-Ramos, Alfredo; Hunter, Eric

    2015-01-01

    Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution. PMID:26101252
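
    For orientation, quality values are conventionally Phred-scaled, so the >QV50 figure can be converted to an error probability. The sketch below assumes that standard definition, QV = -10 * log10(p); the function names are illustrative.

```python
import math

def phred_to_error(qv):
    """Error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)

def error_to_phred(p):
    """Phred-scaled quality value for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# QV50 corresponds to roughly one error per 100,000 bases.
print(phred_to_error(50))
print(error_to_phred(0.001))
```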

  11. Multiplexed highly-accurate DNA sequencing of closely-related HIV-1 variants using continuous long reads from single molecule, real-time sequencing.

    Science.gov (United States)

    Dilernia, Dario A; Chien, Jung-Ting; Monaco, Daniela C; Brown, Michael P S; Ende, Zachary; Deymier, Martin J; Yue, Ling; Paxinos, Ellen E; Allen, Susan; Tirado-Ramos, Alfredo; Hunter, Eric

    2015-11-16

    Single Molecule, Real-Time (SMRT) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.

  12. Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly.

    Science.gov (United States)

    Sameith, Katrin; Roscito, Juliana G; Hiller, Michael

    2017-01-01

    Next-generation sequencers such as Illumina can now produce reads up to 300 bp with high throughput, which is attractive for genome assembly. A first step in genome assembly is to computationally correct sequencing errors. However, correcting all errors in these longer reads is challenging. Here, we show that reads with remaining errors after correction often overlap repeats, where short erroneous k-mers occur in other copies of the repeat. We developed an iterative error correction pipeline that runs the previously published String Graph Assembler (SGA) in multiple rounds of k-mer-based correction with an increasing k-mer size, followed by a final round of overlap-based correction. By combining the advantages of small and large k-mers, this approach corrects more errors in repeats and minimizes the total amount of erroneous reads. We show that higher read accuracy increases contig lengths two to three times. We provide SGA-Iteratively Correcting Errors (https://github.com/hillerlab/IterativeErrorCorrection/) that implements iterative error correction by using modules from SGA.
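
    The observation that erroneous short k-mers can be masked by other copies of a repeat can be shown with a toy Python example. This is not SGA or the authors' pipeline; the sequences, the helper names, and the count threshold of 2 are assumptions chosen to keep the illustration small. A read with a single error looks fully supported at k = 3, because each of its 3-mers also occurs in another repeat copy, but is flagged at k = 5.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read, counts, k, min_count=2):
    """Start positions of k-mers seen fewer than min_count times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]


# Two hypothetical repeat copies with different flanks, each covered 3 times.
copy1 = "TTACGTAGG"
copy2 = "CCACCTAAA"
# A read from copy1 with one error (G -> C) that mimics the core of copy2.
bad_read = "TTACCTAGG"
reads = [copy1] * 3 + [copy2] * 3 + [bad_read]

flags = {k: weak_kmer_starts(bad_read, kmer_counts(reads, k), k)
         for k in (3, 5)}
```

    Here `flags[3]` is empty while `flags[5]` is `[0, 1, 3, 4]`: only the larger k exposes the error, which is the motivation for increasing k across correction rounds.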

  13. Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly

    Science.gov (United States)

    Sameith, Katrin; Roscito, Juliana G.

    2017-01-01

    Next-generation sequencers such as Illumina can now produce reads up to 300 bp with high throughput, which is attractive for genome assembly. A first step in genome assembly is to computationally correct sequencing errors. However, correcting all errors in these longer reads is challenging. Here, we show that reads with remaining errors after correction often overlap repeats, where short erroneous k-mers occur in other copies of the repeat. We developed an iterative error correction pipeline that runs the previously published String Graph Assembler (SGA) in multiple rounds of k-mer-based correction with an increasing k-mer size, followed by a final round of overlap-based correction. By combining the advantages of small and large k-mers, this approach corrects more errors in repeats and minimizes the total amount of erroneous reads. We show that higher read accuracy increases contig lengths two to three times. We provide SGA-Iteratively Correcting Errors (https://github.com/hillerlab/IterativeErrorCorrection/) that implements iterative error correction by using modules from SGA. PMID:26868358

  14. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.

    Directory of Open Access Journals (Sweden)

    Takeru Nakazato

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called "Gendoo". We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called "DBCLS SRA" (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.

  15. Viral population analysis and minority-variant detection using short read next-generation sequencing.

    Science.gov (United States)

    Watson, Simon J; Welkers, Matthijs R A; Depledge, Daniel P; Coulter, Eve; Breuer, Judith M; de Jong, Menno D; Kellam, Paul

    2013-03-19

    RNA viruses within infected individuals exist as a population of evolutionary-related variants. Owing to evolutionary change affecting the constitution of this population, the frequency and/or occurrence of individual viral variants can show marked or subtle fluctuations. Since the development of massively parallel sequencing platforms, such viral populations can now be investigated to unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr) specific for virus genome short read analysis that minimizes sequencing errors from multiple deep-sequencing platforms, and enables post-mapping analysis of the minority variants within the viral population. QUASR significantly reduces the error-related noise in deep-sequencing datasets, resulting in increased mapping accuracy and reduction of erroneous mutations. Using QUASR, we have determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza from samples sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro.

  16. QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads.

    Science.gov (United States)

    Huang, Austin; Kantor, Rami; DeLong, Allison; Schreier, Leeann; Istrail, Sorin

    Next generation sequencing technologies have recently been applied to characterize mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for quasispecies sequence reconstruction from next generation sequencing reads are not yet widely used, and this remains an emerging area of research. Furthermore, the majority of research methodology in HIV has focused on 454 sequencing, while many next-generation sequencing platforms used in practice are limited to shorter read lengths relative to 454 sequencing. Little work has been done in determining how best to address the read length limitations of other platforms. The approach described here incorporates graph representations of both read differences and read overlap to conservatively determine the regions of the sequence with sufficient variability to separate quasispecies sequences. Within these tractable regions of quasispecies inference, we use constraint programming to solve for an optimal quasispecies subsequence determination via vertex coloring of the conflict graph, a representation which also lends itself to data with non-contiguous reads such as paired-end sequencing. We demonstrate the utility of the method by applying it to simulations based on actual intra-patient clonal HIV-1 sequencing data.
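
    The idea of coloring a conflict graph can be sketched in a few lines of Python. This is not QColors: the abstract describes constraint programming for an optimal coloring, whereas the sketch below swaps in a simple greedy coloring, and the read representation (start position plus sequence), the function names, and the example reads are assumptions. Two reads conflict when they overlap and disagree at some position; reads that conflict must receive different colors, so each color groups reads that could come from the same variant.

```python
def conflicts(a, b):
    """True if two (start, sequence) reads overlap and disagree somewhere."""
    (sa, qa), (sb, qb) = a, b
    lo = max(sa, sb)
    hi = min(sa + len(qa), sb + len(qb))
    # An empty overlap (lo >= hi) yields no conflict.
    return any(qa[p - sa] != qb[p - sb] for p in range(lo, hi))


def greedy_coloring(reads):
    """Give each read the smallest color not used by a conflicting earlier read.

    Greedy coloring is a heuristic and is not guaranteed to use the
    minimum number of colors.
    """
    colors = []
    for i, read in enumerate(reads):
        used = {colors[j] for j in range(i) if conflicts(read, reads[j])}
        color = 0
        while color in used:
            color += 1
        colors.append(color)
    return colors


# Hypothetical reads from two variants that differ at position 2 (G vs T).
reads = [(0, "ACGT"), (2, "GTAA"), (0, "ACTT"), (2, "TTAA")]
colors = greedy_coloring(reads)
```

    For these four reads the result is `[0, 0, 1, 1]`: the first two reads are mutually consistent and share a color, as do the last two.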

  17. Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files.

    Science.gov (United States)

    Finney, Richard P; Chen, Qing-Rong; Nguyen, Cu V; Hsu, Chih Hao; Yan, Chunhua; Hu, Ying; Abawi, Massih; Bian, Xiaopeng; Meerzaman, Daoud M

    2015-01-01

    The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled to native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.

  18. Slider--maximum use of probability information for alignment of short sequence reads and SNP detection.

    Science.gov (United States)

    Malhis, Nawar; Butterfield, Yaron S N; Ester, Martin; Jones, Steven J M

    2009-01-01

    A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files. Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.

  19. Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads

    DEFF Research Database (Denmark)

    Gautier, Laurent; Lund, Ole

    2013-01-01

    Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general....... Both are able to handle a large number of sequencing reads and from portable devices (the browser-based running on a tablet), perform its task within seconds, and consume an amount of bandwidth compatible with mobile broadband networks. Such client-server approaches could develop in the future...

  20. Identifying wrong assemblies in de novo short read primary sequence assembly contigs

    Indian Academy of Sciences (India)

    VANDNA CHAWLA; RAJNISH KUMAR; RAVI SHANKAR

    2016-09-01

    With the advent of short-reads-based genome sequencing approaches, a large number of organisms are being sequenced all over the world. Most of these assemblies are done using some de novo short read assemblers and other related approaches. However, the contigs produced this way are prone to wrong assembly. So far, there is a conspicuous dearth of reliable tools to identify mis-assembled contigs. Mis-assemblies could result from incorrectly deleted or wrongly arranged genomic sequences. In the present work various factors related to sequence, sequencing and assembling have been assessed for their role in causing mis-assembly by using different genome sequencing data. Finally, some mis-assembly detecting tools have been evaluated for their ability to detect the wrongly assembled primary contigs, suggesting a lot of scope for improvement in this area. The present work also proposes a simple unsupervised learning-based novel approach to identify mis-assemblies in the contigs which was found performing reasonably well when compared to the already existing tools to report mis-assembled contigs. It was observed that the proposed methodology may work as a complementary system to the existing tools to enhance their accuracy.

  1. What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual

    Science.gov (United States)

    BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain u...

  2. Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya; Zucker, Jeremy D.; Brislawn, Colin J.; Nicora, Carrie D.; Fansler, Sarah J.; Glaesemann, Kurt R.; Glass, Kevin; Jansson, Janet K.; Langille, Morgan

    2016-06-28

    Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities.

    IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their

  3. SEQUENCE VARIABILITY OF HUMAN CYTOMEGALOVIRUS UL144 OPEN READING FRAME IN LOW-PASSAGE CLINICAL ISOLATES

    Institute of Scientific and Technical Information of China (English)

    Rong He; Yao-hua Ji; Qiang Ruan; Chang Xia; Lan-qing Liu; Sheng-min Lü; Ying Lu; Ying Qi; Yan-ping Ma; Qing Liu

    2004-01-01

    Objective To explore the relationship between human cytomegalovirus (HCMV) UL144 sequence variability and clinical disease. Methods HCMV UL144 open reading frame (ORF) was amplified by PCR assay in 72 low-passage isolates [65 congenitally infected children and 7 healthy children who were HCMV-DNA positive by quantitative PCR (qPCR)]. All positive PCR products were analyzed by heteroduplex mobility assay and single-stranded conformation polymorphism (HMA-SSCP) and 32 of them were sequenced. Results Fifty-five patient isolates and five healthy children isolates were HCMV-UL144 positive by PCR. Sequencing and HMA-SSCP analysis showed that significant strain-specific variability was present in the UL144 ORF. Phylogenetic analysis indicated that the nucleotide sequences could be separated into 3 major genotypes. Comparison of UL144 sequences with the corresponding symptoms showed that genotype 2 did not exist in megacolon isolates. Genotypes 1 and 3 were the major types among microcephaly and jaundice isolates, respectively. Conclusions HCMV-UL144 existed in most of the low-passage isolates and sequences were hypervariable. The UL144 ORF and its predicted product with the high level of sequence variability in different kinds of isolates suggest that the UL144 ORF might play a role in HCMV infectivity and subsequent diseases.

  4. Optimizing information in Next-Generation-Sequencing (NGS) reads for improving de novo genome assembly.

    Science.gov (United States)

    Liu, Tsunglin; Tsai, Cheng-Hung; Lee, Wen-Bin; Chiang, Jung-Hsien

    2013-01-01

    Next-Generation-Sequencing is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires the presence of long reads or long distance information. To improve de novo genome assembly, we develop a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes as input Illumina paired-end (PE) reads and recovers the original DNA fragments from which two ends the paired reads are obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% of perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased the assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. However, the assembly accuracies dropped, but still remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity. But future error correction is needed for ARF-PE to also increase the assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
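
    The N50 statistic quoted in this abstract is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The Python sketch below shows one common way to compute it; the function name and the contig lengths are made up for illustration and are not the ARF-PE data.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Hypothetical assemblies of the same 290 bases: joining contigs raises N50.
fragmented = [80, 70, 50, 40, 30, 20]
merged = [150, 90, 50]

n50_fragmented = n50(fragmented)   # 70
n50_merged = n50(merged)           # 150
```

    Because N50 measures contiguity only, it can rise while accuracy falls, which is consistent with the abstract reporting the two separately.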

  5. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

    Directory of Open Access Journals (Sweden)

    Arthur W Pightling

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: (i) depth of sequencing coverage, (ii) choice of reference-guided short-read sequence assembler, (iii) choice of reference genome, and (iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers

  6. Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples

    Science.gov (United States)

    Chouvarine, Philippe; Wiehlmann, Lutz; Moran Losada, Patricia; DeLuca, David S.; Tümmler, Burkhard

    2016-01-01

    Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of the metabolic capacity and physiological features of the studied metagenome without reliance on the knowledge of genotypes and phenotypes of the members of the bacterial community. It also makes it possible to overcome problems of 16S rDNA sequencing, such as unknown copy number of the 16S gene and lack of sufficient sequence similarity of the “universal” 16S primers to some of the target 16S genes. On the other hand, next-generation sequencing suffers from biases resulting in non-uniform coverage of the sequenced genomes. To overcome this difficulty, we present a model of GC-bias in sequencing metagenomic samples as well as filtration and normalization techniques necessary for accurate quantification of microbial organisms. While there has been substantial research in normalization and filtration of read-count data in such techniques as RNA-seq or Chip-seq, to our knowledge, this has not been the case for the field of whole-metagenome shotgun sequencing. The presented methods assume that complete genome references are available for most microorganisms of interest present in metagenomic samples. This is often a valid assumption in such fields as medical diagnostics of patient microbiota. Testing the model on two validation datasets showed four-fold reduction in root-mean-square error compared to non-normalized data in both cases. The presented methods can be applied to any pipeline for whole metagenome sequencing analysis relying on complete microbial genome references. We demonstrate that such pre-processing reduces the number of false positive hits and increases accuracy of abundance estimates. PMID:27760173

  7. Complete genome sequencing of Dehalococcoides sp. strain UCH007 using a differential reads picking method.

    Science.gov (United States)

    Uchino, Yoshihito; Miura, Takamasa; Hosoyama, Akira; Ohji, Shoko; Yamazoe, Atsushi; Ito, Masako; Takahata, Yoh; Suzuki, Ken-Ichiro; Fujita, Nobuyuki

    2015-01-01

    A novel Dehalococcoides sp. strain UCH007 was isolated from the groundwater polluted with chlorinated ethenes in Japan. This strain is capable of dechlorinating trichloroethene, cis-1,2-dichloroethene and vinyl chloride to ethene. Dehalococcoides bacteria are hardly cultivable, so genome sequencing has presented a challenge. In this study, we developed a differential reads picking method for mixed genomic DNA obtained from a co-culture, and applied it to the sequencing of strain UCH007. The genome of strain UCH007 consists of a 1,473,548-bp chromosome that encodes 1509 coding sequences including 29 putative reductive dehalogenase genes. Strain UCH007 is the first strain in the Victoria subgroup found to possess the pceA, tceA and vcrA genes.

  8. Stacks: building and genotyping Loci de novo from short-read sequences.

    Science.gov (United States)

    Catchen, Julian M; Amores, Angel; Hohenlohe, Paul; Cresko, William; Postlethwait, John H

    2011-08-01

    Advances in sequencing technology provide special opportunities for genotyping individuals with speed and thrift, but the lack of software to automate the calling of tens of thousands of genotypes over hundreds of individuals has hindered progress. Stacks is a software system that uses short-read sequence data to identify and genotype loci in a set of individuals either de novo or by comparison to a reference genome. From reduced representation Illumina sequence data, such as RAD-tags, Stacks can recover thousands of single nucleotide polymorphism (SNP) markers useful for the genetic analysis of crosses or populations. Stacks can generate markers for ultra-dense genetic linkage maps, facilitate the examination of population phylogeography, and help in reference genome assembly. We report here the algorithms implemented in Stacks and demonstrate their efficacy by constructing loci from simulated RAD-tags taken from the stickleback reference genome and by recapitulating and improving a genetic map of the zebrafish, Danio rerio.

  9. Improving transcriptome assembly through error correction of high-throughput sequence reads.

    Science.gov (United States)

    Macmanes, Matthew D; Eisen, Michael B

    2013-01-01

    The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  10. Design for Sequencing Spelling-to-Sound Correspondences in Mod 2 Reading Program, Volume 1 and 11.

    Science.gov (United States)

    Berdiansky, Betty; And Others

    The purpose of the study contained in this report is to provide research and design data for the Southwest Regional Laboratory (SWRL) Mod 2 Reading Program, a four-year program (K-3) for teaching reading skills to primary-grade children. The report is divided into two volumes. Volume one describes sequencing and methodology, and the specific rule…

  11. Indel variant analysis of short-read sequencing data with Scalpel.

    Science.gov (United States)

    Fang, Han; Bergmann, Ewa A; Arora, Kanika; Vacic, Vladimir; Zody, Michael C; Iossifov, Ivan; O'Rawe, Jason A; Wu, Yiyang; Jimenez Barron, Laura T; Rosenbaum, Julie; Ronemus, Michael; Lee, Yoon-Ha; Wang, Zihua; Dikoglu, Esra; Jobanputra, Vaidehi; Lyon, Gholson J; Wigler, Michael; Schatz, Michael C; Narzisi, Giuseppe

    2016-12-01

    As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ∼5 h after read mapping.

  12. RNA-Seq analysis and gene discovery of Andrias davidianus using Illumina short read sequencing.

    Directory of Open Access Journals (Sweden)

    Fenggang Li

    The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence similarity searches with known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.

  13. Reading.

    Science.gov (United States)

    Gee, James Paul

    1992-01-01

    Explores what is meant by reading, noting that to read is to respond appropriately to a specific consensus centered on certain values and that the consensus is achieved among persons whose paths through life have come together with members of dominant discourses in society. (SLD)

  14. Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.

    Science.gov (United States)

    Frank, J A; Pan, Y; Tooming-Klunderud, A; Eijsink, V G H; McHardy, A C; Nederbragt, A J; Pope, P B

    2016-05-09

    DNA assembly is a core methodological step in metagenomic pipelines used to study the structure and function within microbial communities. Here we investigate the utility of Pacific Biosciences long and high accuracy circular consensus sequencing (CCS) reads for metagenomic projects. We compared the application and performance of both PacBio CCS and Illumina HiSeq data with assembly and taxonomic binning algorithms using metagenomic samples representing a complex microbial community. Eight SMRT cells produced approximately 94 Mb of CCS reads from a biogas reactor microbiome sample that averaged 1319 nt in length and 99.7% accuracy. CCS data assembly generated a comparable number of large contigs (greater than 1 kb) to those assembled from a ~190x larger HiSeq dataset (~18 Gb) produced from the same sample (i.e., approximately 62% of total contigs). Hybrid assemblies using PacBio CCS and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length and number of large contigs. The incorporation of CCS data produced significant enhancements in taxonomic binning and genome reconstruction of two dominant phylotypes, which assembled and binned poorly using HiSeq data alone. Collectively these results illustrate the value of PacBio CCS reads in certain metagenomics applications.

  15. MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads.

    Science.gov (United States)

    Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas

    2017-01-01

    An increasing number of species and gene identification studies rely on the use of next generation sequence analysis of either single isolate or metagenomics samples. Several methods are available to perform taxonomic annotations and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species and strain level resolution. An in-vitro bacterial mock community sample comprised of 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. Sensitivity and precision of 75% were obtained for strain level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets.

  16. Detection and removal of biases in the analysis of next-generation sequencing reads.

    Directory of Open Access Journals (Sweden)

    Schraga Schwartz

    Full Text Available Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, misleadingly increasing the credibility of results due to them. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.

  17. MOST: a modified MLST typing tool based on short read sequencing

    Science.gov (United States)

    Dallman, Timothy; Schaefer, Ulf; Sheppard, Carmen L.; Ashton, Philip; Pichon, Bruno; Ellington, Matthew; Swift, Craig; Green, Jonathan; Underwood, Anthony

    2016-01-01

    Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteridis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping and assembly based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS based methods, the mapping based approach was the most sensitive. In addition, the mapping based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly based approaches. PMID:27602279

  18. Long-read sequencing of chicken transcripts and identification of new transcript isoforms.

    Directory of Open Access Journals (Sweden)

    Sean Thomas

    Full Text Available The chicken has long served as an important model organism in many fields, and continues to aid our understanding of animal development. Functional genomics studies aimed at probing the mechanisms that regulate development require high-quality genomes and transcript annotations. The quality of these resources has improved dramatically over the last several years, but many isoforms and genes have yet to be identified. We hope to contribute to the process of improving these resources with the data presented here: a set of long cDNA sequencing reads, and a curated set of new genes and transcript isoforms not currently represented in the most up-to-date genome annotation currently available to the community of researchers who rely on the chicken genome.

  19. MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

    DEFF Research Database (Denmark)

    Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund

    2017-01-01

    An increasing amount of species and gene identification studies rely on the use of next generation sequence analysis of either single isolate or metagenomics samples. Several methods are available to perform taxonomic annotations and a previous metagenomics benchmark study has shown that a vast... -processing analysis to produce reliable taxonomy annotation at species and strain level resolution. An in-vitro bacterial mock community sample comprised of 8 genuses, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained... .5% for Kraken and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete...

  20. MOST: a modified MLST typing tool based on short read sequencing

    Directory of Open Access Journals (Sweden)

    Rediat Tewolde

    2016-08-01

    Full Text Available Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteridis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping and assembly based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS based methods, the mapping based approach was the most sensitive. In addition, the mapping based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly based approaches.

  1. Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events.

    Science.gov (United States)

    Tilgner, Hagen; Jahanbani, Fereshteh; Blauwkamp, Tim; Moshrefi, Ali; Jaeger, Erich; Chen, Feng; Harel, Itamar; Bustamante, Carlos D; Rasmussen, Morten; Snyder, Michael P

    2015-07-01

    Alternative splicing shapes mammalian transcriptomes, with many RNA molecules undergoing multiple distant alternative splicing events. Comprehensive transcriptome analysis, including analysis of exon co-association in the same molecule, requires deep, long-read sequencing. Here we introduce an RNA sequencing method, synthetic long-read RNA sequencing (SLR-RNA-seq), in which small pools (≤1,000 molecules/pool, ≤1 molecule/gene for most genes) of full-length cDNAs are amplified, fragmented and short-read-sequenced. We demonstrate that these RNA sequences reconstructed from the short reads from each of the pools are mostly close to full length and contain few insertion and deletion errors. We report many previously undescribed isoforms (human brain: ∼13,800 affected genes, 14.5% of molecules; mouse brain ∼8,600 genes, 18% of molecules) and up to 165 human distant molecularly associated exon pairs (dMAPs) and distant molecularly and mutually exclusive pairs (dMEPs). Of 16 associated pairs detected in the mouse brain, 9 are conserved in human. Our results indicate conserved mechanisms that can produce distant but phased features on transcript and proteome isoforms.

  2. DNApod: DNA polymorphism annotation database from next-generation sequence read archives

    Science.gov (United States)

    Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

    2017-01-01

    With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information. PMID:28234924

  3. Design for Sequencing Spelling-To-Sound Correspondences for the SWRL Reading Program. Technical Report No. 47.

    Science.gov (United States)

    Berdiansky, Betty; And Others

    From a 9,000-word lexicon, a set of spelling-to-sound correspondences was developed to systematically organize possible content for beginning reading instruction. With the aid of computer sorting procedures, correspondences and correspondence exemplars were sequenced according to criteria of productivity, regularity, generalizability, and…

  4. Pseudo-De Novo Assembly and Analysis of Unmapped Genome Sequence Reads in Wild Zebrafish Reveal Novel Gene Content.

    Science.gov (United States)

    Faber-Hammond, Joshua J; Brown, Kim H

    2016-04-01

    Zebrafish represents the third vertebrate with an officially completed genome, yet it remains incomplete with additions and corrections continuing with the current release, GRCz10, having 13% of zebrafish cDNA sequences unmapped. This disparity may result from population differences, given that the genome reference was generated from clonal individuals with limited genetic diversity. This is supported by the recent analysis of a single wild zebrafish, which identified over 5.2 million SNPs and 1.6 million in/dels in the previous genome build, zv9. Re-examination of this sequence data set indicated that 13.8% of quality sequence reads failed to align to GRCz10. Using a novel bioinformatics de novo assembly pipeline on these unmappable reads, we identified 1,514,491 novel contigs covering ∼224 Mb of genomic sequence. Among these, 1083 contigs were found to contain a potential gene coding sequence. RNA-seq data comparison confirmed that 362 contigs contained a transcribed DNA sequence, suggesting that a large amount of functional genomic sequence remains unannotated in the zebrafish reference genome. By utilizing the bioinformatics pipeline developed in this study, the zebrafish genome will be bolstered as a model for human disease research. Adaptation of the pipeline described here also offers a cost-efficient and effective method to identify and map novel genetic content across any genome and will ultimately aid in the completion of additional genomes for a broad range of species.

  5. A general method for nested RT-PCR amplification and sequencing the complete HCV genotype 1 open reading frame

    Directory of Open Access Journals (Sweden)

    Tavis John E

    2005-12-01

    Full Text Available Abstract Background: Hepatitis C virus (HCV) is a pathogenic hepatic flavivirus with a single stranded RNA genome. It has a high genetic variability and is classified into six major genotypes. Genotype 1a and 1b cause the majority of infections in the USA. Viral genomic sequence information is needed to correlate viral variation with pathology or response to therapy. However, reverse transcription-polymerase chain reaction (RT-PCR) of the HCV genome must overcome low template concentration and high target sequence diversity. Amplification conditions must hence have both high sensitivity and specificity yet recognize a heterogeneous target population to permit general amplification with minimal bias. This places divergent demands on the amplification conditions that can be very difficult to reconcile. Results: RT and nested PCR conditions were optimized independently and systematically for amplifying the complete open reading frame (ORF) from HCV genotype 1a and 1b using several overlapping amplicons. For each amplicon, multiple pairs of nested PCR primers were optimized. Using these primers, the success rate (defined as the rate of production of sufficient DNA for sequencing with any one of the primer pairs for a given amplicon) for amplification of 72 genotype 1a and 1b patient plasma samples averaged over 95% for all amplicons. In addition, two sets of sequencing primers were optimized for each genotype 1a and 1b. Viral consensus sequences were determined by directly sequencing the amplicons. HCV ORFs from 72 patients have been sequenced using these primers. Sequencing errors were negligible because sequencing depth was over 4-fold and both strands were sequenced. Primer bias was controlled and monitored through careful primer design and control experiments. Conclusion: Optimized RT-PCR and sequencing conditions are useful for rapid and reliable amplification and sequencing of HCV genotype 1a and 1b ORFs.

  6. Evaluation of methods for de novo genome assembly from high-throughput sequencing reads reveals dependencies that affect the quality of the results.

    Science.gov (United States)

    Haiminen, Niina; Kuhn, David N; Parida, Laxmi; Rigoutsos, Isidore

    2011-01-01

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the identity of the used assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.

  7. Co-barcoded sequence reads from long DNA fragments: A cost-effective solution for Perfect Genome sequencing

    Directory of Open Access Journals (Sweden)

    Brock A Peters

    2015-01-01

    Full Text Available Next generation sequencing (NGS) technologies, primarily based on massively parallel sequencing (MPS), have touched and radically changed almost all aspects of research worldwide. These technologies have allowed for the rapid analysis, to date, of the genomes of more than 2,000 different species. In humans, NGS has arguably had the largest impact. Over 100,000 genomes of individual humans (based on various estimates) have been sequenced, allowing for deep insights into what makes individuals and families unique and what causes disease in each of us. Despite all of this progress, the current state of the art in sequence technology is far from generating a perfect genome sequence and much remains to be understood in the biology of human and other organisms’ genomes. In the article that follows we outline why the perfect genome in humans is important, what is lacking from current human whole genome sequences, and a potential strategy for achieving the perfect genome in a cost effective manner.

  8. Characterization of a biogas-producing microbial community by short-read next generation DNA sequencing

    Directory of Open Access Journals (Sweden)

    Wirth Roland

    2012-07-01

    Full Text Available Abstract Background: Renewable energy production is currently a major issue worldwide. Biogas is a promising renewable energy carrier as the technology of its production combines the elimination of organic waste with the formation of a versatile energy carrier, methane. In consequence of the complexity of the microbial communities and metabolic pathways involved, the biotechnology of the microbiological process leading to biogas production is poorly understood. Metagenomic approaches are suitable means of addressing related questions. In the present work a novel high-throughput technique was tested for its benefits in resolving the functional and taxonomical complexity of such microbial consortia. Results: It was demonstrated that the extremely parallel SOLiD™ short-read DNA sequencing platform is capable of providing sufficient useful information to decipher the systematic and functional contexts within a biogas-producing community. Although this technology has not been employed to address such problems previously, the data obtained compare well with those from similar high-throughput approaches such as 454-pyrosequencing GS FLX or Titanium. The predominant microbes contributing to the decomposition of organic matter include members of the Eubacteria, class Clostridia, order Clostridiales, family Clostridiaceae. Bacteria belonging in other systematic groups contribute to the diversity of the microbial consortium. Archaea comprise a remarkably small minority in this community, given their crucial role in biogas production. Among the Archaea, the predominant order is the Methanomicrobiales and the most abundant species is Methanoculleus marisnigri. The Methanomicrobiales are hydrogenotrophic methanogens. Besides corroborating earlier findings on the significance of the contribution of the Clostridia to organic substrate decomposition, the results demonstrate the importance of the metabolism of hydrogen within the biogas producing microbial

  9. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    Directory of Open Access Journals (Sweden)

    Hao Ye

    2015-11-01

    Full Text Available Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long-read alignment and alignment facilitated with known genetic variants.
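
    The seed-and-extend strategy named in this abstract can be illustrated with a minimal sketch. It is not taken from the reviewed paper or from any specific aligner: the function names, the choice of the read's first k bases as the only seed, and the ungapped, mismatch-counting extension are all simplifying assumptions made for illustration.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_and_extend(read, reference, index, k, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped hit, or None.

    Assumption: the seed is the first k bases of the read, and the
    extension step only counts mismatches (no insertions or deletions).
    """
    if len(read) < k:
        return None
    best = None
    for pos in index.get(read[:k], []):          # seed: exact k-mer lookup
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):              # read would run off the reference
            continue
        mismatches = sum(a != b for a, b in zip(read, window))  # extend
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best

reference = "TTACGTACGGATCCA"
index = build_index(reference, k=4)
hit = seed_and_extend("ACGTACGGTT", reference, index, k=4)
```

    Production aligners typically take seeds from several offsets in the read and allow gaps during extension; the single-seed, ungapped version above is only meant to show the two phases.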

  10. Identification of novel non-coding RNAs using profiles of short sequence reads from next generation sequencing data

    Directory of Open Access Journals (Sweden)

    Makunin Igor V

    2010-02-01

    Full Text Available Abstract Background: The increasing interest in small non-coding RNAs (ncRNAs) such as microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs) and recent advances in sequencing technology have yielded large numbers of short (18-32 nt) RNA sequences from different organisms, some of which are derived from small nucleolar RNAs (snoRNAs) and transfer RNAs (tRNAs). We observed that these short ncRNAs frequently cover the entire length of annotated snoRNAs or tRNAs, which suggests that other loci specifying similar ncRNAs can be identified by clusters of short RNA sequences. Results: We combined publicly available datasets of tens of millions of short RNA sequence tags from Drosophila melanogaster, and mapped them to the Drosophila genome. Approximately 6 million perfectly mapping sequence tags were then assembled into 521,302 tag-contigs (TCs) based on tag overlap. Most transposon-derived sequences, exons and annotated miRNAs, tRNAs and snoRNAs are detected by TCs, which show distinct patterns of length and tag-depth for different categories. The typical length and tag-depth of snoRNA-derived TCs was used to predict 7 previously unrecognized box H/ACA and 26 box C/D snoRNA candidates. We also identified one snRNA candidate and 86 loci with a high number of tags that are yet to be annotated, 7 of which have a particular 18mer motif and are located in introns of genes involved in development. A subset of new snoRNA candidates and putative ncRNA candidates was verified by Northern blot. Conclusions: In this study, we have introduced a new approach to identify new members of known classes of ncRNAs based on the features of TCs corresponding to known ncRNAs. A large number of the identified TCs are yet to be examined experimentally, suggesting that many more novel ncRNAs remain to be discovered.
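
    Assembling mapped tags into tag-contigs by overlap, as described in this abstract, resembles a standard interval merge. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes tags are given as inclusive (start, end) coordinates on a single strand of a single chromosome, and it reports the number of merged tags as a crude stand-in for tag-depth.

```python
def merge_tags(tags):
    """Merge overlapping (start, end) tags into (start, end, tag_count) contigs.

    Assumptions: coordinates are inclusive, all tags lie on the same
    chromosome and strand, and two tags overlap when the next tag starts
    at or before the current contig's end.
    """
    contigs = []
    for start, end in sorted(tags):
        if contigs and start <= contigs[-1][1]:
            prev_start, prev_end, count = contigs[-1]
            # Extend the open contig and count the additional tag.
            contigs[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            # No overlap with the previous contig: open a new one.
            contigs.append((start, end, 1))
    return contigs

tag_contigs = merge_tags([(25, 45), (10, 30), (100, 120)])
```

    Contig length (end - start + 1) and tag count could then be compared against the values seen for annotated snoRNAs, in the spirit of the prediction step the abstract describes.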

  11. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads

    Directory of Open Access Journals (Sweden)

    Chengxi Ye

    2016-06-01

    Full Text Available Motivation. The third generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15–40%, significantly higher than those of the prevalent next generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear complexity consensus algorithm Sparc to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path which approximates the most likely genome sequence is searched through a sparsity-induced reweighted graph as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve similar error rate when combined with NGS data. Compared with the existing approaches, Sparc calculates the consensus with higher accuracy, and uses approximately 80% less memory and time. Availability. The source code is available for download at https://github.com/yechengxi/Sparc.

  12. De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping.

    Science.gov (United States)

    Olsen, Remi-Andre; Bunikis, Ignas; Tiukova, Ievgeniia; Holmberg, Kicki; Lötstedt, Britta; Pettersson, Olga Vinnere; Passoth, Volkmar; Käller, Max; Vezzi, Francesco

    2015-01-01

    It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers) it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness, and on the other hand we present a de novo assembly pipeline we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, here we present the results obtained using the Dekkera bruxellensis genome. In this work we used short-read Illumina data and long-read PacBio data combined with the extreme long-range information from OpGen optical maps in the task of de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read-preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work. We obtained a high quality draft assembly of a yeast genome, resolved on a chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors as demonstrated by resolving a large collapsed repeat and by receiving higher scores by assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5 % of the optical mapped genome not covered by the Illumina data.

  13. A comparison of random sequence reads versus 16S rDNA sequences for estimating the biodiversity of a metagenomic library.

    Science.gov (United States)

    Manichanh, Chaysavanh; Chapple, Charles E; Frangeul, Lionel; Gloux, Karine; Guigo, Roderic; Dore, Joel

    2008-09-01

    The construction of metagenomic libraries has permitted the study of microorganisms resistant to isolation and the analysis of 16S rDNA sequences has been used for over two decades to examine bacterial biodiversity. Here, we show that the analysis of random sequence reads (RSRs) instead of 16S is a suitable shortcut to estimate the biodiversity of a bacterial community from metagenomic libraries. We generated 10,010 RSRs from a metagenomic library of microorganisms found in human faecal samples. We then searched them using the program BLASTN against a prokaryotic sequence database to assign a taxon to each RSR. The results were compared with those obtained by screening and analysing the clones containing 16S rDNA sequences in the whole library. We found that the biodiversity observed by RSR analysis is consistent with that obtained by 16S rDNA. We also show that RSRs are suitable to compare the biodiversity between different metagenomic libraries. RSRs can thus provide a good estimate of the biodiversity of a metagenomic library and, as an alternative to 16S, this approach is both faster and cheaper.

  14. Should Bilingual Children Learn Reading in Two Languages at the Same Time or in Sequence?

    Science.gov (United States)

    Berens, Melody S.; Kovelman, Ioulia; Petitto, Laura-Ann

    2013-01-01

    Is it best to learn reading in two languages simultaneously or sequentially? We observed second- and third-grade children in two-way "dual-language learning contexts": (a) 50:50 or Simultaneous dual-language (two languages within same developmental period) and (b) 90:10 or Sequential dual-language (one language, followed gradually by the other).…

  15. SRIdent: A novel pipeline for real-time identification of species from high-throughput sequencing reads in Metagenomics and clinical diagnostic assays.

    Science.gov (United States)

    Karimi, Ramin; Hajdu, Andras

    2015-01-01

    New advances in rapid sequencing of large amounts of DNA have brought a great potential for the study of complex communities of microorganisms. One of the challenging problems is rapid identification of species from sequenced reads. Delays in the identification of pathogens are a barrier to the early diagnosis and proper treatment of infectious diseases. In this paper we propose SRIdent (Short Read Identifier), an effective pipeline for real-time identification of species from high-throughput sequencing reads in metagenomics and clinical diagnostic assays. The pipeline generates k-mers from the short reads and searches for DNA signatures among the read k-mers using Apache Hive data warehousing. RkmerG (Read k-mers Generator), a software program presented in this paper, produces the k-mers of the short reads for use in the pipeline. The purpose of this study is to identify the species in a sample directly from the reads, without assembly or alignment.
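
    The k-mer step described above can be illustrated with a minimal Python sketch. It is not the RkmerG or SRIdent implementation: the function names, the in-memory dictionary of signatures, and the exact-match lookup are all assumptions made for illustration, and the Apache Hive layer is omitted.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping substring of length k from a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def signature_hits(reads, signatures, k):
    """Count read k-mers that exactly match each species' signature set.

    `signatures` is a hypothetical mapping: species name -> set of k-mers.
    """
    hits = Counter()
    for read in reads:
        for kmer in kmers(read.upper(), k):
            for species, sigs in signatures.items():
                if kmer in sigs:
                    hits[species] += 1
    return hits
```

    A species whose signature k-mers never occur in the reads simply receives a count of zero, so the species with the highest counts are the candidates for being present in the sample.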

  16. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    DEFF Research Database (Denmark)

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse;

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs...

  17. Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding.

    Directory of Open Access Journals (Sweden)

    Shubha Vij

    2016-04-01

    Full Text Available We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
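
    Several records in this list report contig or scaffold N50 values. As a reminder of what that statistic measures, here is a minimal Python sketch using the usual definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The function name is illustrative and not taken from any of the cited pipelines.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

    For lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest sequences already cover 11 bases, so the N50 is 5.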

  18. Reading Nature- experienced teachers’ reflections on a teaching sequence in ecology: implications for future teacher training

    Directory of Open Access Journals (Sweden)

    Ola Magntorn

    2012-10-01

    Full Text Available This article explores experienced primary teachers' views on teaching for ‘reading nature’. The concept ‘reading nature’ has to do with an ability to recognise organisms and relate them to material cycling and energy flow in the specific habitat which is to be read. It has to do with the natural world that we face outside and the tools we have are our experiences from previous learning situations both in and out-of-doors. The teachers were asked to comment on the content of a CD-ROM with teaching sequences from a primary class studying a river ecosystem. Perceptions that teachers held were found to be supportive but complex and varied regarding the possibilities and advantages of implementing this type of teaching design in the everyday classroom. The paper finishes by identifying some implications for teacher training to support fieldwork and ecological literacy in primary schools in the future.

  19. Application of Long Sequence Reads To Improve Genomes for Clostridium thermocellum AD2, Clostridium thermocellum LQRI, and Pelosinus fermentans R7.

    Science.gov (United States)

    Utturkar, Sagar M; Bayer, Edward A; Borovok, Ilya; Lamed, Raphael; Hurt, Richard A; Land, Miriam L; Klingeman, Dawn M; Elias, Dwayne; Zhou, Jizhong; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T B K; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D

    2016-09-29

    We and others have shown the utility of long sequence reads to improve genome assembly quality. In this study, we generated PacBio DNA sequence data to improve the assemblies of draft genomes for Clostridium thermocellum AD2, Clostridium thermocellum LQRI, and Pelosinus fermentans R7.

  20. Fine de novo sequencing of a fungal genome using only SOLiD short read data: verification on Aspergillus oryzae RIB40.

    Directory of Open Access Journals (Sweden)

    Myco Umemura

    Full Text Available The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated from NGS platforms, such as SOLiD and Illumina, are quite useful for mapping analysis. However, the SOLiD read data with lengths of <60 bp have been considered to be too short for de novo genome sequencing. Here, to investigate whether de novo sequencing of fungal genomes is possible using only SOLiD short read sequence data, we performed de novo assembly of the Aspergillus oryzae RIB40 genome using only SOLiD read data of 50 bp generated from mate-paired libraries with 2.8- or 1.9-kb insert sizes. The assembled scaffolds showed an N50 value of 1.6 Mb, 22-fold larger than those obtained using only SOLiD short reads in other published reports. In addition, almost 99% of the reference genome was accurately aligned by the assembled scaffold fragments in long lengths. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies due to their potential medicinal, agricultural, and cosmetic properties, were also highly reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological study of fungi. We also investigated the effects of filtering low-quality data, library insert size, and k-mer size on assembly performance; for assembly we recommend mildly filtered read data (for which the N50 was not greatly degraded), a library insert size of ∼2.0 kb, and a k-mer size of 33.

  1. Considerations for clinical read alignment and mutational profiling using next-generation sequencing

    Directory of Open Access Journals (Sweden)

    Gavin R Oliver

    2012-07-01

    Full Text Available Next-generation sequencing technologies are increasingly being applied in clinical settings; however, the data are characterized by a range of platform-specific artifacts, making downstream analysis problematic and error-prone. One major application of NGS is in the profiling of clinically relevant mutations, whereby sequences are aligned to a reference genome and potential mutations assessed and scored. Accurate sequence alignment is pivotal in reliable assessment of potential mutations; however, selection of appropriate alignment tools is a non-trivial task complicated by the availability of multiple solutions, each with its own performance characteristics. Using BRCA1 as an example, we have simulated and mutated a test dataset based on Illumina sequencing technology. Our findings reveal key differences in the performances of a range of common commercial and open source tools and will be of importance to anyone using NGS to profile mutations in clinical or basic research.

  2. The Chloroplast Genome of Passiflora edulis (Passifloraceae) Assembled from Long Sequence Reads: Structural Organization and Phylogenomic Studies in Malpighiales

    Science.gov (United States)

    Cauz-Santos, Luiz A.; Munhoz, Carla F.; Rodde, Nathalie; Cauet, Stephane; Santos, Anselmo A.; Penha, Helen A.; Dornelas, Marcelo C.; Varani, Alessandro M.; Oliveira, Giancarlo C. X.; Bergès, Hélène; Vieira, Maria Lucia C.

    2017-01-01

    The family Passifloraceae consists of some 700 species classified in around 16 genera. Almost all its members belong to the genus Passiflora. In Brazil, the yellow passion fruit (Passiflora edulis) is of considerable economic importance, both for juice production and consumption as fresh fruit. The availability of chloroplast genomes (cp genomes) and their sequence comparisons has led to a better understanding of the evolutionary relationships within plant taxa. In this study, we obtained the complete nucleotide sequence of the P. edulis chloroplast genome, the first entirely sequenced in the Passifloraceae family. We determined its structure and organization, and also performed phylogenomic studies on the order Malpighiales and the Fabids clade. The P. edulis chloroplast genome is characterized by the presence of two copies of an inverted repeat sequence (IRA and IRB) of 26,154 bp, each separating a small single copy region of 13,378 bp and a large single copy (LSC) region of 85,720 bp. The annotation resulted in the identification of 105 unique genes, including 30 tRNAs, 4 rRNAs, and 71 protein coding genes. Also, 36 repetitive elements and 85 SSRs (microsatellites) were identified. The structure of the complete cp genome of P. edulis differs from that of other species because of rearrangement events detected by means of a comparison based on 22 members of the Malpighiales. The rearrangements were three inversions of 46,151, 3,765 and 1,631 bp, located in the LSC region. Phylogenomic analysis resulted in strongly supported trees, but this could also be a consequence of the limited taxonomic sampling used. Our results have provided a better understanding of the evolutionary relationships in the Malpighiales and the Fabids, confirming the potential of complete chloroplast genome sequences in inferring evolutionary relationships and the utility of long sequence reads for generating very accurate biological information. PMID:28344587

  3. Sequence-specific binding of a hormonally regulated mRNA binding protein to cytidine-rich sequences in the lutropin receptor open reading frame.

    Science.gov (United States)

    Kash, J C; Menon, K M

    1999-12-21

    In previous studies, a lutropin receptor mRNA binding protein implicated in the hormonal regulation of lutropin receptor mRNA stability was identified. This protein, termed LRBP-1, was shown by RNA gel electrophoretic mobility shift assay to specifically interact with lutropin receptor RNA sequences. The present studies have examined the specificity of lutropin receptor mRNA recognition by LRBP-1 and mapped the contact site by RNA footprinting and by site-directed mutagenesis. LRBP-1 was partially purified by cation-exchange chromatography, and the mRNA binding properties of the partially purified LRBP-1 were examined by RNA gel electrophoretic mobility shift assay and hydroxyl-radical RNA footprinting. These data showed that the LRBP-1 binding site is located between nucleotides 203 and 220 of the receptor open reading frame, and consists of the bipartite polypyrimidine sequence 5'-UCUC-X(7)-UCUCCCU-3'. Competition RNA gel electrophoretic mobility shift assays demonstrated that homoribopolymers of poly(rC) were effective RNA binding competitors, while poly(rA), poly(rG), and poly(rU) showed no effect. Mutagenesis of the cytidine residues contained within the LRBP-1 binding site demonstrated that all the cytidines in the bipartite sequence contribute to LRBP-1 binding specificity. Additionally, RNA gel electrophoretic mobility supershift analysis showed that LRBP-1 was not recognized by antibodies against two well-characterized poly(rC) RNA binding proteins, alphaCP-1 and alphaCP-2, implicated in the regulation of RNA stability of alpha-globin and tyrosine hydroxylase mRNAs. In summary, we show that partially purified LRBP-1 binds to a polypyrimidine sequence within nucleotides 203 and 220 of lutropin receptor mRNA with a high degree of specificity which is indicative of its role in posttranscriptional control of lutropin receptor expression.
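
    The bipartite motif 5'-UCUC-X(7)-UCUCCCU-3' reported above can be written as a regular expression. The Python sketch below assumes that X(7) stands for any seven nucleotides, which the abstract does not state explicitly; the function name is illustrative, and a sequence match says nothing about actual protein binding.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
# Assumption: X(7) is read as any seven nucleotides.
MOTIF = re.compile(r"(?=(UCUC[ACGU]{7}UCUCCCU))")


def find_motif(rna):
    """Return (1-based start, 1-based end, matched sequence) per occurrence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in MOTIF.finditer(rna)
    ]
```

    The motif is 4 + 7 + 7 = 18 nucleotides long, which agrees with the reported binding site spanning nucleotides 203 to 220 inclusive.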

  4. Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing

    Directory of Open Access Journals (Sweden)

    Chen Zuozhou

    2010-11-01

    Full Text Available BACKGROUND: Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing. RESULTS: Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms. CONCLUSIONS: We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.

  5. Enhanced diagnostic yield in Meckel-Gruber and Joubert syndrome through exome sequencing supplemented with split-read mapping.

    Science.gov (United States)

    Watson, Christopher M; Crinnion, Laura A; Berry, Ian R; Harrison, Sally M; Lascelles, Carolina; Antanaviciute, Agne; Charlton, Ruth S; Dobbie, Angus; Carr, Ian M; Bonthron, David T

    2016-01-04

    The widespread adoption of high-throughput sequencing technologies by genetic diagnostic laboratories has enabled significant expansion of their testing portfolios. Rare autosomal recessive conditions have been a particular focus of many new services. Here we report a cohort of 26 patients referred for genetic analysis of Joubert (JBTS) and Meckel-Gruber (MKS) syndromes, two clinically and genetically heterogeneous neurodevelopmental conditions that define a phenotypic spectrum, with MKS at the severe end. Exome sequencing was performed for all cases, using Agilent SureSelect v5 reagents and Illumina paired-end sequencing. For two cases medium-coverage (9×) whole genome sequencing was subsequently undertaken. Using a standard analysis pipeline for the detection of single nucleotide and small insertion or deletion variants, molecular diagnoses were confirmed in 12 cases (46%). Seeking to determine whether our cohort harboured pathogenic copy number variants (CNV) in JBTS- or MKS-associated genes, targeted comparative read-depth analysis was performed using FishingCNV. These analyses identified a putative intragenic AHI1 deletion that included three exons spanning at least 3.4 kb and an intergenic MPP4 to TMEM237 deletion that included exons spanning at least 21.5 kb. Whole genome sequencing enabled confirmation of the deletion-containing alleles and precise characterisation of the mutation breakpoints at nucleotide resolution. These data were validated following development of PCR-based assays that could be subsequently used for "cascade" screening and/or prenatal diagnosis. Our investigations expand the AHI1 and TMEM237 mutation spectrum and highlight the importance of performing CNV screening of disease-associated genes. We demonstrate a robust, increasingly cost-effective CNV detection workflow that is applicable to all MKS/JBTS referrals.
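
    The general idea behind comparative read-depth analysis can be sketched in a few lines of Python. This is a generic illustration, not FishingCNV's algorithm: the function name, the dictionary inputs keyed by exon, and the simple library-size normalisation are all assumptions.

```python
import math


def log2_ratios(sample_depth, control_depth):
    """Per-exon log2 ratio of library-size-normalised read depth.

    Both arguments are hypothetical mappings: exon name -> read count.
    Exons with no control reads are skipped; exons with no sample reads
    are reported as negative infinity.
    """
    s_total = sum(sample_depth.values())
    c_total = sum(control_depth.values())
    ratios = {}
    for exon, s in sample_depth.items():
        c = control_depth.get(exon, 0)
        if c == 0:
            continue  # ratio undefined without control coverage
        if s == 0:
            ratios[exon] = float("-inf")
            continue
        ratios[exon] = math.log2((s / s_total) / (c / c_total))
    return ratios
```

    In this simplified view, an exon with a heterozygous deletion would be expected to show a ratio near -1 and an unaffected exon a ratio near 0; real tools add statistical testing on top of such ratios.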

  6. Causality, criticality, and reading words: distinct sources of fractal scaling in behavioral sequences.

    Science.gov (United States)

    Moscoso del Prado Martín, Fermín

    2011-07-01

    The finding of fractal scaling (FS) in behavioral sequences has raised a debate on whether FS is a pervasive property of the cognitive system or is the result of specific processes. Inferences about the origins of properties in time sequences are causal. That is, as opposed to correlational inferences reflecting instantaneous symmetrical relations, causal inferences concern asymmetric relations lagged in time. Here, I integrate Granger-causality with inferences about FS. Four simulations illustrate that causal analyses can isolate distinct FS sources, whereas correlational techniques cannot. I then analyze three simultaneous sequences of responses from a database of word-naming trials. I find that two, or perhaps three, distinct sources account for the presence of FS in these sequences, but FS is not a general property of the system. This suggests that FS arises due to the properties of a limited number of identifiable psychological and/or neural processes. Finally, I reanalyze a previously published dataset of acoustic frequency spectra using the new tools. The causality/criticality combination introduced here offers a new important perspective in the study of cognition. Copyright © 2011 Cognitive Science Society, Inc.

  7. Using a priori knowledge to align sequencing reads to their exact genomic position

    NARCIS (Netherlands)

    Böttcher, René; Amberg, Ronny; Ruzius, F P; Guryev, V; Verhaegh, Wim F J; Beyerlein, Peter; van der Zaag, P J

    2012-01-01

    The use of a priori knowledge in the alignment of targeted sequencing data is investigated using computational experiments. Adapting a Needleman-Wunsch algorithm to incorporate the genomic position information from the targeted capture, we demonstrate that alignment can be done to just the target re

  8. Using a priori knowledge to align sequencing reads to their exact genomic position

    NARCIS (Netherlands)

    Böttcher, R.; Amberg, R.; Ruzius, F.P.; Guryev, V.; Verhaegh, W.F.J.; Beyerlein, P.; Van der Zaag, P.J.

    2011-01-01

    The use of a priori knowledge in aligning targeted sequencing data is investigated using computational experiments. With conventional aligners such as Bowtie, BWA or MAQ, alignment is performed against the whole genome. Using an alignment method in which the genomic position information from the

  9. PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets.

    Science.gov (United States)

    Hong, Changjin; Manimaran, Solaiappan; Johnson, William Evan

    2014-01-01

    Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/.

  10. Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Sczyrba, Alex; Pratap, Abhishek; Canon, Shane; Han, James; Copeland, Alex; Wang, Zhong; Brewer, Tony; Soper, David; D' Jamoos, Mike; Collins, Kirby; Vacek, George

    2011-03-22

    Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
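
    The remark that the four nucleotides fit in two bits each can be made concrete with a short Python sketch. The particular bit assignment (A=0, C=1, G=2, T=3) and the function names are assumptions chosen for illustration; the abstract does not describe the encoding used in the Convey hardware.

```python
# Assumed 2-bit code; any fixed one-to-one assignment would work.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 3])  # lowest two bits hold the last base
        value >>= 2
    return "".join(reversed(bases))
```

    With this packing a 32-base k-mer occupies a single 64-bit word, which is one reason k-mer-based graph constructors favour such compact encodings.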

  11. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Science.gov (United States)

    Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

    2010-04-08

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for

  12. Nucleotide sequences of three tRNA(Ser) from Drosophila melanogaster reading the six serine codons.

    Science.gov (United States)

    Cribbs, D L; Gillam, I C; Tener, G M

    1987-10-05

    The nucleotide sequences of three serine tRNAs from Drosophila melanogaster, together capable of decoding the six serine codons, were determined. tRNA(Ser)2b has the anticodon GCU, tRNA(Ser)4 has CGA and tRNA(Ser)7 has IGA. tRNA(Ser)2b differs from the last two by about 25%. However, tRNA(Ser)4 and tRNA(Ser)7 are 96% homologous, differing only at the first position of the anticodon and two other sites. This unusual sequence relationship suggests, together with similar pairs in the yeasts Schizosaccharomyces pombe and Saccharomyces cerevisiae, that eukaryotic tRNA(Ser)UCN may be undergoing concerted evolution.

  13. MS/MS-Assisted Design of Sequence-Controlled Synthetic Polymers for Improved Reading of Encoded Information

    Science.gov (United States)

    Charles, Laurence; Cavallo, Gianni; Monnier, Valérie; Oswald, Laurence; Szweda, Roza; Lutz, Jean-François

    2016-12-01

    In order to improve their MS/MS sequencing, structure of sequence-controlled synthetic polymers can be optimized based on considerations regarding their fragmentation behavior in collision-induced dissociation conditions, as demonstrated here for two digitally encoded polymer families. In poly(triazole amide)s, the main dissociation route proceeded via cleavage of the amide bond in each monomer, hence allowing the chains to be safely sequenced. However, a competitive cleavage of an ether bond in a tri(ethylene glycol) spacer placed between each coding moiety complicated MS/MS spectra while not bringing new structural information. Changing the tri(ethylene glycol) spacer to an alkyl group of the same size allowed this unwanted fragmentation pathway to be avoided, hence greatly simplifying the MS/MS reading step for such undecyl-based poly(triazole amide)s. In poly(alkoxyamine phosphodiester)s, a single dissociation pathway was achieved with repeating units containing an alkoxyamine linkage, which, by very low dissociation energy, made any other chemical bonds MS/MS-silent. Structure of these polymers was further tailored to enhance the stability of those precursor ions with a negatively charged phosphate group per monomer in order to improve their MS/MS readability. Increasing the size of both the alkyl coding moiety and the nitroxide spacer allowed sufficient distance between phosphate groups for all of them to be deprotonated simultaneously. Because the charge state of product ions increased with their polymerization degree, MS/MS spectra typically exhibited groups of fragments at one or the other side of the precursor ion depending on the original α or ω end-group they contain, allowing sequence reconstruction in a straightforward manner.

  14. MS/MS-Assisted Design of Sequence-Controlled Synthetic Polymers for Improved Reading of Encoded Information

    Science.gov (United States)

    Charles, Laurence; Cavallo, Gianni; Monnier, Valérie; Oswald, Laurence; Szweda, Roza; Lutz, Jean-François

    2017-06-01

    In order to improve their MS/MS sequencing, structure of sequence-controlled synthetic polymers can be optimized based on considerations regarding their fragmentation behavior in collision-induced dissociation conditions, as demonstrated here for two digitally encoded polymer families. In poly(triazole amide)s, the main dissociation route proceeded via cleavage of the amide bond in each monomer, hence allowing the chains to be safely sequenced. However, a competitive cleavage of an ether bond in a tri(ethylene glycol) spacer placed between each coding moiety complicated MS/MS spectra while not bringing new structural information. Changing the tri(ethylene glycol) spacer to an alkyl group of the same size allowed this unwanted fragmentation pathway to be avoided, hence greatly simplifying the MS/MS reading step for such undecyl-based poly(triazole amide)s. In poly(alkoxyamine phosphodiester)s, a single dissociation pathway was achieved with repeating units containing an alkoxyamine linkage, which, by very low dissociation energy, made any other chemical bonds MS/MS-silent. Structure of these polymers was further tailored to enhance the stability of those precursor ions with a negatively charged phosphate group per monomer in order to improve their MS/MS readability. Increasing the size of both the alkyl coding moiety and the nitroxide spacer allowed sufficient distance between phosphate groups for all of them to be deprotonated simultaneously. Because the charge state of product ions increased with their polymerization degree, MS/MS spectra typically exhibited groups of fragments at one or the other side of the precursor ion depending on the original α or ω end-group they contain, allowing sequence reconstruction in a straightforward manner.

  15. Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing.

    Science.gov (United States)

    Treutlein, Barbara; Gokce, Ozgun; Quake, Stephen R; Südhof, Thomas C

    2014-04-01

    Neurexins are evolutionarily conserved presynaptic cell-adhesion molecules that are essential for normal synapse formation and synaptic transmission. Indirect evidence has indicated that extensive alternative splicing of neurexin mRNAs may produce hundreds if not thousands of neurexin isoforms, but no direct evidence for such diversity has been available. Here we use unbiased long-read sequencing of full-length neurexin (Nrxn)1α, Nrxn1β, Nrxn2β, Nrxn3α, and Nrxn3β mRNAs to systematically assess how many sites of alternative splicing are used in neurexins with a significant frequency, and whether alternative splicing events at these sites are independent of each other. In sequencing more than 25,000 full-length mRNAs, we identified a novel, abundantly used alternatively spliced exon of Nrxn1α and Nrxn3α (referred to as alternatively spliced sequence 6) that encodes a 9-residue insertion in the flexible hinge region between the fifth LNS (laminin-α, neurexin, sex hormone-binding globulin) domain and the third EGF-like sequence. In addition, we observed several larger-scale events of alternative splicing that deleted multiple domains and were much less frequent than the canonical six sites of alternative splicing in neurexins. All of the six canonical events of alternative splicing appear to be independent of each other, suggesting that neurexins may exhibit an even larger isoform diversity than previously envisioned and comprise thousands of variants. Our data are consistent with the notion that α-neurexins represent extracellular protein-interaction scaffolds in which different LNS and EGF domains mediate distinct interactions that affect diverse functions and are independently regulated by independent events of alternative splicing.

  16. Cytoplasmic male sterility-associated chimeric open reading frames identified by mitochondrial genome sequencing of four Cajanus genotypes.

    Science.gov (United States)

    Tuteja, Reetu; Saxena, Rachit K; Davila, Jaime; Shah, Trushar; Chen, Wenbin; Xiao, Yong-Li; Fan, Guangyi; Saxena, K B; Alverson, Andrew J; Spillane, Charles; Town, Christopher; Varshney, Rajeev K

    2013-10-01

    The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a result of mitochondrial genome rearrangements are considered to be the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative (Cajanus cajanifolius ICPW 29). A single, circular-mapping molecule of length 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs that differ between lines in which CMS is present or absent. Among these chimeric ORFs, 13 were identified by comparison of the related male-sterile and maintainer lines. These ORFs display features that are known to trigger CMS in other plant species and represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea.

  17. Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model

    Directory of Open Access Journals (Sweden)

    Gerstein Mark B

    2010-10-01

    Full Text Available Abstract Background Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and the newly developed read-depth approach through ultrahigh throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale. Results We developed a Bayesian statistical analysis algorithm for the detection of CNVs from both types of genomic data. The algorithm can analyze such data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating parameters--e.g., the number of CNVs, the position of each CNV, and the data noise level--that define the underlying data generating process as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from the posterior distribution using a Markov chain Monte Carlo method, we get not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison to other segmentation algorithms. Conclusions In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods and also as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.
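    The idea of sampling a posterior by Markov chain Monte Carlo and reading off a credible interval can be illustrated with a deliberately small sketch. It is not the authors' stepwise algorithm: it assumes a single segment, Gaussian noise with a known standard deviation, and a flat prior on the segment mean, and every name and number in it (metropolis_segment_mean, sigma, the synthetic log2 ratios) is hypothetical.

```python
import math
import random

def log_likelihood(mu, data, sigma):
    """Gaussian log-likelihood (up to a constant) of a segment mean mu."""
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)

def metropolis_segment_mean(data, sigma=0.2, n_iter=20000, burn_in=2000,
                            step=0.05, seed=0):
    """Random-walk Metropolis sampler for one segment mean (flat prior).

    Returns (posterior_mean, (lower, upper)), where the interval is the
    central 95% credible interval of the retained samples.
    """
    rng = random.Random(seed)
    mu = sum(data) / len(data)            # start the chain at the sample mean
    current = log_likelihood(mu, data, sigma)
    samples = []
    for i in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, data, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= burn_in:
            samples.append(mu)
    samples.sort()
    lower = samples[int(0.025 * len(samples))]
    upper = samples[int(0.975 * len(samples))]
    return sum(samples) / len(samples), (lower, upper)

# Synthetic log2 ratios for a hypothetical single-copy gain (log2(3/2) ~ 0.58).
data_rng = random.Random(1)
log_ratios = [data_rng.gauss(0.58, 0.2) for _ in range(50)]
estimate, interval = metropolis_segment_mean(log_ratios)
print(round(estimate, 2), tuple(round(v, 2) for v in interval))
```

    A full CNV model would also sample the number and positions of breakpoints and the noise level; the sketch fixes all of these to keep the sampler readable.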

  18. Unmapped reads from cattle RNAseq data: A source for missing and misassembled sequences in the reference assemblies and for detection of pathogens in the host.

    Science.gov (United States)

    Usman, Tahir; Hadlich, Frieder; Demasius, Wiebke; Weikard, Rosemarie; Kühn, Christa

    2017-01-01

    Usually, reads from transcriptome sequencing data unmapped to the target species' reference genome are disregarded. A recent RNAseq project on the new fatal disease Bovine Neonatal Pancytopenia had indicated an unexplained immune response signature to a double-stranded RNA virus. To unravel its background, contigs were de novo assembled from unmapped RNAseq reads and aligned against the bovine genome assemblies and multispecies NCBI databases. Lack of genuine virus sequence contigs rejected the hypothesis of a live virus being causal for the unexplained immune response. Alignment data also demonstrated incomplete bovine reference genome assemblies. In addition, we found that several parasite and virus genome reference assemblies in NCBI were contaminated with bovine DNA and confirmed recombination of bovine DNA into BVD virus strains. Exploring unmapped reads can extract useful biological information regarding the presence of microorganisms and can highlight issues with reference genome assemblies of host and pathogen species. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Complete genome sequence of a copper-resistant bacterium from the citrus phyllosphere, Stenotrophomonas sp. strain LM091, obtained using long-read technology

    OpenAIRE

    Richard, Damien; Boyer, Claudine; Lefeuvre, Pierre; Pruvost, Olivier

    2016-01-01

    The Stenotrophomonas genus shows great adaptive potential including resistance to multiple antimicrobials, opportunistic pathogenicity, and production of numerous secondary metabolites. Using long-read technology, we report the sequence of a plant-associated Stenotrophomonas strain originating from the citrus phyllosphere that displays a copper resistance phenotype. (Author's abstract)

  20. Complete Genome Sequence of a Copper-Resistant Bacterium from the Citrus Phyllosphere, Stenotrophomonas sp. Strain LM091, Obtained Using Long-Read Technology

    Science.gov (United States)

    Richard, Damien; Boyer, Claudine; Lefeuvre, Pierre

    2016-01-01

    The Stenotrophomonas genus shows great adaptive potential including resistance to multiple antimicrobials, opportunistic pathogenicity, and production of numerous secondary metabolites. Using long-read technology, we report the sequence of a plant-associated Stenotrophomonas strain originating from the citrus phyllosphere that displays a copper resistance phenotype. PMID:27979933

  1. Benefit-of-doubt (BOD) scoring: a sequencing-based method for SNP candidate assessment from high to medium read number data sets.

    Science.gov (United States)

    Sedlazeck, Fritz Joachim; Talloji, Prabhavathi; von Haeseler, Arndt; Bachmair, Andreas

    2013-03-01

    Identification of single nucleotide polymorphisms (SNPs) is a key element in sequence-based genetic analysis. Next generation sequencing offers a cost-effective basis to generate the necessary, large sequence data sets, and bioinformatic methods are being developed to process sequencing machine readouts. We were interested in detection of SNPs in a 350 kb region of an EMS-mutagenized Arabidopsis chromosome 3. The region was selectively analyzed using PCR-generated, overlapping fragments for Solexa sequencing. The ensuing reads provided a high coverage and were processed bioinformatically. In order to assess the SNP candidates obtained with a frequently used alignment program and SNP caller, we developed an additional method that allows the identification of high confidence SNP loci. The method can easily be applied to complete genome sequence data of sufficient coverage.

  2. Lncident: A Tool for Rapid Identification of Long Noncoding RNAs Utilizing Sequence Intrinsic Composition and Open Reading Frame Information

    Directory of Open Access Journals (Sweden)

    Siyu Han

    2016-01-01

    Full Text Available More and more studies have demonstrated that long noncoding RNAs (lncRNAs) play critical roles in a diversity of biological processes and are also associated with various types of disease. Rapidly distinguishing lncRNAs from messenger RNAs is a fundamental step toward uncovering lncRNA function. Here, we present a novel method for rapid identification of lncRNAs utilizing sequence intrinsic composition features and open reading frame information based on a support vector machine model, named Lncident (LncRNAs identification). The 10-fold cross-validation and ROC curve are used to evaluate the performance of Lncident. The main advantage of Lncident is high speed without the loss of accuracy. Compared with the existing popular tools, Lncident outperforms Coding-Potential Calculator, Coding-Potential Assessment Tool, Coding-Noncoding Index, and PLEK. Lncident is also much faster than Coding-Potential Calculator and Coding-Noncoding Index. Lncident presents an outstanding performance on microorganisms, which offers a great application prospect for the analysis of microorganisms. In addition, Lncident can be trained on users’ own collected data. Furthermore, an R package and a web server were developed simultaneously in order to maximize convenience for users. The R package “Lncident” can be easily installed on multiple operating system platforms, as long as R is supported.
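    The two kinds of input the abstract mentions, sequence composition and open reading frame information, can be sketched as a feature vector. This is an illustrative guess at what such features might look like, not Lncident's actual feature set: the choice of 3-mers, the forward-strand-only ORF scan, and all function names are assumptions, and the support vector machine training step is omitted.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

def kmer_frequencies(seq, k=3):
    """Relative frequency of every DNA k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:                # skip words containing N, etc.
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]

def longest_orf_length(seq):
    """Length in nt of the longest ATG...stop ORF on the forward strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOP_CODONS and start is not None:
                best = max(best, i + 3 - start)
                start = None
    return best

def feature_vector(seq):
    """3-mer frequencies followed by ORF length and ORF coverage."""
    orf = longest_orf_length(seq)
    coverage = orf / len(seq) if seq else 0.0
    return kmer_frequencies(seq) + [orf, coverage]

print(len(feature_vector("CCATGAAATTTTAGCC")))
```

    Vectors like this could be passed to any standard classifier; a short or absent longest ORF relative to transcript length is the usual intuition behind ORF-based coding-potential features.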

  3. DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

    Science.gov (United States)

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-08-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.

  4. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing.

    Science.gov (United States)

    Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird

    2016-08-01

    The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (Ex: var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90-99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission.
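    The (A + T) percentages quoted above are simple base-composition statistics. A minimal sketch of how they might be computed, genome-wide or in sliding windows, follows; the window and step sizes are arbitrary assumptions and the function names are hypothetical.

```python
def at_fraction(seq):
    """Fraction of A and T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return at / (at + gc) if (at + gc) else 0.0

def windowed_at(seq, window=1000, step=500):
    """Yield (start, AT fraction) for sliding windows along a sequence."""
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        yield start, at_fraction(seq[start:start + window])

# Toy sequence: an AT-only stretch followed by a GC-only stretch.
for start, frac in windowed_at("AAAATTTTGGGGCCCC", window=8, step=8):
    print(start, frac)
```

    Windows whose AT fraction rises well above the genome-wide value would be candidates for the extremely AT-rich centromeric regions the abstract describes.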

  5. The complete sequence of a 9000 bp fragment of the right arm of Saccharomyces cerevisiae chromosome VII contains four previously unknown open reading frames.

    Science.gov (United States)

    Guerreiro, P; Silva, A M; Barreiros, T; Arroyo, J; García-Gonzalez, M; García-Saez, M I; Rodrigues-Pousada, C; Nombela, C

    1995-09-15

    We report the sequence of a 9000 bp fragment from the right arm of Saccharomyces cerevisiae chromosome VII. Analysis of the sequence revealed four complete previously unknown open reading frames, which were named G7587, G7589, G7591 and G7594 following standard rules for provisional nomenclature. Outstanding features of some of these proteins were the homology of the putative protein coded by G7589 with proteins involved in transcription regulation and the transmembrane domains predicted in the putative protein coded by G7591.

  6. Diagnostic accuracy of unenhanced, contrast-enhanced perfusion and angiographic MRI sequences for pulmonary embolism diagnosis: results of independent sequence readings

    Energy Technology Data Exchange (ETDEWEB)

    Revel, Marie Pierre [Hopital Europeen Georges Pompidou, APHP, Departments of Radiology, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); Hotel-Dieu, Service de Radiologie, Paris (France); Sanchez, Olivier; Meyer, Guy [Hopital Europeen Georges Pompidou, APHP, Respiratory and intensive care and, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); INSERM Unite 765, Paris (France); Lefort, Catherine; Couchon, Sophie; Hernigou, Anne; Frija, Guy [Hopital Europeen Georges Pompidou, APHP, Departments of Radiology, Paris (France); Niarra, Ralph [Hopital Europeen Georges Pompidou, APHP, Clinical Epidemiology, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); Chatellier, Gilles [Hopital Europeen Georges Pompidou, APHP, Clinical Epidemiology, Paris (France); Universite Paris Descartes Sorbonne Paris Cite, Paris (France); INSERM CIC-EC E4, Paris (France)

    2013-09-15

    To independently evaluate unenhanced, contrast-enhanced perfusion and angiographic MR sequences for pulmonary embolism (PE) diagnosis. Prospective investigation, including 274 patients who underwent perfusion, unenhanced 2D steady-state-free-precession (SSFP) and contrast-enhanced 3D angiographic MR sequences on a 1.5-T unit, in addition to CTA (CT angiography). Two independent readers evaluated each sequence independently in random order. Sensitivity, specificity, predictive values and inter-reader agreement were calculated for each sequence, excluding sequences judged inconclusive. Sensitivity was also calculated according to PE location. Contrast-enhanced angiographic sequences showed the highest sensitivity (82.9 and 89.7 %, reader 1 and reader 2, respectively), specificity (98.5 and 100 %) and agreement (kappa value 0.77). Unenhanced angiographic sequences, although less sensitive overall (68.7 and 76.4 %), were sensitive for the detection of proximal PE (92.7 and 100 %) and showed high specificity (96.1 and 99.1 %) and good agreement (kappa value 0.62). Perfusion sequences showed lower sensitivity (75.0 and 79.3 %), specificity (84.8 and 89.7 %) and agreement (kappa value 0.51), and a negative predictive value of 84.8 % at best. Compared with contrast-enhanced angiographic sequences, unenhanced sequences demonstrate lower sensitivity, except for proximal PE, but high specificity and agreement. The negative predictive value of perfusion sequences was insufficient to safely rule out PE. (orig.)
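    The accuracy and agreement measures reported above follow from 2x2 tables. The sketch below shows the standard formulas for sensitivity, specificity, predictive values, and Cohen's kappa; the counts used in the example are invented for illustration and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table against the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased patients correctly detected
        "specificity": tn / (tn + fp),   # healthy patients correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

def cohen_kappa(both_pos, r1_only, r2_only, both_neg):
    """Cohen's kappa for two readers making positive/negative calls."""
    n = both_pos + r1_only + r2_only + both_neg
    observed = (both_pos + both_neg) / n
    r1_pos = (both_pos + r1_only) / n
    r2_pos = (both_pos + r2_only) / n
    # Agreement expected by chance if the readers were independent.
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    return (observed - expected) / (1 - expected)

# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=80, fp=5, fn=20, tn=95))
print(cohen_kappa(both_pos=40, r1_only=10, r2_only=10, both_neg=40))
```

    Note that predictive values depend on disease prevalence in the sample, which is why a negative predictive value cannot be transferred directly to a population with a different pre-test probability.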

  7. Predicting mutually exclusive spliced exons based on exon length, splice site and reading frame conservation, and exon sequence homology

    Directory of Open Access Journals (Sweden)

    Hammesfahr Björn

    2011-06-01

    Full Text Available Abstract Background Alternative splicing of pre-mature RNA is an important process eukaryotes utilize to increase their repertoire of different protein products. Several types of different alternative splice forms exist including exon skipping, differential splicing of exons at their 3'- or 5'-end, intron retention, and mutually exclusive splicing. The latter term is used for clusters of internal exons that are spliced in a mutually exclusive manner. Results We have implemented an extension to the WebScipio software to search for mutually exclusive exons. Here, the search is based on the precondition that mutually exclusive exons encode regions of the same structural part of the protein product. This precondition provides restrictions to the search for candidate exons concerning their length, splice site conservation and reading frame preservation, and overall homology. Mutually exclusive exons that are not homologous and not of about the same length will not be found. Using the new algorithm, mutually exclusive exons in several example genes, a dynein heavy chain, a muscle myosin heavy chain, and Dscam were correctly identified. In addition, the algorithm was applied to the whole Drosophila melanogaster X chromosome and the results were compared to the Flybase annotation and an ab initio prediction. Clusters of mutually exclusive exons might be subsequent to each other and might encode dozens of exons. Conclusions This is the first implementation of an automatic search for mutually exclusive exons in eukaryotes. Exons are predicted and reconstructed in the same run providing the complete gene structure for the protein query of interest. WebScipio offers high quality gene structure figures with the clusters of mutually exclusive exons colour-coded, and several analysis tools for further manual inspection. The genome scale analysis of all genes of the Drosophila melanogaster X chromosome showed that WebScipio is able to find all but two of the 28

  8. The mitochondrial genome of a Texas outbreak strain of the cattle tick, Rhipicephalus (Boophilus) microplus, derived from whole genome sequencing Pacific Biosciences and Illumina reads.

    Science.gov (United States)

    McCooke, John K; Guerrero, Felix D; Barrero, Roberto A; Black, Michael; Hunter, Adam; Bell, Callum; Schilkey, Faye; Miller, Robert J; Bellgard, Matthew I

    2015-10-15

    The cattle fever tick, Rhipicephalus (Boophilus) microplus is one of the most significant medical veterinary pests in the world, vectoring several serious livestock diseases negatively impacting agricultural economies of tropical and subtropical countries around the world. In our study, we assembled the complete R. microplus mitochondrial genome from Illumina and Pac Bio sequencing reads obtained from the ongoing R. microplus (Deutsch strain from Texas, USA) genome sequencing project. We compared the Deutsch strain mitogenome to the mitogenome from a Brazilian R. microplus and from an Australian cattle tick that has recently been taxonomically designated as Rhipicephalus australis after previously being considered R. microplus. The sequence divergence of the Texas and Australia ticks is much higher than the divergence between the Texas and Brazil ticks. This is consistent with the idea that the Australian ticks are distinct from the R. microplus of the Americas.

  9. GenHtr: a tool for comparative assessment of genetic heterogeneity in microbial genomes generated by massive short-read sequencing

    Directory of Open Access Journals (Sweden)

    Yu GongXin

    2010-10-01

    Full Text Available Abstract Background Microevolution is the study of short-term changes of alleles within a population and their effects on the phenotype of organisms. The result of the below-species-level evolution is heterogeneity, where populations consist of subpopulations with a large number of structural variations. Heterogeneity analysis is thus essential to our understanding of how selective and neutral forces shape bacterial populations over a short period of time. The Solexa Genome Analyzer, a next-generation sequencing platform, allows millions of short sequencing reads to be obtained with great accuracy, making it possible to study the dynamics of the bacterial population at the whole genome level. The tool referred to as GenHtr was developed for genome-wide heterogeneity analysis. Results For particular bacterial strains, GenHtr relies on a set of Solexa short reads from given bacterial pathogens and their isogenic reference genome to identify heterogeneity sites, the chromosomal positions with multiple variants of genes in the bacterial population, and variations that occur in large gene families. GenHtr accomplishes this by building and comparatively analyzing genome-wide heterogeneity genotypes for both the newly sequenced genomes (using massive short-read sequencing) and their isogenic reference (using simulated data). As proof of concept, this approach was applied to SRX007711, the Solexa sequencing data for a newly sequenced Staphylococcus aureus subsp. USA300 cell line, and demonstrated that it could predict such multiple variants. They include multiple variants of genes critical in pathogenesis, e.g. genes encoding a LysR family transcriptional regulator, 23S ribosomal RNA, and DNA mismatch repair protein MutS. The heterogeneity results in non-synonymous and nonsense mutations, leading to truncated proteins for both LysR and MutS. Conclusion GenHtr was developed for genome-wide heterogeneity analysis. Although it is much more time

  10. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate

    NARCIS (Netherlands)

    Buschmann, Tilo; Zhang, Rong; Brash, Douglas E.; Bystrykh, Leonid V.

    2014-01-01

    Background: DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e. g., with PacBio SMRT), the position of the barcode and D

  11. Expression of four genes of bacteriophage MB78 from contiguous open reading frames: the genomic organization as deduced by sequence analysis.

    Science.gov (United States)

    Sharma, R; Datta, P; Chakravorty, M

    2000-01-01

    Four proteins of bacteriophage MB78 with apparent molecular weights of 35, 14, 21 and 16 kDa are expressed from a 3.9 kb SalI-HindIII fragment located almost in the middle of the phage genome. Analysis of the sequence, supported by some experimental evidence, suggests that these four proteins are expressed from a polycistronic message without any intercistronic gap. Stop and start codons of consecutive ORFs overlap and rare initiation codons are used. Computer analysis of the sequence suggests the presence of two more open reading frames within the ORFs of the 35 and 16 kDa proteins but in the opposite orientation, i.e. in the complementary strand.
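    Overlapping stop and start codons, rare initiation codons, and ORFs on the complementary strand can all be found with a six-frame scan. The sketch below is a generic illustration rather than the analysis used in the study: the set of alternative start codons (GTG, TTG), the minimum length, and the toy sequence are assumptions.

```python
STARTS = {"ATG", "GTG", "TTG"}   # ATG plus rarer bacterial initiation codons
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=2):
    """Return (start, end, strand) for ORFs in all six reading frames.

    Coordinates are 0-based and end-exclusive on the strand being scanned.
    Each ORF runs from the first start codon after the previous stop in
    that frame through its stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, strand))
                    start = None
    return orfs

# Toy sequence in which the stop of the first ORF (TGA) overlaps the
# start of the second (ATG) within the motif ATGA.
print(find_orfs("ATGAAATGACCTAA"))
```

    In the toy sequence the two forward-strand ORFs sit in different frames and share the bases TG, which is the kind of arrangement that leaves no intercistronic gap.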

  12. Open reading frame sequencing and structure-based alignment of polypeptides encoded by RT1-Bb, RT1-Ba, RT1-Db, and RT1-Da alleles.

    Science.gov (United States)

    Ettinger, Ruth A; Moustakas, Antonis K; Lobaton, Suzanne D

    2004-11-01

    MHC class II genes are major genetic components in rats developing autoimmunity. The majority of rat MHC class II sequencing has focused on exon 2, which forms the first external domain. Sequence of the complete open reading frame for rat MHC class II haplotypes and structure-based alignment is lacking. Herein, the complete open reading frame for RT1-Bbeta, RT1-Balpha, RT1-Dbeta, and RT1-Dalpha was sequenced from ten different rat strains, covering eight serological haplotypes, namely a, b, c, d, k, l, n, and u. Each serological haplotype was unique at the nucleotide level of the sequenced RT1-B/D region. Within individual genes, the number of alleles identified was seven, seven, six, and three and the degree of amino-acid polymorphism between allotypes for each gene was 22%, 16%, 19%, and 0.4% for RT1-Bbeta, RT1-Balpha, RT1-Dbeta, and RT1-Dalpha, respectively. The extent and distribution of amino-acid polymorphism was comparable with mouse and human MHC class II. Structure-based alignment identified the beta65-66 deletion, the beta84a insertion, the alpha9a insertion, and the alpha1a-1c insertion in RT1-B previously described for H2-A. Rat allele-specific deletions were found at RT1-Balpha76 and RT1-Dbeta90-92. The mature RT1-Dbeta polypeptide was one amino acid longer than HLA-DRB1 due to the position of the predicted signal peptide cleavage site. These data are important to a comprehensive understanding of MHC class II structure-function and for mechanistic studies of rat models of autoimmunity.

  13. Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

    Directory of Open Access Journals (Sweden)

    Shultz Jeffry

    2008-07-01

    Full Text Available Abstract Background Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max) genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions, the genome has been sequenced by whole genome shotgun sequence (WGS). Here the aim was to use BAC end sequences (BES) derived from three minimum tile paths (MTP) to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results Results show that when sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs) were single nucleotide polymorphisms (SNPs; 89%) and single nucleotide indels (SNIs; 10%). Larger indels were rare but present (1%). Simulations that had predicted fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. BES compared to WGS traces showed polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion The use of HSVs like SNPs and SNIs to characterize BACs will improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de

  14. Human herpesvirus 8 open reading frame 26 and open reading frame 65 sequences from multiple myeloma patients: a shared pattern not found in Kaposi's sarcoma or primary effusion lymphoma.

    Science.gov (United States)

    Ma, H J; Sjak-Shie, N N; Vescio, R A; Kaminsky, M; Mikail, A; Pold, M; Parker, K; Beksac, M; Belson, D; Moss, T J; Wu, C H; Zhou, J; Zhang, L; Chen, G; Said, J W; Berenson, J R

    2000-11-01

    Human herpesvirus 8 (HHV-8), also known as Kaposi's sarcoma-associated herpesvirus, has been implicated in the pathogenesis of Kaposi's sarcoma (KS), primary effusion lymphoma (PEL), multicentric Castleman's disease, and recently multiple myeloma (MM). DNA sequence analyses of HHV-8 suggest that multiple HHV-8 strains exist. We extracted DNA from 24 patients with MM and 3 patients with monoclonal gammopathy of undetermined significance and compared HHV-8 open reading frames (ORFs) 26 and 65 sequences with those derived from patients with KS, PEL, and two HHV-8-positive PEL cell lines KS-1 and BC-1. ORF26 sequence data suggest that MM patients are consistently carriers of HHV-8 strain subtype C3. All MM patients also consistently revealed either a single bp deletion or substitution at position 112197 in ORF65. This unique alteration is not present in patients with KS or PEL or in PEL cell lines. It occurs in the portion of ORF65 that is known to be responsible for a serological response to HHV-8.

  15. Fast and sensitive alignment of microbial whole genome sequencing reads to large sequence datasets on a desktop PC: application to metagenomic datasets and pathogen identification.

    Directory of Open Access Journals (Sweden)

    Lőrinc S Pongor

    Full Text Available Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and as a result, species or strain level identification is often inaccurate and low abundance pathogens can sometimes be missed. We have developed Taxoner, an open source, taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.

  16. Qualitative de novo analysis of full length cDNA and quantitative analysis of gene expression for common marmoset (Callithrix jacchus) transcriptomes using parallel long-read technology and short-read sequencing.

    Science.gov (United States)

    Shimizu, Makiko; Iwano, Shunsuke; Uno, Yasuhiro; Uehara, Shotaro; Inoue, Takashi; Murayama, Norie; Onodera, Jun; Sasaki, Erika; Yamazaki, Hiroshi

    2014-01-01

    The common marmoset (Callithrix jacchus) is a non-human primate that could prove useful as human pharmacokinetic and biomedical research models. The cytochromes P450 (P450s) are a superfamily of enzymes that have critical roles in drug metabolism and disposition via monooxygenation of a broad range of xenobiotics; however, information on some marmoset P450s is currently limited. Therefore, identification and quantitative analysis of tissue-specific mRNA transcripts, including those of P450s and flavin-containing monooxygenases (FMO, another monooxygenase family), need to be carried out in detail before the marmoset can be used as an animal model in drug development. De novo assembly and expression analysis of marmoset transcripts were conducted with pooled liver, intestine, kidney, and brain samples from three male and three female marmosets. After unique sequences were automatically aligned by assembling software, the mean contig length was 718 bp (with a standard deviation of 457 bp) among a total of 47,883 transcripts. Approximately 30% of the total transcripts were matched to known marmoset sequences. Gene expression in 18 marmoset P450- and 4 FMO-like genes displayed some tissue-specific patterns. Of these, the three most highly expressed in marmoset liver were P450 2D-, 2E-, and 3A-like genes. In extrahepatic tissues, including brain, gene expressions of these monooxygenases were lower than those in liver, although P450 3A4 (previously P450 3A21) in intestine and P450 4A11- and FMO1-like genes in kidney were relatively highly expressed. By means of massive parallel long-read sequencing and short-read technology applied to marmoset liver, intestine, kidney, and brain, the combined next-generation sequencing analyses reported here were able to identify novel marmoset drug-metabolizing P450 transcripts that have until now been little reported. These results provide a foundation for mechanistic studies and pave the way for the use of marmosets as model animals

  17. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing

    DEFF Research Database (Denmark)

    Skovgaard, Ole; Bak, Mads; Løbner-Olesen, Anders;

    2011-01-01

    a combination of WGS and genome copy number analysis, for the identification of mutations that suppress the growth deficiency imposed by excessive initiations from the Escherichia coli origin of replication, oriC. The E. coli chromosome, like the majority of bacterial chromosomes, is circular, and DNA...... replication is initiated by assembling two replication complexes at the origin, oriC. These complexes then replicate the chromosome bidirectionally toward the terminus, ter. In a population of growing cells, this results in a copy number gradient, so that origin-proximal sequences are more frequent than...... origin-distal sequences. Major rearrangements in the chromosome are, therefore, readily identified by changes in copy number, i.e., certain sequences become over- or under-represented. Of the eight mutations analyzed in detail here, six were found to affect a single gene only, one was a large chromosomal...

  18. Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods.

    Science.gov (United States)

    Mu, John C; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Wong, Wing H; Lam, Hugo Y K

    2015-09-28

    A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.

  19. The nucleotide sequence of a 39 kb segment of yeast chromosome IV: 12 new open reading frames, nine known genes and one gene for Gly-tRNA.

    Science.gov (United States)

    Bahr, A; Möller-Rieker, S; Hankeln, T; Kraemer, C; Protin, U; Schmidt, E R

    1997-02-01

    The complete nucleotide sequence of a 39,090 bp segment from the left arm of yeast chromosome IV was determined. Twenty-one open reading frames (ORFs) longer than 100 amino acids and a Gly-tRNA gene were discovered. Nine of the 21 ORFs (D0892, D1022, D1037, D1045, D1057, D1204, D1209, D1214, D1219) correspond to the previously sequenced Saccharomyces cerevisiae genes for the NAD-dependent glutamate dehydrogenase (GDH), the secretory component (SHR3), the GABA transport protein (UGA4), the high mobility group-like protein (NHP2), the hydroxymethylbilane synthase (HEM3), the methylated DNA protein-cysteine S-methyltransferase (MGT1), a putative sugar transport protein, the Shm1 protein (SHM1) and the anti-silencing protein (ASF2). The inferred amino acid sequences of 11 ORFs show significant similarity with known proteins from various organisms, whereas the remaining ORF does not share any similarity with known proteins.

  20. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies.

    Science.gov (United States)

    Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from assembly of accurate, full-length HHV-1 genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal.

  1. Characterization of genome-wide microsatellites of Saccharina japonica based on a preliminary assembly of Illumina sequencing reads

    Science.gov (United States)

    Zhang, Linan; Peng, Jie; Li, Xiaojie; Cui, Cuiju; Sun, Juan; Yang, Guanpin

    2016-06-01

    Microsatellites, or simple sequence repeats (SSRs), function widely and locate dependently in the genome. However, their characteristics are often ignored due to the lack of genomic sequences for most species. Kelp (Saccharina japonica), a brown macroalga, is extensively cultured in China. In this study, the genome of S. japonica was surveyed using an Illumina sequencing platform, and its microsatellites were characterized. The preliminarily assembled genome was 469.4 Mb in size, with a scaffold N50 of 20529 bp. Among the 128370 identified microsatellites, 90671, 25726 and 11973 were found in intergenic regions, introns and exons, averaging 339.3, 178.8 and 205.4 microsatellites per Mb, respectively. These microsatellites were distributed unevenly in the S. japonica genome. Mononucleotide motifs were the most abundant in the genome, while trinucleotide ones were the most prevalent in exons. Microsatellite abundance decreased significantly as the number of motif repeats increased, and microsatellites with a small number of repeats accounted for a higher proportion of those in exons than of those in intergenic regions and introns. C/G-rich motifs were more common in exons than in intergenic regions and introns. These characteristics of microsatellites in the S. japonica genome may be associated with their functions, and ultimately with their adaptation and evolution. Among the 120140 pairs of designed microsatellite primers, approximately 75% were predicted to be able to amplify S. japonica DNA. These microsatellite markers will be extremely useful for genetic breeding and population evolution studies of kelp.

  2. Sequence analysis of a 9873 bp fragment of the left arm of yeast chromosome XV that contains the ARG8 and CDC33 genes, a putative riboflavin synthase beta chain gene, and four new open reading frames.

    Science.gov (United States)

    Casas, C; Aldea, M; Casamayor, A; Lafuente, M J; Gamo, F J; Gancedo, C; Ariño, J; Herrero, E

    1995-09-15

    The DNA sequence of a 9873 bp fragment located near the left telomere of chromosome XV has been determined. Sequence analysis reveals seven open reading frames. One is the ARG8 gene coding for N-acetylornithine aminotransferase. Another corresponds to CDC33, which codes for the initiation factor 4E or cap binding protein. The open reading frame AOE169 can be considered as the putative gene for the Saccharomyces cerevisiae riboflavin synthase beta chain, since its translation product shows strong homology with four prokaryotic riboflavin synthase beta chains.

  3. Comprehensive definition of genome features in Spirodela polyrhiza by high-depth physical mapping and short-read DNA sequencing strategies.

    Science.gov (United States)

    Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric

    2017-02-01

    Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  4. Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes [v1; ref status: indexed, http://f1000r.es/4zj]

    Directory of Open Access Journals (Sweden)

    Ron Ammar

    2015-01-01

    Full Text Available Haplotypes are often critical for the interpretation of genetic laboratory observations into medically actionable findings. Current massively parallel DNA sequencing technologies produce short sequence reads that are often unable to resolve haplotype information. Phasing short read data typically requires supplemental statistical phasing based on known haplotype structure in the population or parental genotypic data. Here we demonstrate that the MinION nanopore sequencer is capable of producing very long reads to resolve both variants and haplotypes of HLA-A, HLA-B and CYP2D6 genes important in determining patient drug response in sample NA12878 of CEPH/UTAH pedigree 1463, without the need for statistical phasing. Long read data from a single 24-hour nanopore sequencing run was used to reconstruct haplotypes, which were confirmed by HapMap data and statistically phased Complete Genomics and Sequenom genotypes. Our results demonstrate that nanopore sequencing is an emerging standalone technology with potential utility in a clinical environment to aid in medical decision-making.

  5. Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes [v2; ref status: indexed, http://f1000r.es/5ev]

    Directory of Open Access Journals (Sweden)

    Ron Ammar

    2015-05-01

    Full Text Available Haplotypes are often critical for the interpretation of genetic laboratory observations into medically actionable findings. Current massively parallel DNA sequencing technologies produce short sequence reads that are often unable to resolve haplotype information. Phasing short read data typically requires supplemental statistical phasing based on known haplotype structure in the population or parental genotypic data. Here we demonstrate that the MinION nanopore sequencer is capable of producing very long reads to resolve both variants and haplotypes of HLA-A, HLA-B and CYP2D6 genes important in determining patient drug response in sample NA12878 of CEPH/UTAH pedigree 1463, without the need for statistical phasing. Long read data from a single 24-hour nanopore sequencing run was used to reconstruct haplotypes, which were confirmed by HapMap data and statistically phased Complete Genomics and Sequenom genotypes. Our results demonstrate that nanopore sequencing is an emerging standalone technology with potential utility in a clinical environment to aid in medical decision-making.

  6. The sequence of a 8 kb segment on the right arm of yeast chromosome VII identifies four new open reading frames and the genes for yTAFII145.

    Science.gov (United States)

    Ruzzi, M; Marconi, A; Saliola, M; Fabiani, L; Montebove, F; Frontali, L

    1997-03-30

    We report the sequence of a 8,061 bp fragment of Saccharomyces cerevisiae chromosome VII. Five open reading frames (ORFs) of at least 100 amino acids were identified. Three show similarities to the amino-acid sequence of known gene products. ORF G9374 corresponds to the gene coding for the yTAFII145 protein: a TBP-associated factor whose amino-acid sequence was previously reported (Reese et al., 1994). The remaining ORF does not display similarities to known sequences.

  7. De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read.

    Science.gov (United States)

    Austin, Christopher M; Tan, Mun Hua; Harrisson, Katherine A; Lee, Yin Peng; Croft, Laurence J; Sunnucks, Paul; Pavlova, Alexandra; Gan, Han Ming

    2017-08-01

    One of the most iconic Australian fish is the Murray cod, Maccullochella peelii (Mitchell 1838), a freshwater species that can grow to ∼1.8 metres in length and live to age ≥48 years. The Murray cod is of conservation concern as a result of strong population contractions, but it is also popular for recreational fishing and is of growing aquaculture interest. In this study, we report the whole genome sequence of the Murray cod to support ongoing population genetics, conservation, and management research, as well as to better understand the evolutionary ecology and history of the species. A draft Murray cod genome of 633 Mbp (N50 = 109 974 bp; BUSCO and CEGMA completeness of 94.2% and 91.9%, respectively) with an estimated 148 Mbp of putative repetitive sequences was assembled from the combined sequencing data of 2 fish individuals with an identical maternal lineage; 47.2 Gb of Illumina HiSeq data and 804 Mb of Nanopore data were generated from the first individual while 23.2 Gb of Illumina MiSeq data were generated from the second individual. The inclusion of Nanopore reads for scaffolding followed by subsequent gap-closing using Illumina data led to a 29% reduction in the number of scaffolds and a 55% and 54% increase in the scaffold and contig N50, respectively. We also report the first transcriptome of Murray cod that was subsequently used to annotate the Murray cod genome, leading to the identification of 26 539 protein-coding genes. We present the whole genome of the Murray cod and anticipate this will be a catalyst for a range of genetic, genomic, and phylogenetic studies of the Murray cod and more generally other fish species of the Percichthyidae family. © The Authors 2017. Published by Oxford University Press.
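
    The N50 statistic quoted in this record is the length of the sequence at which the running total, counted from the longest sequence downwards, first reaches half of the assembly size. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative assumptions and are not taken from the Murray cod assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Hypothetical toy assembly (lengths in bp), not real data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70
```

    Joining sequences into fewer, longer scaffolds raises this value, which is the kind of change the record reports after Nanopore reads were added.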

  8. A not-so-big crisis: re-reading Silurian conodont diversity in a sequence-stratigraphic framework

    Science.gov (United States)

    Jarochowska, Emilia; Munnecke, Axel

    2016-04-01

    Conodonts are extensively used in Ordovician through Triassic biostratigraphy and fossil-based geochemistry. However, their distribution in rock successions is commonly taken at face value, without taking into account their diverse and poorly understood ecology. Multielement taxonomy, ontogenetic and environmental variability, difficulties in extraction, and relative rarity all contribute to the general lack of quantitative studies on conodont stratigraphic distribution and temporal turnover. With respect to Silurian conodonts, the concept of recurrent conodont extinction events - the so called Ireviken, Mulde and Lau events - has become a standard in the stratigraphic literature. The concept has been proposed based on qualitative observations of local extirpations of open-marine pelagic or nekto-benthic taxa and temporary dominance of shallow-water species in the Silurian succession of the Swedish island of Gotland. These changes coincided with positive carbon isotope excursions, abrupt facies shifts, "blooms" of benthic fauna, and changes in reef communities, which have all been combined into a general view of Silurian bio-geochemical events. This view posits a deterministic, reproducible pattern in Silurian conodont diversity, attributed to recurrent ecological or geochemical conditions. The growing body of sequence-stratigraphic interpretations across these events in Gotland and other sections worldwide indicate that in all cases the Silurian "events" are associated with rapid global regressions. This suggests that faunal changes such as the dominance of shallow-water, low-diversity conodont fauna and the increase of benthic invertebrate diversity and abundance represent predictable consequences of the variation in the completeness of the rock record and preservation potential of different environments. 
    Our studies in Poland and Ukraine indicate that the magnitude of change in the taxonomic composition of conodont assemblages across the middle Silurian global

  9. About Reading

    Institute of Scientific and Technical Information of China (English)

    独行墨客

    2004-01-01

    For both reading and learning, reading rate (that is, words per minute, WPM) is important, especially for students who have to pass a reading test. How do you compute your reading rate? Use the following formula: Reading Rate (WPM) = Total number of words ÷ reading time (in minutes).
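
    Reading rate in words per minute is a simple quotient: the number of words read divided by the time taken in minutes. The short sketch below works through one case; the function name and the passage figures are hypothetical.

```python
def reading_rate_wpm(total_words, reading_time_minutes):
    """Reading rate in words per minute (WPM)."""
    if reading_time_minutes <= 0:
        raise ValueError("reading time must be positive")
    return total_words / reading_time_minutes


# Hypothetical example: a 600-word passage read in 2.5 minutes.
print(reading_rate_wpm(600, 2.5))  # 240.0
```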

  10. The sequence of an 11.1 kb fragment on the left arm of Saccharomyces cerevisiae chromosome VII reveals six open reading frames including NSP49, KEM1 and four putative new genes.

    Science.gov (United States)

    Bertani, I; Coglievina, M; Zaccaria, P; Klima, R; Bruschi, C V

    1995-09-30

    We report the sequence of an 11.1 kb fragment located on the left arm of chromosome VII of Saccharomyces cerevisiae. By sequence analysis we have detected six open reading frames (ORFs) longer than 300 bp, which cover 87% of the entire sequence. ORF G1645 is 100% identical to the KEM1 gene, also identified as DST2, XRN1, SEP1 and RAR5, while G1648 is 100% identical to the NSP49 or NUP49 gene. ORF G1642 shares some identity with a hypothetical protein of Caenorhabditis elegans, while the other four ORFs show no significant homology to known proteins.

  11. Cerebral Laterality and Reading.

    Science.gov (United States)

    Mackworth, Jane F.

    Recent research has confirmed that hemispheric patterns of dominance are related to reading skills. Reading is more complex than speech because it includes a visuo-spatial element. In the great majority of people, the left hemisphere deals with speech and sequencing skills. Visual matching of printed words requires the spatial skills of the right…

  12. Reading faster

    OpenAIRE

    Paul Nation

    2009-01-01

    This article describes the visual nature of the reading process as it relates to reading speed. It points out that there is a physical limit on normal reading speed and beyond this limit the reading process will be different from normal reading where almost every word is attended to. The article describes a range of activities for developing reading fluency, and suggests how the development of fluency can become part of a reading programme.

  13. Reading faster

    Directory of Open Access Journals (Sweden)

    Paul Nation

    2009-12-01

    Full Text Available This article describes the visual nature of the reading process as it relates to reading speed. It points out that there is a physical limit on normal reading speed and beyond this limit the reading process will be different from normal reading where almost every word is attended to. The article describes a range of activities for developing reading fluency, and suggests how the development of fluency can become part of a reading programme.

  14. extendFromReads

    Energy Technology Data Exchange (ETDEWEB)

    2013-10-03

    This package assists in genome assembly. extendFromReads takes as input a set of Illumina (e.g., MiSeq) DNA sequencing reads, a query seed sequence and a direction to extend the seed. The algorithm collects all seed-matching reads (flipping reverse-orientation hits), trims off the seed and additional sequence in the other direction, sorts the remaining sequences alphabetically, and prints them aligned without gaps from the point of seed trimming. This produces a visual display distinguishing the flanks of multi-copy seeds. A companion script hitMates.pl collects the mates of seed-hitting reads, whose alignment reveals longer extensions from the seed. The collect/trim/sort strategy was made iterative and scaled up in the script denovo.pl, for de novo contig assembly. An index is pre-built using indexReads.pl that, for each unique 21-mer found in all the reads, records its fate of extension (whether extendable, blocked by low coverage, or blocked by branching after a duplicated sequence) and other characteristics. Importantly, denovo.pl records all branchings that follow a branching contig endpoint, providing contig-extension information
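
    The collect/trim/sort step described in this record can be sketched in a few lines. The version below is an illustration written in Python and is not the package's own code, which the record describes as Perl scripts. It assumes exact seed matching and rightward extension only, and the function names and reads are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T and N."""
    return seq.translate(COMPLEMENT)[::-1]


def collect_flanks(reads, seed):
    """Collect, trim and sort the sequence to the right of a seed.

    Each read is tried in forward orientation first and then flipped.
    The seed and everything to its left are trimmed off, and the
    remaining right-hand flanks are returned in alphabetical order.
    """
    flanks = []
    for read in reads:
        for oriented in (read, reverse_complement(read)):
            pos = oriented.find(seed)
            if pos != -1:
                flank = oriented[pos + len(seed):]
                if flank:
                    flanks.append(flank)
                break  # use each read at most once
    return sorted(flanks)


# Hypothetical reads: two forward hits, one reverse-orientation hit, one miss.
reads = ["AAGATTCTGCA", "GATTCTGGT", "GGCAGAATCG", "CCCCCCCC"]
for flank in collect_flanks(reads, "GATTC"):
    print(flank)
# TGCA
# TGCC
# TGGT
```

    Because every flank starts at the trim point, printing the sorted list left-justified places reads that share an extension next to each other, so a column where the rows diverge marks a branch between copies of the seed.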

  15. Generation of iPS cells using defined factors linked via the self-cleaving 2A sequences in a single open reading frame

    Science.gov (United States)

    Shao, Lijian; Feng, Wei; Sun, Yan; Bai, Hao; Liu, Jun; Currie, Caroline; Kim, Jaejung; Gama, Rafael; Wang, Zack; Qian, Zhijian; Liaw, Lucy; Wu, Wen-Shu

    2010-01-01

    Generation of induced pluripotent stem (iPS) cells from somatic cells has been achieved successfully by simultaneous viral transduction of defined reprogramming transcription factors (TFs). However, the process requires multiple viral vectors for gene delivery. As a result, generated iPS cells harbor numerous viral integration sites in their genomes. This can increase the probability of gene mutagenesis and genomic instability, and present significant barriers to both research and clinical application studies of iPS cells. In this paper, we present a simple lentivirus reprogramming system in which defined factors are fused in-frame into a single open reading frame (ORF) via self-cleaving 2A sequences. A GFP marker is placed downstream of the transgene to enable tracking of transgene expression. We demonstrate that this polycistronic expression system efficiently generates iPS cells. The generated iPS cells have normal karyotypes and are similar to mouse embryonic stem cells in morphology and gene expression. Moreover, they can differentiate into cell types of the three embryonic germ layers in both in vitro and in vivo assays. Remarkably, most of these iPS cells only harbor a single copy of viral vector. This system provides a valuable tool for generation of iPS cells, and our data suggest that the balance of expression of transduced reprogramming TFs in each cell is essential for the reprogramming process. More importantly, when delivered by non-integrating gene-delivery systems, this re-engineered single ORF will facilitate efficient generation of human iPS cells free of genetic modifications. PMID:19238173

  16. Generation of iPS cells using defined factors linked via the self-cleaving 2A sequences in a single open reading frame

    Institute of Scientific and Technical Information of China (English)

    Lijian Shao; Wei Feng; Yan Sun; Hao Bai; Jun Liu; Caroline Currie; Jaejung Kim; Rafael Gama; Zack Wang; Zhijian Qian; Lucy Liaw; Wen-Shu Wu

    2009-01-01

    Generation of induced pluripotent stem (iPS) cells from somatic cells has been achieved successfully by simultaneous viral transduction of defined reprogramming transcription factors (TFs). However, the process requires multiple viral vectors for gene delivery. As a result, generated iPS cells harbor numerous viral integration sites in their genomes. This can increase the probability of gene mutagenesis and genomic instability, and present significant barriers to both research and clinical application studies of iPS cells. In this paper, we present a simple lentivirus reprogramming system in which defined factors are fused in-frame into a single open reading frame (ORF) via self-cleaving 2A sequences. A GFP marker is placed downstream of the transgene to enable tracking of transgene expression. We demonstrate that this polycistronic expression system efficiently generates iPS cells. The generated iPS cells have normal karyotypes and are similar to mouse embryonic stem cells in morphology and gene expression. Moreover, they can differentiate into cell types of the three embryonic germ layers in both in vitro and in vivo assays. Remarkably, most of these iPS cells only harbor a single copy of viral vector. This system provides a valuable tool for generation of iPS cells, and our data suggest that the balance of expression of transduced reprogramming TFs in each cell is essential for the reprogramming process. More importantly, when delivered by non-integrating gene-delivery systems, this re-engineered single ORF will facilitate efficient generation of human iPS cells free of genetic modifications.

  17. Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome.

    Science.gov (United States)

    Azam, Sarwar; Thakur, Vivek; Ruperao, Pradeep; Shah, Trushar; Balaji, Jayashree; Amindala, BhanuPrakash; Farmer, Andrew D; Studholme, David J; May, Gregory D; Edwards, David; Jones, Jonathan D G; Varshney, Rajeev K

    2012-02-01

    Next-generation sequencing (NGS) technologies are frequently used for resequencing and mining of single nucleotide polymorphisms (SNPs) by comparison to a reference genome. In crop species such as chickpea (Cicer arietinum) that lack a reference genome sequence, NGS-based SNP discovery is a challenge. Therefore, unlike probability-based statistical approaches for consensus calling and by comparison with a reference sequence, a coverage-based consensus calling (CbCC) approach was applied and two genotypes were compared for SNP identification. A CbCC approach is used in this study with four commonly used short read alignment tools (Maq, Bowtie, Novoalign, and SOAP2) and 15.7 and 22.1 million Illumina reads for chickpea genotypes ICC4958 and ICC1882, together with the chickpea transcriptome assembly (CaTA). A nonredundant set of 4543 SNPs was identified between two chickpea genotypes. Experimental validation of 224 randomly selected SNPs showed superiority of Maq among individual tools, as 50.0% of SNPs predicted by Maq were true SNPs. For combinations of two tools, greatest accuracy (55.7%) was reported for Maq and Bowtie, with a combination of Bowtie, Maq, and Novoalign identifying 61.5% true SNPs. SNP prediction accuracy generally increased with increasing read depth. This study provides a benchmark comparison of tools as well as read depths for four commonly used tools for NGS SNP discovery in a crop species without a reference genome sequence. In addition, a large number of SNPs have been identified in chickpea that would be useful for molecular breeding.

  18. Oncological whole-body staging in integrated ¹⁸F-FDG PET/MR: Value of different MR sequences for simultaneous PET and MR reading

    Energy Technology Data Exchange (ETDEWEB)

    Schaarschmidt, Benedikt M., E-mail: benedikt.schaarschmidt@med.uni-duesseldorf.de [Univ Dusseldorf, Medical Faculty, Department of Diagnostic and Interventional Radiology, Dusseldorf (Germany); Univ Duisburg-Essen, Medical Faculty, Department of Diagnostic and Interventional Radiology and Neuroradiology, Essen (Germany); Grueneisen, Johannes [Univ Duisburg-Essen, Medical Faculty, Department of Diagnostic and Interventional Radiology and Neuroradiology, Essen (Germany); Heusch, Philipp, E-mail: Philipp.Heusch@med.uni-duesseldorf.de [Univ Dusseldorf, Medical Faculty, Department of Diagnostic and Interventional Radiology, Dusseldorf (Germany); Gomez, Benedikt [Univ Duisburg-Essen, Medical Faculty, Department of Nuclear Medicine, D-45147 Essen (Germany); Beiderwellen, Karsten [Univ Duisburg-Essen, Medical Faculty, Department of Diagnostic and Interventional Radiology and Neuroradiology, Essen (Germany); Ruhlmann, Verena [Univ Duisburg-Essen, Medical Faculty, Department of Nuclear Medicine, D-45147 Essen (Germany); Umutlu, Lale [Univ Duisburg-Essen, Medical Faculty, Department of Diagnostic and Interventional Radiology and Neuroradiology, Essen (Germany); Quick, Harald H. [Erwin L. Hahn Institute for Magnetic Resonance Imaging, University of Duisburg-Essen, Essen (Germany); High Field and Hybrid MR Imaging, University Hospital Essen, Essen (Germany); Antoch, Gerald; Buchbender, Christian [Univ Dusseldorf, Medical Faculty, Department of Diagnostic and Interventional Radiology, Dusseldorf (Germany)

    2015-07-15

    Highlights: • We assessed the value of different MR sequences for simultaneous PET and MR reading. • Two quality markers were evaluated intraindividually and in comparison to PET/CT. • T2, TIRM, and contrast-enhanced T1 have a similar quality as contrast-enhanced PET/CT. Abstract: Objective: To evaluate different magnetic resonance (MR) imaging sequences in integrated positron emission tomography (PET)/MR concerning their ability to detect tumors and allocate increased radionuclide uptake on ¹⁸F-fluorodeoxyglucose (¹⁸F-FDG) PET in intraindividual comparison with computed tomography (CT) from PET/CT. Material and methods: Sixty-one patients (34 female, 27 male, mean age 57.6 y) who were examined with contrast-enhanced PET/CT and subsequent PET/MR (mean delay for PET/MR after injection: 147 ± 43 min) were included. A maximum of ten ¹⁸F-FDG-avid lesions per patient were analyzed on CT from PET/CT and with the following MR sequences from PET/MR: T2, turbo inversion recovery magnitude (TIRM), non-enhanced T1, contrast-enhanced T1, and diffusion-weighted imaging (DWI). All lesions were rated using a four-point ordinal scale (scored from 0 to 3) concerning visual detectability of the lesion against the surrounding background and anatomical allocation of the PET finding. In each category (detectability and allocation), Wilcoxon rank sum tests were performed. Bonferroni–Holm correction was performed to prevent α-error accumulation. Results: In 225 ¹⁸F-FDG-avid lesions (156 confirmed as malignant by radiological follow up, 69 by histopathology), visual detectability was comparably high on CT (mean: 2.5 ± 0.9), TIRM (mean: 2.5 ± 0.9), T2 (mean: 2.4 ± 0.9), and DWI (mean: 2.5 ± 1.0) and was significantly higher than on non-enhanced T1 (mean: 2.2 ± 1.0). While anatomic allocation of the PET finding was comparable with CT (mean: 2.6 ± 0.7), T2 (mean: 2.6 ± 0.7), and TIRM (mean: 2.8 ± 0.7), it was significantly higher compared to DWI

  19. Reading Comics

    Science.gov (United States)

    Tilley, Carol L.

    2008-01-01

    Many adults, even librarians who willingly add comics to their collections, often dismiss the importance of comics. Compared to reading "real" books, reading comics appears to be a simple task and compared to reading no books, reading comics might be preferable. After all, comics do have words, but the plentiful pictures seem to carry most of the…

  1. Mappability and Read Length

    Directory of Open Access Journals (Sweden)

    Wentian Li

    2014-11-01

    Full Text Available Power-law distributions are the main functional form for the distribution of repeat size and repeat copy number in the human genome. When the genome is broken into fragments for sequencing, the limited size of fragments and reads may prevent a unique alignment of repeat sequences to the reference sequence. Repeats in the human genome can be as long as $10^4$ bases, or $10^5-10^6$ bases when allowing for mismatches between repeat units. Sequence reads from these regions are therefore unmappable when the read length is in the range of $10^3$ bases. With a read length of exactly 1000 bases, slightly more than 1% of the assembled genome, and slightly less than 1% of the 1 kb reads, are unmappable, excluding the unassembled portion of the human genome (8% in GRCh37). The slow decay (long tail) of the power-law function implies a diminishing return in converting unmappable regions/reads to become mappable with the increase of the read length, with the understanding that increasing read length will always move towards the direction of 100% mappability.
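
    The relationship between read length and mappability described in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' method: it assumes exact matching on the forward strand only, no sequencing errors, and a hypothetical eight-base "genome". The function name `unmappable_fraction` is invented for illustration.

```python
from collections import Counter


def unmappable_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose read matches the genome more than once.

    Toy model: exact matches, forward strand only, no sequencing errors.
    """
    n_reads = len(genome) - read_len + 1
    if read_len <= 0 or n_reads <= 0:
        raise ValueError("read_len must be between 1 and len(genome)")
    # Count how often each substring of length read_len occurs in the genome.
    counts = Counter(genome[i:i + read_len] for i in range(n_reads))
    # A read is unmappable here if its sequence occurs at two or more positions.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n_reads


if __name__ == "__main__":
    toy_genome = "ACGTACGT"  # hypothetical sequence containing the repeat "ACGT"
    for read_len in (4, 5):
        print(read_len, unmappable_fraction(toy_genome, read_len))
```

    In this toy case, reads of length 4 that fall on the repeated "ACGT" cannot be placed uniquely, while every read of length 5 spans enough flanking sequence to be unique.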

  2. Methods for Allocating Ambiguous Short-reads

    National Research Council Canada - National Science Library

    Taub, Margaret; Lipson, Doron; Speed, Terence P

    2010-01-01

    With the rise in prominence of biological research using new short-read DNA sequencing technologies comes the need for new techniques for aligning and assigning these reads to their genomic location of origin...

  3. Information Theory of DNA Sequencing

    CERN Document Server

    Motahari, Abolfazl; Tse, David

    2012-01-01

    DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.

  4. ReadDB Provides Efficient Storage for Mapped Short Reads

    Directory of Open Access Journals (Sweden)

    Gifford David K

    2011-07-01

    Full Text Available Abstract Background: The advent of high-throughput sequencing has enabled sequencing based measurements of cellular function, with an individual measurement potentially consisting of more than 10^8 reads. While tools are available for aligning sets of reads to genomes and interpreting the results, fewer tools have been developed to address the storage and retrieval requirements of large collections of aligned datasets. We present ReadDB, a network accessible column store database system for aligned high-throughput read datasets. Results: ReadDB stores collections of aligned read positions and provides a client interface to support visualization and analysis. ReadDB is implemented as a network server that responds to queries on genomic intervals in an experiment with either the set of contained reads or a histogram based interval summary. Tests on datasets ranging from 10^5 to 10^8 reads demonstrate that ReadDB performance is generally within a factor of two of local-storage based methods and often three to five times better than other network-based methods. Conclusions: ReadDB is a high-performance foundation for ChIP-Seq and RNA-Seq analysis. The client-server model provides convenient access to compute cluster nodes or desktop visualization software without requiring a shared network filesystem or large amounts of local storage. The client code provides a simple interface for fast data access to visualization or analysis. ReadDB provides a new way to store genome-aligned reads for use in applications where read sequence and alignment mismatches are not needed.
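
    The two query types named in this abstract, the set of reads in a genomic interval and a histogram summary of that interval, can be sketched with a sorted list of positions. This is an in-memory illustration only; the class `ReadPositions` and its methods are hypothetical and are not ReadDB's actual interface or storage layout.

```python
from bisect import bisect_left


class ReadPositions:
    """Sorted store of aligned read start positions for one chromosome (illustrative)."""

    def __init__(self, positions):
        self._pos = sorted(positions)

    def reads_in(self, start: int, end: int):
        """Return all positions p with start <= p < end, using binary search."""
        lo = bisect_left(self._pos, start)
        hi = bisect_left(self._pos, end)
        return self._pos[lo:hi]

    def histogram(self, start: int, end: int, bin_width: int):
        """Count reads per bin of width bin_width over the interval [start, end)."""
        if bin_width <= 0 or end <= start:
            raise ValueError("need bin_width > 0 and end > start")
        n_bins = (end - start + bin_width - 1) // bin_width
        counts = [0] * n_bins
        for p in self.reads_in(start, end):
            counts[(p - start) // bin_width] += 1
        return counts


if __name__ == "__main__":
    store = ReadPositions([90, 12, 5, 47, 30, 12])  # made-up positions
    print(store.reads_in(10, 50))
    print(store.histogram(0, 100, 25))
```

    Keeping positions sorted makes each interval query a pair of binary searches plus a slice, which is one plausible reason a position-only store can stay fast when read sequences and mismatches are not needed.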

  5. Procedural versus Narrative Cross-Language Priming and Bilingual Children's Reading and Sentence Sequencing of Same Genre and Opposite Genre Text in the Other Language

    Science.gov (United States)

    Vital, Hedva; Karniol, Rachel

    2011-01-01

    How bilingual children represent procedural versus narrative text is important for both pedagogical and theoretical reasons. To examine this issue, bilingual children and children learning English as a Second Language (ESL) read Hebrew sentences comprising either a procedural (i.e., "how to") or a narrative text (i.e., description of "doing") and…

  6. Multicultural Reading

    Science.gov (United States)

    Veltze, Linda

    2004-01-01

    Multicultural reading advocates believe in the power of literature to transform and to change people's lives. They take seriously the arguments that racism and prejudice can be lessened through multicultural reading, and also that children from undervalued societal groups who read books that depict people like themselves in a positive light will…

  7. Sequence and functional analysis of a 7.2 kb DNA fragment containing four open reading frames located between RPB5 and CDC28 on the right arm of chromosome II.

    Science.gov (United States)

    Rose, M; Kiesau, P; Proft, M; Entian, K D

    1995-07-01

    In a coordinated approach, several laboratories sequenced Saccharomyces cerevisiae chromosome II during the European BRIDGE project. Here we report on the sequence and functional analysis of a 7217 bp fragment located on the right arm of chromosome II between RPB5 and CDC28. The fragment contains four open reading frames probably encoding proteins of 79.2 kDa (corresponding gene YBR156c), 12.1 kDa (YBR157c), 62.7 kDa (YBR158w) and 38.7 kDa (YBR159w). All four open reading frames encode new proteins, as concluded from database searches. The respective genes were destroyed by gene replacement in one allele of diploid cells. After sporulation and tetrad analysis, the resulting mutant haploid strains were investigated. No phenotype with respect to spore germination, viability, carbohydrate utilization, and growth was found for YBR157c, encoding the smallest open reading frame investigated. Gene replacement within the YBR156c gene encoding a highly basic and possibly nuclear located protein was lethal. Ybr158 revealed similarities to the Grr1 (Cat80) protein with respect to the leucine-rich region. Cells harboring a mutation in the YBR158w gene showed strongly reduced growth as compared to the wild-type cells. The protein predicted from YBR159w shared 33% identical amino acid residues with the human estradiol 17-beta-hydroxysteroid dehydrogenase 3. Haploid ybr159c mutants were only able to grow at reduced temperatures, but even under these conditions the mutants grew more slowly than wild-type strains.

  8. Long-read sequencing improves assembly of Trichinella genomes 10-fold, revealing substantial synteny between lineages diverged over seven million years

    Science.gov (United States)

    Genome evolution influences a parasite’s pathogenicity, host-pathogen interactions, environmental constraints, and invasion biology, while genome assemblies form the basis of comparative sequence analyses. Given that closely related organisms typically maintain appreciable synteny, the genome asse...

  9. Comparison of reading speed with 3 different log-scaled reading charts.

    Science.gov (United States)

    Buari, Noor Halilah; Chen, Ai-Hong; Musa, Nuraini

    2014-01-01

    A reading chart that resembles real reading conditions is important for evaluating quality of life in terms of reading performance. The purpose of this study was to compare reading speed on the UiTM Malay related words (UiTM-Mrw) reading chart with that on the MNread Acuity Chart and the Colenbrander Reading Chart. Fifty subjects with normal sight were recruited through randomized sampling (mean age=22.98±1.65 years). Subjects were asked to read three different near charts aloud and as quickly as possible in random sequence. The charts were the UiTM-Mrw Reading Chart, MNread Acuity Chart and Colenbrander Reading Chart. The time taken to read each chart was recorded and any errors while reading were noted. Reading performance was quantified in terms of reading speed as words per minute (wpm). The mean reading speed for the UiTM-Mrw Reading Chart, MNread Acuity Chart and Colenbrander Reading Chart was 200±30 wpm, 196±28 wpm and 194±31 wpm, respectively. Comparison of reading speed between the UiTM-Mrw Reading Chart and the MNread Acuity Chart showed no significant difference (t=-0.73, p=0.72). The same was true for the UiTM-Mrw Reading Chart and the Colenbrander Reading Chart (t=-0.97, p=0.55). Bland and Altman plots showed good agreement between reading speed on the UiTM-Mrw Reading Chart and on both the MNread Acuity Chart and the Colenbrander Reading Chart. The UiTM-Mrw Reading Chart in the Malay language is highly comparable with standardized charts and can be used for evaluating reading speed. Copyright © 2013 Spanish General Council of Optometry. Published by Elsevier Espana. All rights reserved.

  10. Report on achievements in fiscal 1999 on the project for research and development of an intellectual base creating and utilizing technology. Development of a base sequencer for ultra-difficult-to-read DNA; 1999 nendo chiteki kiban sosei riyo gijutsu kenkyu kaihatsu seika hokokusho. Chonandoku DNA enki hairetsu sequencer no kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2000-03-01

    This paper describes the achievements in fiscal 1999 in developing a device to read the base sequence of ultra-difficult-to-read DNA. When DNA polymerase synthesizes a complementary strand, it incorporates deoxyribonucleotide triphosphates (dNTPs, of which there are four kinds), extends the complementary strand, and releases pyrophosphate at the same time. Light is emitted when the pyrophosphate is converted into adenosine triphosphate (ATP) by sulfurylase and the ATP reacts with luciferase. The progress of complementary strand synthesis can therefore be followed by monitoring this reaction. By sending the four kinds of dNTPs into the reaction section separately and in sequence, one can tell which nucleotide caused strand extension, and the base sequence of the target DNA can be determined. Because the reaction is carried out step by step, the device was designed so that excess dNTP from the preceding step is decomposed by an enzyme and does not remain. The problems in the development include: achieving sequential complementary strand synthesis at a high rate and with a high reaction yield, determining reaction conditions suitable for automation and miniaturization, and developing a module that can supply the dNTPs, the reaction material, into the reaction section sequentially. Prospects for developing the elementary technology were established. (NEDO)
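
    The readout principle described above (dispense one dNTP at a time, record whether light is emitted, infer the base) can be sketched as a small simulation. This is an idealized model and not the reported device: it assumes a fixed dispensation order, complete removal of excess dNTP between steps, and a light signal exactly proportional to the number of bases incorporated. The names `pyrogram` and `decode` are hypothetical.

```python
def pyrogram(synthesized: str, order: str = "ACGT"):
    """Simulate stepwise dNTP dispensation for the strand being synthesized.

    Returns a list of (nucleotide, signal) pairs, one per dispensation, where
    signal is the number of bases incorporated in that step (0 = no light).
    """
    unknown = set(synthesized) - set(order)
    if unknown:
        raise ValueError(f"bases not in dispensation order: {sorted(unknown)}")
    flows = []
    i = 0  # index of the next base to be incorporated
    k = 0  # index of the current dispensation
    while i < len(synthesized):
        nt = order[k % len(order)]
        run = 0
        # A homopolymer run is incorporated in a single dispensation.
        while i < len(synthesized) and synthesized[i] == nt:
            run += 1
            i += 1
        flows.append((nt, run))
        k += 1
    return flows


def decode(flows):
    """Reconstruct the synthesized sequence from the recorded signals."""
    return "".join(nt * signal for nt, signal in flows)


if __name__ == "__main__":
    flows = pyrogram("GGAT")
    print(flows)
    print(decode(flows))
```

    The template strand itself would be the reverse complement of the decoded sequence; that conversion is left out to keep the sketch short.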

  11. Promoting preschool reading

    OpenAIRE

    2013-01-01

    The thesis titled Promoting preschool reading consists of a theoretical and an empirical part. In the theoretical part I wrote about reading, the importance of reading, types of reading, about reading motivation, promoting reading motivation, internal and external motivation, influence of reading motivation on the child's reading activity, reading and familial literacy, the role of adults in promoting reading literacy, reading to a child and promoting reading in pre-school years, where I ...

  12. How Reading Volume Affects both Reading Fluency and Reading Achievement

    Directory of Open Access Journals (Sweden)

    Richard L. ALLINGTON

    2014-10-01

    Full Text Available Long overlooked, reading volume is actually central to the development of reading proficiencies, especially in the development of fluent reading proficiency. Generally no one in schools monitors the actual volume of reading that children engage in. We know that the commonly used commercial core reading programs provide only material that requires about 15 minutes of reading activity daily. The remaining 75 minutes of reading lessons are filled with many other activities such as completing workbook pages or responding to low-level literal questions about what has been read. Studies designed to enhance the volume of reading that children do during their reading lessons demonstrate one way to enhance reading development. Repeated readings have been widely used in fostering reading fluency but wide reading options seem to work faster and more broadly in developing reading proficiencies, including oral reading fluency.

  13. Sequence analysis of a 14.2 kb fragment of Saccharomyces cerevisiae chromosome XIV that includes the ypt53, tRNALeu and gsr m2 genes and four new open reading frames.

    Science.gov (United States)

    Garcia-Cantalejo, J M; Boskovic, J; Jimenez, A

    1996-05-01

    As part of the EU yeast genome program, a fragment of 14,262 bp from the left arm of Saccharomyces cerevisiae chromosome XIV has been sequenced. This fragment corresponds to cosmid 14-14b and is located roughly 130 kb from the centromere. It contains four new open reading frames which encode potential proteins of more than 99 amino acids, as well as the ypt53, tRNALeu and gsr m2 genes. The putative protein N2212 is similar to the ribosomal protein S7 from humans. N2215 contains several predicted transmembrane elements. N2231 contains regions which are rich in acidic, as well as basic, residues which could form alpha-helical structures. Similar regions are found in a variety of proteins including glutamic acid rich protein, trichohyalin, caldesmon, Tb-29 and several cytoskeleton-interacting proteins.

  14. Teaching Reading.

    Science.gov (United States)

    Ricketts, Mary

    1980-01-01

    Described are five approaches to teaching reading: Language Experience, Modified Alphabet, Linguistic, Programmed, and Basal. It is suggested that a good teacher, well trained, certified in his or her profession, an active participant in professional organizations, can teach reading successfully using almost any approach. (KC)

  15. Reading Evaluation

    Science.gov (United States)

    Fagan, W. T.

    1978-01-01

    The Canadian Institute for Research in Behavioral and Social Sciences of Calgary was awarded a contract by the Provincial Government of Alberta to assess student skills and knowledge in reading and written composition. Here evaluation is defined and the use of standardized and criterion referenced tests for evaluating reading performance are…

  16. Reading Remixed

    Science.gov (United States)

    Valenza, Joyce Kasman; Stephens, Wendy

    2012-01-01

    Critics claim that digital technologies are killing reading, but these teacher-librarians have observed that teens are as excited about reading as they ever were. Online communities give these readers opportunities to get to know authors, communicate with other fans, and learn more about books of interest. Publishers and authors are responding to…

  17. Reading Letters

    DEFF Research Database (Denmark)

    Beier, Sofie

    2012-01-01

    In our everyday life we constantly encounter a diversity of reading matters, including display types on traffic signage, printed text in novels, newspaper headlines, or our own writing on a computer screen. All these conditions place different demands on the typefaces applied. The book discusses ...... these aspects by drawing on typography history, designers’ ideas, and available scientific data concerning the reading process....

  18. Accurate taxonomic assignment of short pyrosequencing reads.

    Science.gov (United States)

    Clemente, José C; Jansson, Jesper; Valiente, Gabriel

    2010-01-01

    Ambiguities in the taxonomy-dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor in a reference taxonomy of all those sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment of short reads can be made by mapping each read to the node in the reference taxonomy that provides the best precision and recall. We show that given a suffix array for the sequences in the reference taxonomy, a short read can be mapped to the node of the reference taxonomy with the best combined value of precision and recall in time linear in the size of the taxonomy subtree rooted at the lowest common ancestor of the matching sequences. An accurate taxonomic assignment of short reads can thus be made with about the same efficiency as when mapping each read to the lowest common ancestor of all matching sequences in a reference taxonomy. We demonstrate the effectiveness of our approach on several metagenomic datasets of marine and gut microbiota.
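
    The contrast between lowest-common-ancestor assignment and assignment by precision and recall can be sketched on a toy taxonomy. The code below is a brute-force illustration, not the authors' suffix-array algorithm, and it does not achieve their linear running time. It assumes the "combined value" is the F-measure (harmonic mean of precision and recall), and the taxonomy and all function names are made up.

```python
def leaves_under(children, node):
    """Set of leaf sequences in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    out = set()
    for kid in kids:
        out |= leaves_under(children, kid)
    return out


def lowest_common_ancestor(parent, nodes):
    """Deepest node that lies on the root path of every node in nodes."""
    def root_path(n):
        path = [n]
        while n in parent:
            n = parent[n]
            path.append(n)
        return path

    paths = [root_path(n) for n in nodes]
    shared = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in shared)


def assign(parent, matches):
    """Return (LCA node, node with the best F-measure) for a set of matching leaves."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    lca = lowest_common_ancestor(parent, list(matches))
    best, best_f = lca, -1.0
    stack = [lca]
    while stack:
        node = stack.pop()
        leaves = leaves_under(children, node)
        hits = len(leaves & matches)
        if hits:
            precision = hits / len(leaves)   # matching leaves among all leaves below node
            recall = hits / len(matches)     # matching leaves below node among all matches
            f = 2 * precision * recall / (precision + recall)
            if f > best_f:
                best, best_f = node, f
        stack.extend(children.get(node, []))
    return lca, best


if __name__ == "__main__":
    # Hypothetical taxonomy: one family, two genera, seven reference sequences.
    parent = {"s1": "genusA", "s2": "genusA", "s3": "genusA",
              "s4": "genusB", "s5": "genusB", "s6": "genusB", "s7": "genusB",
              "genusA": "family", "genusB": "family"}
    print(assign(parent, {"s1", "s2", "s3", "s4"}))
```

    In this example the read matches all three sequences of genusA and one of four in genusB. The LCA rule assigns it to the family, while the F-measure rule prefers genusA because it covers most matches without including many non-matching sequences.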

  19. Determination of the sequence of the complete open reading frame and the 5 ' NTR of the Paderborn isolate of classical swine fever virus

    DEFF Research Database (Denmark)

    Oleksiewicz, Martin B.; Rasmussen, Thomas Bruun; Normann, Preben

    2003-01-01

    the outbreak has remained largely uncharacterized. The Dutch epizootic is epidemiologically linked to a small CSF outbreak in 1997, in Paderborn in Germany. E2 and partial 5' NTR sequencing has shown that the index Paderborn isolate, and several Dutch isolates taken during the 1997-1998 epizootic......, are virtually identical, confirming that the Paderborn isolate triggered the Dutch outbreak, and furthermore showing that this single isolate was stable throughout the whole Dutch outbreak (the above reviewed in [C. Terpstra, A. J. de Smit, Veterinary Microbiol. 77 (2000) 3-15]). We determined the nucleotide...

  20. Sequencing of a 9.9 kb segment on the right arm of yeast chromosome VII reveals four open reading frames, including PFK1, the gene coding for succinyl-CoA synthetase (beta-chain) and two ORFs sharing homology with ORFs of the yeast chromosome VIII.

    Science.gov (United States)

    Guerreiro, P; Azevedo, D; Barreiros, T; Rodrigues-Pousada, C

    1997-03-15

    A 9.9 kb DNA fragment from the right arm of chromosome VII of Saccharomyces cerevisiae has been sequenced and analysed. The sequence contains four open reading frames (ORFs) longer than 100 amino acids. One gene, PFK1, has already been cloned and sequenced and the other one is the probable yeast gene coding for the beta-subunit of the succinyl-CoA synthetase. The two remaining ORFs share homology with the deduced amino acid sequence (and their physical arrangement is similar to that) of the YHR161c and YHR162w ORFs from chromosome VIII.

  1. Sequence analysis of a 13.4 kbp fragment from the left arm of chromosome XV reveals a malate dehydrogenase gene, a putative Ser/Thr protein kinase, the ribosomal L25 gene and four new open reading frames.

    Science.gov (United States)

    Casamayor, A; Khalid, H; Balcells, L; Aldea, M; Casas, C; Herrero, E; Ariño, J

    1996-09-01

    A 13421 bp fragment located near the left telomere of chromosome XV (cosmid pEOA461) has been sequenced. Seven non-overlapping open reading frames (ORFs) encoding polypeptides longer than 100 residues have been found (AOB859, AOC184, AOE375, AOX142i, AOE423, AOA476 and AOE433). An additional ORF (AOE131) is found within AOA476. Three of them (AOC184, AOA476 and AOE433) show no remarkable identity with proteins deposited in the data banks. ORF AOB859 is quite similar to a hypothetical yeast protein of similar size located in chromosome VI, particularly within the C-terminal half. AOE375 encodes a new member of the glycogen synthase kinase-3 subfamily of Ser/Thr protein kinases. AOX142i is the gene encoding the previously described ribosomal protein L25. AOE423 codes for a protein virtually identical to the MDH2 malate dehydrogenase isozyme. However, our DNA sequence shows a single one-base insertion upstream of the reported initiating codon. This would produce a larger ORF by extending the N-terminus of the protein by 46 residues. The existence of this insertion has been confirmed in three different yeast strains, including FY1679.

  2. Reading on the Internet: Realizing and Constructing Potential Texts

    Science.gov (United States)

    Cho, Byeong-Young; Afflerbach, Peter

    2015-01-01

    Successful Internet reading requires making strategic decisions about what texts to read and a sequence for reading them, all in accordance with readers' goals. In this paper, we describe the process of realizing and constructing potential texts as an important and critical part of successful Internet reading and use verbal report data to…

  3. Learning to read is much more than learning to read: a neuropsychologically based reading program.

    Science.gov (United States)

    Ardila, A; Ostrosky-Solis, F; Mendoza, V U

    2000-11-01

    Departing from the observation that illiterates significantly underscore in some neuropsychological tests, a learning-to-read method named NEUROALFA was developed. NEUROALFA is directed to reinforce these underscored abilities during the learning-to-read process. It was administered to a sample of 21 adult illiterates in Colima (Mexico). Results were compared with 2 control groups using more traditional procedures in learning to read. The NEUROPSI neuropsychological test battery was administered to all the participants before and after completing the learning-to-read training program. All 3 groups presented some improvement in the test scores. Gains, however, were significantly higher in the experimental group in Orientation in Time, Digits Backward, Visual Detection, Verbal Memory, Copy of a Semi-Complex Figure, Language Comprehension, Phonological Verbal Fluency, Similarities, Calculation Abilities, Sequences, and all the recall subtests, excluding Recognition. Performance in standard reading tests was also significantly higher in the experimental group. Correlations between pretest NEUROPSI scores and reading ability were low. However, correlations between posttest NEUROPSI scores and reading scores were higher and significant for several subtests. Results are interpreted as supporting the assumption that reinforcement of those abilities in which illiterates significantly underscore results in a significant improvement in neuropsychological test scores and strongly facilitates the learning-to-read process. The NEUROALFA method of teaching reading to adult illiterates is beginning to be used extensively in Mexico. To our knowledge, this is the first attempt to apply neuropsychological principles to social problems.

  4. A Description of Reading Instruction: The Tail Is Wagging the Dog. Reading Education Report No. 35.

    Science.gov (United States)

    Mason, Jana

    A study was conducted to determine whether there is a typical instructional sequence used to teach reading in third and fourth grade classrooms, to what extent the core text-related sequence (a child-related introduction of the text, followed by reading, and then discussion and interpretation of text content) is used, and if it is not used, what…

  5. How Reading Volume Affects Both Reading Fluency and Reading Achievement

    Science.gov (United States)

    Allington, Richard L.

    2014-01-01

    Long overlooked, reading volume is actually central to the development of reading proficiencies, especially in the development of fluent reading proficiency. Generally no one in schools monitors the actual volume of reading that children engage in. We know that the commonly used commercial core reading programs provide only material that requires…

  6. Accelerating metagenomic read classification on CUDA-enabled GPUs

    National Research Council Canada - National Science Library

    Kobus, Robin; Hundt, Christian; Müller, André; Schmidt, Bertil

    2017-01-01

    ... metagenomic read classification are urgently needed. Results We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the fast and accurate classification of metagenomic sequences using reduced k-mers (CLARK) method...

  7. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

    Directory of Open Access Journals (Sweden)

    Haznedaroglu Berat Z

    2012-07-01

    Full Text Available Abstract Background: The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results: Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63). For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual k-mers (k-19 to k-63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions: This study demonstrated that different k-mer choices result in various quantities
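
    The pair-wise comparison described in this abstract, finding ortholog identifiers present in only one single k-mer assembly or absent from the clustered assembly, amounts to set arithmetic. The sketch below is not the authors' workflow. The identifiers "K1" to "K3", the k values chosen, and both function names are placeholders for illustration.

```python
def unique_per_assembly(annotations):
    """Map each k to the identifiers found in that assembly and in no other.

    annotations: dict mapping k-mer length -> set of annotation identifiers.
    """
    unique = {}
    for k, ids in annotations.items():
        others = set().union(*(v for j, v in annotations.items() if j != k))
        unique[k] = ids - others
    return unique


def missing_from_clustered(annotations, clustered):
    """Identifiers seen in at least one single k-mer assembly but not in the clustered one."""
    return set().union(*annotations.values()) - clustered


if __name__ == "__main__":
    # Placeholder identifiers, not real KEGG ortholog IDs.
    per_k = {19: {"K1", "K2"}, 31: {"K2", "K3"}, 63: {"K2"}}
    clustered = {"K1", "K2"}
    print(unique_per_assembly(per_k))
    print(missing_from_clustered(per_k, clustered))
```

    Any identifier returned by `missing_from_clustered` corresponds to the small fraction of annotations that the abstract reports as lost when only the clustered assembly is kept.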

  8. Reading Columns

    OpenAIRE

    Coutts, Marion

    2008-01-01

    Reading Columns are twin permanent public sculptures commissioned as part of a £245m scheme for the redevelopment of the Chatham Place area in Reading. Dimensions: 3.5m high x 1.3m diameter each Field of knowledge: The work consists of twin bespoke columns of stainless steel and glass over digital colour transparencies. The piece revisits and reworks the idea of the Morris Column, a 19th C feature characteristic of major European metropolitan centres. A wraparound image on each of ...

  9. Does Extensive Reading Promote Reading Speed?

    Science.gov (United States)

    He, Mu

    2014-01-01

    Research has shown a wide range of learning benefits accruing from extensive reading. Not only is there improvement in reading, but also in a wide range of language uses and areas of language knowledge. However, few research studies have examined reading speed. The existing literature on reading speed focused on students' reading speed without…

  10. Oral Reading Fluency in Second Language Reading

    Science.gov (United States)

    Jeon, Eun Hee

    2012-01-01

    This study investigated the role of oral reading fluency in second language reading. Two hundred and fifty-five high school students in South Korea were assessed on three oral reading fluency (ORF) variables and six other reading predictors. The relationship between ORF and other reading predictors was examined through an exploratory factor…

  12. WHAT IS READING?

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    Reading is enjoying, entertaining and enlightening. Reading is listening, speaking and writing. Reading is talking and discussing, with yourself, with the author and with the others. Reading is exploring, investigating and guessing. Reading is traveling backward and forward, historically and geographically. Reading is thinking in your own language, and/or in the other language. Reading is encoding and decoding. Reading is civilizing, rationalizing and intellectualizing. Reading is assimilating, associating, accumula...

  13. Reading Hygiene

    Institute of Scientific and Technical Information of China (English)

    耿一铭

    2006-01-01

    Here are some good points for good eye health that everyone can follow: 1. Rest your eyes before they get tired. Just close your eyes from time to time or look off at some distant object. 2. Do not read in either too dark a light or

  14. Beyond Cognition: Reading Motivation and Reading Comprehension

    OpenAIRE

    Wigfield, Allan; Gladstone, Jessica; Turci, Lara

    2016-01-01

    The authors review research on children’s reading motivation and its relation to their reading comprehension. They begin by discussing work on the development of school motivation in general and reading motivation in particular, reviewing work showing that many children’s reading motivation declines over the school years. Girls tend to have more positive motivation for reading than do boys, and there are ethnic differences in children’s reading motivation. Over the last 15 years researchers h...

  15. Using quality scores and longer reads improves accuracy of Solexa read mapping

    Directory of Open Access Journals (Sweden)

    Xuan Zhenyu

    2008-02-01

    Full Text Available Abstract Background: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores. Results: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at http://rulai.cshl.edu/rmap/. Conclusion: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.
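
    One simple way quality scores can decide which read positions matter during mapping is to count mismatches only at confidently called bases. The sketch below illustrates that idea and is not RMAP's implementation. It assumes Phred+33 encoded quality strings (an encoding choice made here for convenience) and a hypothetical cutoff parameter `min_q`.

```python
def phred33(quality: str):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def quality_aware_mismatches(read: str, ref: str, quality: str, min_q: int = 20) -> int:
    """Count mismatches between read and reference at positions with score >= min_q.

    Positions below the cutoff are ignored, so likely sequencing errors (for
    example near the 3' end of a read) do not count against a candidate location.
    """
    if not (len(read) == len(ref) == len(quality)):
        raise ValueError("read, ref and quality must have equal length")
    scores = phred33(quality)
    return sum(1 for r, g, q in zip(read, ref, scores) if r != g and q >= min_q)


if __name__ == "__main__":
    read, ref, qual = "ACGTA", "ACGTT", "IIII#"  # last base has a low score
    print(quality_aware_mismatches(read, ref, qual))           # low-quality mismatch ignored
    print(quality_aware_mismatches(read, ref, qual, min_q=0))  # every position counted
```

    A mapper built on this scoring could allow longer reads without rejecting true locations because of error-prone 3' bases, which is the kind of gain the abstract reports.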

  16. Automatic Reading

    Institute of Scientific and Technical Information of China (English)

    胡迪

    2007-01-01

    Reading is the key to school success and, like any skill, it takes practice. A child learns to walk by practising until he no longer has to think about how to put one foot in front of the other. The great athlete practises until he can play quickly, accurately and without thinking. Educators call it automaticity.

  17. Reading Together: A Successful Reading Fluency Intervention

    Science.gov (United States)

    Young, Chase; Mohr, Kathleen A. J.; Rasinski, Timothy

    2015-01-01

    The article describes a reading fluency intervention called Reading Together that combines the method of repeated readings (Samuels, 1979) and the Neurological Impress Method (Heckelman, 1969). Sixteen volunteers from various backgrounds were recruited and trained to deliver the Reading Together intervention to struggling readers in third through…

  18. Overview of Sequence Data Formats.

    Science.gov (United States)

    Zhang, Hongen

    2016-01-01

    A next-generation sequencing experiment can generate billions of short reads for each sample, and processing the raw reads adds further information. Various file formats have been developed to store and manipulate this information. This chapter presents an overview of the file formats commonly used in the analysis of next-generation sequencing data, including FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF.
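
    As a small illustration of two of the formats listed, the sketch below reads FASTQ records and rewrites them as FASTA. It assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines); the function names are hypothetical and not taken from the chapter.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


def to_fasta(records, width=60):
    """Render records as FASTA text, dropping qualities and wrapping sequences."""
    lines = []
    for name, sequence, _quality in records:
        lines.append(f">{name}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


example = "@r1\nACGT\n+\nIIII\n@r2 desc\nGG\n+\n##\n"
records = list(read_fastq(io.StringIO(example)))
fasta_text = to_fasta(records)
```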

  19. Human cytomegalovirus UL144 open reading frame: sequence variability in Guangzhou congenital infected children

    Institute of Scientific and Technical Information of China (English)

    王波; 李月琴; 叶宁; 胡兢晶; 何震宇; 田传军; 张纯青; 叶铁真; 周天鸿

    2008-01-01

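    Objective: To investigate the polymorphism of the human cytomegalovirus (HCMV) UL144 gene in low-passage clinical isolates from congenitally infected children in Guangzhou, and to explore the role of the UL144 gene in HCMV pathogenicity. Methods: Clinical isolates of HCMV were obtained from urine samples collected from infants with intrauterine HCMV infection in Guangzhou, and the viral genomic DNA was extracted. Primers for the UL144 gene were designed according to the genome sequence of the Toledo strain and used to amplify the complete open reading frame (ORF) of the UL144 gene in three low-passage clinical isolates that were HCMV DNA positive by multiplex PCR. The PCR products were cloned into the pMD18-T vector and sequenced, and the sequences were analyzed together with the UL144 genes of 10 other clinical isolates published in GenBank. Results: The UL144 genes of the low-passage clinical HCMV strains D3, D2 and D52 were cloned, sequenced and deposited in GenBank under accession numbers DQ180368, DQ180382 and DQ180355, respectively. The UL144 gene was 531 bp long in all three strains. BLAST analysis identified 10 HCMV strains in GenBank whose UL144 genes are highly homologous to those of D3, D2 and D52. Sequence alignment showed that the UL144 DNA sequence is relatively conserved, with variation at only four sites, all of them base substitutions with no insertions or deletions. The encoded protein consists of 176 amino acid residues, and its amino acid sequence is also relatively conserved, with a variation rate of 1.1% among the isolates. The post-translational modification sites of the UL144-encoded protein are highly conserved in all isolates, and the isoelectric point of the UL144 protein is 8.97 in all isolates. Conclusion: The DNA sequence of the UL144 gene and the amino acid sequence of its product are relatively conserved in low-passage clinical HCMV isolates from Guangzhou, although some polymorphism exists. This suggests that the UL144 gene may play an important role in congenital infection.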

  20. Quantum reading capacity

    Science.gov (United States)

    Pirandola, Stefano; Lupo, Cosmo; Giovannetti, Vittorio; Mancini, Stefano; Braunstein, Samuel L.

    2011-11-01

    The readout of a classical memory can be modelled as a problem of quantum channel discrimination, where a decoder retrieves information by distinguishing the different quantum channels encoded in each cell of the memory (Pirandola 2011 Phys. Rev. Lett. 106 090504). In the case of optical memories, such as CDs and DVDs, this discrimination involves lossy bosonic channels and can be remarkably boosted by the use of nonclassical light (quantum reading). Here we generalize these concepts by extending the model of memory from single-cell to multi-cell encoding. In general, information is stored in a block of cells by using a channel-codeword, i.e. a sequence of channels chosen according to a classical code. Correspondingly, the readout of data is realized by a process of ‘parallel’ channel discrimination, where the entire block of cells is probed simultaneously and decoded via an optimal collective measurement. In the limit of a large block we define the quantum reading capacity of the memory, quantifying the maximum number of readable bits per cell. This notion of capacity is nontrivial when we suitably constrain the physical resources of the decoder. For optical memories (encoding bosonic channels), such a constraint is energetic and corresponds to fixing the mean total number of photons per cell. In this case, we are able to prove a separation between the quantum reading capacity and the maximum information rate achievable by classical transmitters, i.e. arbitrary classical mixtures of coherent states. In fact, we can easily construct nonclassical transmitters that are able to outperform any classical transmitter, thus showing that the advantages of quantum reading persist in the optimal multi-cell scenario.

  1. Beyond Cognition: Reading Motivation and Reading Comprehension.

    Science.gov (United States)

    Wigfield, Allan; Gladstone, Jessica; Turci, Lara

    2016-09-01

    The authors review research on children's reading motivation and its relation to their reading comprehension. They begin by discussing work on the development of school motivation in general and reading motivation in particular, reviewing work showing that many children's reading motivation declines over the school years. Girls tend to have more positive motivation for reading than do boys, and there are ethnic differences in children's reading motivation. Over the last 15 years researchers have identified in both laboratory and classroom-based research instructional practices that positively impact students' reading motivation and ultimately their reading comprehension. There is a strong need for researchers to build on this work and develop and study in different age groups of children effective classroom-based reading motivation instructional programs for a variety of narrative and informational materials.

  2. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...... the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56...... MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types...
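
    The final step described above, determining the sequence type from the combination of identified alleles, amounts to a table lookup. The sketch below is a minimal illustration only: the locus names, allele numbers and sequence-type values are invented placeholders, not entries from any real MLST database, and the BLAST-based allele matching itself is not shown.

```python
def sequence_type(allele_calls, scheme_loci, profiles):
    """Look up the sequence type (ST) for one isolate.

    allele_calls: dict mapping locus name -> best-matching allele number
    scheme_loci:  ordered tuple of the loci in the MLST scheme
    profiles:     dict mapping a tuple of allele numbers -> ST
    Returns None when a locus is missing or the combination is unknown.
    """
    try:
        key = tuple(allele_calls[locus] for locus in scheme_loci)
    except KeyError:
        return None
    return profiles.get(key)


# Hypothetical three-locus scheme with made-up profiles.
loci = ("locusA", "locusB", "locusC")
profiles = {(1, 1, 1): 1, (1, 2, 1): 2, (3, 2, 5): 7}

st_known = sequence_type({"locusA": 3, "locusB": 2, "locusC": 5}, loci, profiles)
st_novel = sequence_type({"locusA": 9, "locusB": 2, "locusC": 5}, loci, profiles)
```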

  3. Reading Instruction Today.

    Science.gov (United States)

    Williams, Joanna

    1979-01-01

    Describes current achievement in the areas of reading theory and reading instruction. Reviews reading research in the fields of educational and cognitive psychology. Considers the overall role of formal education in the development of literacy. (GC)

  4. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
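
    As a worked example of measuring sequence divergence for a distance-based method, the sketch below computes the proportion of differing sites (p-distance) between two aligned sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). Jukes-Cantor is one standard substitution model chosen here for illustration; the abstract does not say which models the chapter covers, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite when p reaches saturation (0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGT", "ACGTACGA")  # one difference in eight sites
d = jukes_cantor(p)
```

    The corrected distance is always at least as large as the raw p-distance, because it accounts for multiple substitutions at the same site.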

  5. Reading(s) in the Writing Classroom.

    Science.gov (United States)

    Foster, David

    1997-01-01

    Interrogates the reading/writing connection by evaluating how three essays by published writers affected the attitude and writing practices of university students in a course on the personal essay. Describes the course. Suggests what findings imply for current rationales about the reading/writing connection and for the use of anthology readings in…

  6. Teaching Adults to Read with Reading Apprenticeship

    Science.gov (United States)

    Lesmeister, Michele Benjamin

    2010-01-01

    Many adult students have basic reading skills, but they are inexperienced readers who need to learn skills beyond the basics to equip them for success in college and career. How do educators face such adults with optimism and an eagerness to help improve specific reading skills so that these students can read and understand a variety of materials?…

  7. Cosmetology Reading Strategies. 1980 Vocational Reading Series.

    Science.gov (United States)

    Thornton, L. Jay; And Others

    Cosmetology Reading Strategies is one of five instructional guides in the Reading Strategies in Vocational Education Series. Developed to assist teachers working with students considered disadvantaged because of reading deficiency, the guide contains several strategies, suitable for adaptation, specifically related to cosmetology instruction. Each…

  8. Promoting Reading Motivation by Reading Together

    Science.gov (United States)

    Monteiro, Vera

    2013-01-01

    In the present project we tested the hypothesis that tutorial situations with peers would benefit children's reading motivation. Participants were from elementary school--80 fourth-graders and 80 second-graders. We used a questionnaire to assess reading motivation. In the tutorial sessions we developed a Paired Reading Program. The children who…

  9. EMPOWERING THE READING READABILITY

    Directory of Open Access Journals (Sweden)

    Handoko Handoko

    2014-06-01

    Full Text Available A general assumption about reading is that students improve their reading ability by reading a lot. This research was conducted to explain the use of extensive reading and aimed to determine how its implementation improves students' reading readability, using the classroom action research technique. The data relate to the students' reading progress as shown in their reading reports: spoken and written summaries, reading comprehension, vocabulary mastery, and participation. The strategy centered on continuity of reading: students were encouraged to read extensively in and outside class. The findings indicated that the implementation could improve students' reading readability. This attainment demonstrated that students' reading readability is fostered through continuity of reading. Other findings showed that students enjoyed reading. Students' curiosity was also a significant factor: their high curiosity explained why they continued reading even though they realized that the materials were quite difficult. Students' self-confidence was also built, as they were required to write a retelling of a story and to share their previous reading. Besides retelling and summarizing, students felt appreciated as readers. This appreciation indirectly helped students to become fonder of reading.

  10. Quantum Reading Capacity

    CERN Document Server

    Pirandola, Stefano; Giovannetti, Vittorio; Mancini, Stefano; Braunstein, Samuel L

    2011-01-01

    The readout of a classical memory can be modelled as a problem of quantum channel discrimination, where a decoder retrieves information by distinguishing the different quantum channels encoded in each cell of the memory [S. Pirandola, Phys. Rev. Lett. 106, 090504 (2011)]. In the case of optical memories, such as CDs and DVDs, this discrimination involves lossy bosonic channels and can be remarkably boosted by the use of nonclassical light (quantum reading). Here we generalize these concepts by extending the model of memory from single-cell to multi-cell encoding. In general, information is stored in a block of cells by using a channel-codeword, i.e., a sequence of channels chosen according to a classical code. Correspondingly, the readout of data is realized by a process of "parallel" channel discrimination, where the entire block of cells is probed simultaneously and decoded via an optimal collective measurement. In the limit of an infinite block we define the quantum reading capacity of the memory, quantify...

  11. Auditory stream biasing in children with reading impairments.

    Science.gov (United States)

    Ouimet, Tialee; Balaban, Evan

    2010-02-01

    Reading impairments have previously been associated with auditory processing differences. We examined auditory stream biasing, a global aspect of auditory temporal processing. Children with reading impairments, control children and adults heard a 10 s long stream-bias-inducing sound sequence (a repeating 1000 Hz tone) and a test sequence (eight repetitions of two pure tones of 1000 and 1420 Hz in an XYX-XYX... pattern) with a variable delay interval (from 0.09 to 8 s) between the two sequences. Reading-impaired children had a significantly lower proportion of streamed responses than control children and adults. Streamed responses in reading-impaired participants differed according to their musical experience, but musically experienced reading-impaired participants were still significantly different from musically experienced controls. Reading impairments are associated with global differences in auditory integration, and musical experience needs to be considered when investigating auditory processing capabilities.

  12. Sequencing the maize genome.

    Science.gov (United States)

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  13. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    Science.gov (United States)

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  14. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    Science.gov (United States)

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  15. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
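
    The two consensus error rates quoted above can be put on the Phred scale, Q = -10 log10(p), which is commonly used to express sequence quality. The conversion below is a small worked example added for orientation; the abstract itself does not state the rates in Phred units, and the function name is hypothetical.

```python
import math


def phred_quality(errors, bases):
    """Phred-scaled quality for an error rate of `errors` per `bases` bases."""
    return -10 * math.log10(errors / bases)


draft_q = phred_quality(1, 2000)      # draft consensus: about 1 error per 2000 bp
finished_q = phred_quality(1, 10000)  # finished genome: about 1 error per 10,000 bp
```

    Under these figures the draft consensus corresponds to roughly Q33 and the finished genome to Q40.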

  16. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  17. Reading Assessment: Looking Ahead

    Science.gov (United States)

    Afflerbach, Peter

    2016-01-01

    In this article, I focus on three areas of reading assessment that I believe to be crucial for students' reading development: developing comprehensive formative assessments, assessing the wide array of factors that contribute to students' reading development, and fostering student independence by helping students learn to use reading assessment on…

  18. On Efficient Reading

    Institute of Scientific and Technical Information of China (English)

    陈伟平

    2003-01-01

    Time is limited for every reader, but many readers waste a lot of time on unimportant things, and they read everything at the same speed and in the same way. As a result, they often fail to understand words and sentences; they do not know how one sentence relates to another, or how the whole text fits together. They are not reading efficiently. It is high time that we held a discussion on efficient reading. The author states that efficient reading involves adequate comprehension at an appropriate reading rate. Pointing out the factors that influence reading rate and comprehension, this article puts forward some suggestions on efficient reading.

  19. QuorUM: An Error Corrector for Illumina Reads

    OpenAIRE

    Guillaume Marçais; Yorke, James A.; Aleksey Zimin

    2015-01-01

    Motivation: Illumina sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at a low cost. Even with a low (advertised 1%) error rate, 100× coverage Illumina data on average has an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous k-mers in the reads. However, there is enough information in the reads to correct most of the sequenc...
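
    Two points in this abstract lend themselves to a short sketch: the arithmetic behind "an error in some read at every base" (coverage times per-base error rate gives the expected number of erroneous base calls per genome position), and the way sequencing errors show up as low-count k-mers. The code below is an illustration only and is not QuorUM's correction algorithm; the function names and the count threshold are assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_count_kmers(counts, threshold=2):
    """Return k-mers seen fewer than `threshold` times (likely error candidates)."""
    return {kmer for kmer, n in counts.items() if n < threshold}


# Expected erroneous base calls per genome position at 100x coverage, 1% error.
expected_errors_per_position = 100 * 0.01

# Five error-free copies of a read plus one copy with a single substitution.
reads = ["ACGTAC"] * 5 + ["ACGAAC"]
suspect = low_count_kmers(count_kmers(reads, k=4))
```

    A single substituted base creates up to k erroneous k-mers, which is why the number of low-count k-mers grows quickly with coverage.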

  20. The DNA sequence of a 7941 bp fragment of the left arm of chromosome VII of Saccharomyces cerevisiae contains four open reading frames including the multicopy suppressor gene of the pop2 mutation and a putative serine/threonine protein kinase gene.

    Science.gov (United States)

    Coglievina, M; Bertani, I; Klima, R; Zaccaria, P; Bruschi, C V

    1995-06-30

    We report the sequence of a 7941 bp DNA fragment from the left arm of chromosome VII of Saccharomyces cerevisiae which contains four open reading frames (ORFs) of greater than 100 amino acid residues. ORF biC834 shows 100% bp identity with the recently identified multicopy suppressor gene of the pop2 mutation (MPT5); its deduced protein product carries an eight-repeat domain region, homologous to that found in the hypothetical regulatory YGL023 protein of S. cerevisiae and the Pumilio protein of Drosophila. ORF biE560 protein exhibits patterns typical of serine/threonine protein kinases, with which it shares high degrees of homology.
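
    Identifying open reading frames above a length cutoff, as in the 100-residue threshold used here, can be sketched as a scan for ATG-to-stop stretches in each reading frame. The sketch below is a simplification and not the authors' method: it scans the forward strand only, uses the standard stop codons, and takes the first ATG in each stretch as the start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return sorted (start, end) pairs for forward-strand ORFs.

    An ORF runs from an ATG to the next in-frame stop codon; `end` is
    exclusive and includes the stop. `min_codons` counts coding codons
    (the stop is excluded).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence: ATG + three codons + stop, offset by two bases.
toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"
toy_orfs = find_orfs(toy, min_codons=4)
```

    A complete search would also scan the reverse complement, giving six reading frames in total.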

  1. 501 reading comprehension questions

    CERN Document Server

    2014-01-01

    This updated edition offers the most extensive and varied practice for all types of questions students might face on standardized and in-class tests. With this guide, students will learn to develop expert reading strategies, understand how to read faster and with greater comprehension, overcome reading anxiety, and increase appreciation of reading for pleasure. This book's step-by-step approach provides graduated coverage that moves from the basics to more advanced reading.

  2. A survey of sequence alignment algorithms for next-generation sequencing.

    Science.gov (United States)

    Li, Heng; Homer, Nils

    2010-09-01

    Rapidly evolving sequencing technologies produce data on an unparalleled scale. A central challenge to the analysis of this data is sequence alignment, whereby sequence reads must be compared to a reference. A wide variety of alignment algorithms and software have been subsequently developed over the past two years. In this article, we will systematically review the current development of these algorithms and introduce their practical applications on different types of experimental data. We come to the conclusion that short-read alignment is no longer the bottleneck of data analyses. We also consider future development of alignment algorithms with respect to emerging long sequence reads and the prospect of cloud computing.

  3. What Oral Text Reading Fluency Can Reveal about Reading Comprehension

    Science.gov (United States)

    Veenendaal, Nathalie J.; Groen, Margriet A.; Verhoeven, Ludo

    2015-01-01

    Text reading fluency--the ability to read quickly, accurately and with a natural intonation--has been proposed as a predictor of reading comprehension. In the current study, we examined the role of oral text reading fluency, defined as text reading rate and text reading prosody, as a contributor to reading comprehension outcomes in addition to…

  5. PacBio Sequencing and Its Applications

    Institute of Scientific and Technical Information of China (English)

    Anthony Rhoads; Kin Fai Au

    2015-01-01

    Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio's sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small-size laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.

  6. Whole genome complete resequencing of Bacillus subtilis natto by combining long reads with high-quality short reads.

    Science.gov (United States)

    Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi

    2014-01-01

    De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food "natto." The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome.

  7. Short Read Mapping: An Algorithmic Tour.

    Science.gov (United States)

    Canzar, Stefan; Salzberg, Steven L

    2017-03-01

    Ultra-high-throughput next-generation sequencing (NGS) technology allows us to determine the sequence of nucleotides of many millions of DNA molecules in parallel. Accompanied by a dramatic reduction in cost since its introduction in 2004, NGS technology has provided a new way of addressing a wide range of biological and biomedical questions, from the study of human genetic disease to the analysis of gene expression, protein-DNA interactions, and patterns of DNA methylation. The data generated by NGS instruments comprise huge numbers of very short DNA sequences, or 'reads', that carry little information by themselves. These reads therefore have to be pieced together by well-engineered algorithms to reconstruct biologically meaningful measurements, such as the level of expression of a gene. To solve this complex, high-dimensional puzzle, reads must be mapped back to a reference genome to determine their origin. Due to sequencing errors and to genuine differences between the reference genome and the individual being sequenced, this mapping process must be tolerant of mismatches, insertions, and deletions. Although optimal alignment algorithms to solve this problem have long been available, the practical requirements of aligning hundreds of millions of short reads to the 3 billion base pair long human genome have stimulated the development of new, more efficient methods, which today are used routinely throughout the world for the analysis of NGS data.
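
    The kind of optimal alignment the abstract refers to, tolerant of mismatches, insertions and deletions, can be sketched with a dynamic program that lets the read start anywhere in the reference. The version below uses unit costs and runs in time proportional to read length times reference length, which is exactly the cost that makes it impractical for whole-genome mapping; it is an illustration under those assumptions and not one of the efficient methods the review covers.

```python
def best_semi_global_alignment(read, reference):
    """Return (edit_distance, end_position) for the best match of `read`
    against any substring of `reference`, with unit-cost mismatches,
    insertions and deletions."""
    # Row 0 is all zeros: the alignment may start at any reference position.
    prev = [0] * (len(reference) + 1)
    for i, base in enumerate(read, start=1):
        curr = [i] + [0] * len(reference)
        for j, ref_base in enumerate(reference, start=1):
            cost = 0 if base == ref_base else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or mismatch
                prev[j] + 1,         # read base not present in the reference
                curr[j - 1] + 1,     # reference base missing from the read
            )
        prev = curr
    best_end = min(range(len(reference) + 1), key=prev.__getitem__)
    return prev[best_end], best_end


exact = best_semi_global_alignment("ACGT", "TTACGTTT")
with_mismatch = best_semi_global_alignment("ACCT", "TTACGTTT")
with_deletion = best_semi_global_alignment("ACT", "TTACGTTT")
```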

  8. Cultural Factors in Reading

    Institute of Scientific and Technical Information of China (English)

    孔敏

    2005-01-01

    Reading is a basic ability in learning English, and reading comprehension exercises are a common way to assess this ability. Since reading is a communicative activity between author and reader in written form, the rules and conventions of this communication differ between countries. Cultural factors in reading therefore determine, help, and influence the percentage of right answers. This article attempts to analyze the effects of cultural differences in reading and the barriers in comprehension, and aims to improve students' awareness of cultural differences in reading.

  9. Sequence assembly using next generation sequencing data--challenges and solutions.

    Science.gov (United States)

    Chin, Francis Y L; Leung, Henry C M; Yiu, S M

    2014-11-01

    Sequence assembly is an important step in bioinformatics studies. With the help of next generation sequencing (NGS) technology, high-throughput DNA fragments (reads) can be randomly sampled from DNA or RNA molecules. However, as the positions of the sampled reads are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing methods, NGS offers higher throughput, but the reads are shorter and the error rate is higher, which introduces several problems in assembly. Moreover, paired-end reads, which contain more information than single-end reads, can be sampled, but existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we will revisit the major problems of assembling NGS reads on genomic, transcriptomic, metagenomic and metatranscriptomic data. We will also describe our IDBA package for solving these problems. The IDBA package has adopted several novel ideas in assembly, including using multiple k, local assembly and progressive depth removal. Compared with existing assemblers, IDBA has better performance on many simulated and real sequencing datasets.
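
    The motivation for "using multiple k" can be illustrated with a toy de Bruijn graph. This is a minimal sketch under stated assumptions, not the IDBA implementation: the helper names `de_bruijn` and `branching_nodes` and the two example reads are hypothetical. It only shows that a short repeat creates a branch at a small k and is resolved at a larger k.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. ambiguous extensions."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


if __name__ == "__main__":
    # Hypothetical reads covering ATGCAGGCT, in which "GC" occurs twice.
    reads = ["ATGCAG", "GCAGGCT"]
    for k in (3, 4):
        print(k, branching_nodes(de_bruijn(reads, k)))
```

    A small k keeps the graph connected at low coverage but collapses repeats; a large k resolves repeats but can fragment the graph. Iterating over several values of k is one way to balance the two effects.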

  10. Lights, Camera, Read! Arizona Reading Program Manual.

    Science.gov (United States)

    Arizona State Dept. of Library, Archives and Public Records, Phoenix.

    This document is the manual for the Arizona Reading Program (ARP) 2003 entitled "Lights, Camera, Read!" This theme spotlights books that were made into movies, and allows readers to appreciate favorite novels and stories that have progressed to the movie screen. The manual consists of eight sections. The Introduction includes welcome letters from…

  11. To read or not to read

    NARCIS (Netherlands)

    Mol, Suzanne Elizabeth

    2010-01-01

    There is a widely held belief that reading (story)books makes us smarter and helps promote success in life. Does scientific evidence support this notion? The three meta-analyses in this thesis comprise 146 studies between 1988 and 2010 (N=10,308 participants) that addressed the role of book reading

  12. A putative helicase, the SUA5, PMR1, tRNALys1 genes and four open reading frames have been detected in the DNA sequence of an 8.8 kb fragment of the left arm of chromosome VII of Saccharomyces cerevisiae.

    Science.gov (United States)

    Klima, R; Coglievina, M; Zaccaria, P; Bertani, I; Bruschi, C V

    1996-09-01

    We report the sequence of an 8.8 kb segment of DNA from the left arm of chromosome VII of Saccharomyces cerevisiae. The sequence reveals seven open reading frames (ORFs) G1651, G1654, G1660, G1663, G1666, G1667 and G1669 greater than 100 amino acids in length and the tRNALys1 gene. ORF G1651 shows 100% identity with the ROK1 protein which is a putative RNA helicase of the 'DEAD box' protein family. ORF G1654 exhibits a motif highly conserved in ATP/GTP binding proteins generally referred to as 'P-loop'. From FastA analysis, G1660 and G1666 were found to be previously sequenced genes, respectively SUA5 and PMR1. The three other ORFs identified are partially (G1663) or completely (G1667 and G1669) overlapping with the PMR1 sequence on the complementary strand. This feature, together with their low codon adaptation indexes and the absence of significant homology with known proteins, suggests that they do not correspond to real genes.
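
    Screening a sequence for ORFs longer than 100 amino acids on both strands, as described above, can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the function `find_orfs`, the ATG-to-stop definition of an ORF and the synthetic test sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100):
    """Return (strand, start, end, length in amino acids) for every ORF that
    runs from an ATG to the next in-frame stop codon and is longer than min_aa.

    Coordinates are 0-based, end-exclusive, and given on the strand that was
    scanned (for "-" that is the reverse-complemented sequence).
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3        # stop codon is not translated
                    if n_aa > min_aa:
                        orfs.append((strand, start, i + 3, n_aa))
                    start = None
    return orfs


if __name__ == "__main__":
    # Synthetic sequence: ATG, 120 alanine codons, then a stop codon.
    print(find_orfs("ATG" + "GCT" * 120 + "TAA"))
```

    A length cut-off alone does not establish that an ORF is a real gene; the abstract additionally relies on codon adaptation indexes and homology searches.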

  13. Enhanced virome sequencing using targeted sequence capture.

    Science.gov (United States)

    Wylie, Todd N; Wylie, Kristine M; Herter, Brandi N; Storch, Gregory A

    2015-12-01

    Metagenomic shotgun sequencing (MSS) is an important tool for characterizing viral populations. It is culture independent, requires no a priori knowledge of the viruses in the sample, and may provide useful genomic information. However, MSS can lack sensitivity and may yield insufficient data for detailed analysis. We have created a targeted sequence capture panel, ViroCap, designed to enrich nucleic acid from DNA and RNA viruses from 34 families that infect vertebrate hosts. A computational approach condensed ∼1 billion bp of viral reference sequence into <200 million bp of unique, representative sequence suitable for targeted sequence capture. We compared the effectiveness of detecting viruses in standard MSS versus MSS following targeted sequence capture. First, we analyzed two sets of samples, one derived from samples submitted to a diagnostic virology laboratory and one derived from samples collected in a study of fever in children. We detected 14 and 18 viruses in the two sets, comprising 19 genera from 10 families, with dramatic enhancement of genome representation following capture enrichment. The median fold-increases in percentage viral reads post-capture were 674 and 296. Median breadth of coverage increased from 2.1% to 83.2% post-capture in the first set and from 2.0% to 75.6% in the second set. Next, we analyzed samples containing a set of diverse anellovirus sequences and demonstrated that ViroCap could be used to detect viral sequences with up to 58% variation from the references used to select capture probes. ViroCap substantially enhances MSS for a comprehensive set of viruses and has utility for research and clinical applications.
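
    Breadth of coverage, the metric reported above, is the percentage of reference positions covered by at least one read. The sketch below shows one way to compute it from alignment intervals; the function name and the half-open, 0-based interval convention are assumptions for illustration, not details from the study.

```python
def breadth_of_coverage(intervals, genome_length):
    """Percentage of positions in [0, genome_length) covered by at least one
    (start, end) interval. Intervals are 0-based and end-exclusive."""
    covered = 0
    current_end = 0  # everything before this position has been accounted for
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        end = min(end, genome_length)    # clip to the reference
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    # Three hypothetical read alignments on a 100 bp reference.
    print(breadth_of_coverage([(0, 10), (5, 20), (50, 60)], 100))
```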

  14. Read clouds uncover variation in complex regions of the human genome.

    Science.gov (United States)

    Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim

    2015-10-01

    Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.

  15. Teaching Reading Skills

    Institute of Scientific and Technical Information of China (English)

    刘恒

    2014-01-01

    Reading skills are a very important part of language teaching and learning. This paper was written after attending lectures given by an Australian teacher named Rod Ellis, focusing on how to teach reading skills using authentic materials.

  16. On English Reading Skills

    Institute of Scientific and Technical Information of China (English)

    钱芬

    2008-01-01

    Reading is one of the four important skills in English learning. It is also a skill that students need in order to support independent and self-directed learning. As society develops, science and technology advance at high speed and competition in society becomes sharper. Reading is a way for students to become more knowledgeable and successful, so it is increasingly important for them to speed up their reading in order to acquire as much information as possible. Thus, fostering a good English reading habit is essential, and being able to adopt different reading skills for different reading materials and purposes will also help them read more effectively. The paper mainly concerns some basic English reading skills.

  17. Remote Control Reading.

    Science.gov (United States)

    Ervin, Helen

    1995-01-01

    Explains how students who have difficulty remembering what they have read may be taught how to reread sections of text by suggesting to them that reading is analogous to watching a video with the remote control in hand. (TB)

  18. Can Reading Help?

    Science.gov (United States)

    Crowe, Chris

    2003-01-01

    Ponders the effect of September 11th on teenagers. Proposes that reading books can help teenagers sort out complicated issues. Recommends young adult novels that offer hope for overcoming tragedy. Lists 50 short story collections worth reading. (PM)

  19. Tactics for Reading Comprehension

    Institute of Scientific and Technical Information of China (English)

    孔祥航; 张艳荣

    2003-01-01

    In recent years, reading comprehension has taken up a larger and larger part of almost every international test or domestic examination. Knowing the basic knowledge and grasping test-taking tactics are key factors in good reading comprehension. In this thesis, I dwell on nine commonly used tactics for reading comprehension, which will help you deal with reading comprehension problems efficiently.

  20. GASSST: global alignment short sequence search tool

    National Research Council Canada - National Science Library

    Rizk, Guillaume; Lavenier, Dominique

    2010-01-01

    .... Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus 2-fold-achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads...

  1. Discussion on college students' English Reading Skills —intensive reading and extensive reading

    Institute of Scientific and Technical Information of China (English)

    敖登

    2016-01-01

    Students are exposed to a growing amount of vocabulary and knowledge as they move into higher grades, yet they seem to make little improvement in their reading ability and have no alternative means of reaching their reading goals. Improving English reading skills is very important for students, as reading accounts for a large share of an English exam paper. When talking about reading, we cannot ignore three factors: reading materials, reading speed and reading skills. The following analysis covers two common approaches to reading, intensive reading and extensive reading, as well as the obstacles that may be met while reading.

  2. Reading and Empathy

    Science.gov (United States)

    McCreary, John J.; Marchant, Gregory J.

    2017-01-01

    The relationship between reading and empathy was explored. Controlling for GPA and gender, reading variables were hypothesized as related to empathy; the relationship was expected to differ for males and females. For the complete sample, affective components were related to GPA but not reading. Perspective taking was related to reading…

  3. Voiced Reading and Rhythm

    Institute of Scientific and Technical Information of China (English)

    詹艳萍

    2007-01-01

    Since voiced reading is an important way of learning English, rhythm is the most critical factor that enables one to read beautifully. This article illustrates the relationship between rhythm and voiced reading, the importance of rhythm, and the methods to develop a sense of rhythm.

  4. Monster Moose Reading.

    Science.gov (United States)

    Finney, Frank

    Monster Moose (MM) Reading is a program specifically aimed at improving children's language, beginning reading, and self-concept development through the creation and utilization of student-authored reading materials which feature a series of wordless picture books about a magical moose. The MM Program is based on the following general principles…

  5. Rapid Reading, Yes

    Science.gov (United States)

    Frommer, Harvey

    1971-01-01

    Recommends instruction in rapid reading for high school and college students and asserts that flexibility of speed and reasoning provide the foundation for effective rapid reading. Describes the components of rapid reading as orientation, selection, clarification, arrangement, review, and study. (RW)

  6. Reading/Writing Connection.

    Science.gov (United States)

    Fernandez, Melanie

    In the past, students and teachers alike viewed reading and writing instruction as two separate entities. Reading and writing instruction was often characterized by linear and behaviorist theories and methods, with students rarely coming away from their schooling experience with confidence in and respect for their own writing. To both read and…

  8. Dutch for Reading Knowledge

    NARCIS (Netherlands)

    van Baalen, C.; Blom, F.R.E.; Hollander, I.

    2012-01-01

    This first Dutch for Reading Knowledge book on the market promotes a high level of reading and translation competency by drawing from Dutch grammar, vocabulary and reading strategies, and providing many translation "shortcuts" and tips when tackling complex texts in Dutch. Aimed at students, researc

  9. Prose reading in neglect.

    Science.gov (United States)

    Beschin, Nicoletta; Cisari, Carlo; Cubelli, Roberto; Della Sala, Sergio

    2014-02-01

    Prose reading has been shown to be a very sensitive measure of Unilateral Spatial Neglect. However, little is known about the relationship between prose reading and other measures of neglect and its severity, or between prose reading and single word reading. Thirty participants with a first stroke in the right hemisphere and clear symptoms of spatial neglect in everyday life were assessed with tests of prose reading (text in one column book-like, and in two columns magazine-like), single word reading, and a battery of 13 tests investigating neglect. Seventy percent of these participants omitted words at the beginning of the text (left end), showing Prose Reading Neglect (PRN). The participants showing PRN differed from those not showing PRN only in the overall severity of neglect, and had a lesion centred on the insula, putamen and superior temporal gyrus. Double dissociations emerged between PRN and single word reading neglect, suggesting different cognitive requirements between the two tests: parallel processing in single word reading vs. serial analysis in text reading. Notably, the pattern of neglected text varied dramatically across participants presenting with PRN, including dissociations between reading performance on one- and two-column text. Prose reading proved a complex and unique task which should be directly investigated to predict the effects of unilateral neglect. The outcome of this study should also inform clinical assessment and the advice given to patients and caregivers.

  10. Family Reading Night

    Science.gov (United States)

    Hutchins, Darcy; Greenfeld, Marsha; Epstein, Joyce

    2007-01-01

    This book offers clear and practical guidelines to help engage families in student success. It shows families how to conduct a successful Family Reading Night at their school. Family Night themes include Scary Stories, Books We Love, Reading Olympics, Dr. Seuss, and other themes. Family reading nights invite parents to come to school with their…

  11. Reading difficulties in Albanian.

    Science.gov (United States)

    Avdyli, Rrezarta; Cuetos, Fernando

    2012-10-01

    Albanian is an Indo-European language with a shallow orthography, in which there is an absolute correspondence between graphemes and phonemes. We aimed to identify the reading strategies used by Albanian reading-disabled children during word and pseudoword reading. A pool of 114 Kosovar reading-disabled children matched with 150 normal readers aged 6 to 11 years old were tested. They had to read 120 stimuli varied in lexicality, frequency, and length. The results in terms of reading accuracy as well as reading times show that both groups were affected by lexicality and length effects. In both groups, length and lexicality effects were significantly modulated by school year: the length effect was greater in early grades and diminished later, while the lexicality effect showed the opposite pattern. However, the reading difficulties group was less accurate and slower than the control group across all school grades. Analyses of the error patterns showed that phonological errors, in which a letter replacement leads to a new nonword, are the most common error type in both groups, although as grade rises, visual errors and lexicalizations increased more in the control group than in the reading difficulties group. These findings suggest that Albanian normal children use both routes (lexical and sublexical) from the beginning of reading despite the complete regularity of Albanian, while children with reading difficulties start with sublexical reading and take more time to acquire lexical reading, but finally both routes are functional.

  12. Reading: United States.

    Science.gov (United States)

    Weber, Rose-Marie

    1983-01-01

    An exploration of the increasingly important role of linguistics in literacy research and instruction reviews literature on reading comprehension, written language, orthography, metalinguistics, classroom language use, reading disabilities, native tongues, nonstandard dialects, bilingual education, adult literacy, and second-language reading. (86…

  13. The Future of Reading

    Science.gov (United States)

    Peters, Tom

    2009-01-01

    The future of reading is very much in doubt. In this century, reading could soar to new heights or crash and burn. Some educators and librarians fear that sustained reading for learning, for work, and for pleasure may be slowly dying out as a widespread social practice. Several social and technological developments of the 20th century, such as…

  14. Free Reading Is UTOPIA

    Science.gov (United States)

    LeCrone, Nancy

    2010-01-01

    In high school students get tied up in extracurricular activities and have little time for pleasure reading. It is true that with rigorous academic schedules they have little time for pleasure reading. Thus began a conversation with a sophomore English teacher at the author's high school. As they were discussing the plight of free reading he was…

  15. Bullen Reading Attitude Measure.

    Science.gov (United States)

    Bullen, Gertrude F.

    The Bullen Reading Attitude Measure (BRAM) is an instrument that was developed to serve as a diagnostic aid in assessing reading attitudes of elementary school children in grades one through six. The objectives of the test are to measure the subject's attitude toward reading at home or school, visiting the library, owning and buying books,…

  16. What oral text reading fluency can reveal about reading comprehension

    NARCIS (Netherlands)

    Veenendaal, N.J.; Groen, M.A.; Verhoeven, L.T.W.

    2015-01-01

    Text reading fluency – the ability to read quickly, accurately and with a natural intonation – has been proposed as a predictor of reading comprehension. In the current study, we examined the role of oral text reading fluency, defined as text reading rate and text reading prosody, as a contributor t

  18. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  19. To Read or Not to Read

    Science.gov (United States)

    1975-07-01

    until the students are exposed to reading material above the mid-fifth grade level. The emphasis in these courses is upon improving comprehension...completing the material in the course books, students work at improving their reading speed through the use of controlled reading, pacers, and timed tests. As...

  20. Accelerating read mapping with FastHASH.

    Science.gov (United States)

    Xin, Hongyi; Lee, Donghyuk; Hormozdiari, Farhad; Yedkar, Samihan; Mutlu, Onur; Alkan, Can

    2013-01-01

    With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of next-generation sequencing critically depends on the existence of computational techniques that can process and analyze the enormous amount of sequence data quickly and accurately. Unfortunately, the current read mapping algorithms have difficulties in coping with the massive amounts of data generated by NGS. We propose a new algorithm, FastHASH, which drastically improves the performance of the seed-and-extend type hash table based read mapping algorithms, while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all seed-and-extend class read mapping algorithms. It introduces two main techniques, namely Adjacency Filtering, and Cheap K-mer Selection. We implemented FastHASH and merged it into the codebase of the popular read mapping program, mrFAST. Depending on the edit distance cutoffs, we observed up to 19-fold speedup while still maintaining 100% sensitivity and high comprehensiveness.
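
    The idea behind filtering seed hits by adjacency can be sketched in a few lines. The code below is a simplified illustration and an assumption on our part, not the FastHASH or mrFAST implementation: the function names, the non-overlapping seed layout and the `max_missing` threshold are hypothetical, and shifts caused by indels are ignored. A candidate location is kept only if most of the read's remaining k-mers are found at the expected neighbouring positions in the reference, so the costly extension step is run on fewer locations.

```python
from collections import defaultdict


def build_index(reference: str, k: int):
    """Hash table from each k-mer to the list of its positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_locations(read: str, index, k: int, max_missing: int):
    """Locations of the read's first k-mer that survive the adjacency check."""
    seeds = [read[i:i + k] for i in range(0, len(read) - k + 1, k)]
    if not seeds:
        return []
    candidates = []
    for loc in index.get(seeds[0], []):
        # Count seeds that are NOT found at their expected adjacent position.
        missing = sum(
            1
            for n, seed in enumerate(seeds[1:], start=1)
            if loc + n * k not in index.get(seed, [])
        )
        if missing <= max_missing:
            candidates.append(loc)  # only these go on to full alignment
    return candidates


if __name__ == "__main__":
    reference = "ACGTACGTGGCCTTAAACGTTTTT"
    index = build_index(reference, 4)
    print(candidate_locations("ACGTGGCCTTAA", index, 4, max_missing=0))
    print(candidate_locations("ACGTGGACTTAA", index, 4, max_missing=1))
```

    In this toy reference the first seed `ACGT` occurs three times, but only one occurrence has the following seeds next to it, so two of the three extensions are avoided.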

  1. DNA Sequencing Using capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    application papers of sequencing up to this level were also published in the mid 1990's. A major interest of the sequencing community has always been read length. The longer the sequence read per run, the more efficient the process, and the greater the ability to read repeat sequences. We therefore devoted a great deal of time to studying the factors influencing read length in capillary electrophoresis, including polymer type and molecular weight, capillary column temperature, applied electric field, etc. In our initial optimization, we were able to demonstrate, for the first time, the sequencing of over 1000 bases with 90% accuracy. The run required 80 minutes for separation. Sequencing of 1000 bases per column was next demonstrated on a multiple capillary instrument. Our studies revealed that linear polyacrylamide produced the longest read lengths because the hydrophilic single strand DNA had minimal interaction with the very hydrophilic linear polyacrylamide. Any interaction of the DNA with the polymer would lead to broader peaks and lower read length. Another important parameter was the molecular weight of the linear chains. High molecular weight (> 1 MDa) was important to allow the long single strand DNA to reptate through the entangled polymer matrix. In an important paper, we showed an inverse emulsion method to reproducibly prepare linear polyacrylamide polymer with an average molecular weight of 9 MDa. This approach was used in the polymer for sequencing the human genome. Another critical factor in the successful use of capillary electrophoresis for sequencing was the sample preparation method. In the Sanger sequencing reaction, high concentrations of salts and dideoxynucleotides remained. Since the sample was introduced to the capillary column by electrokinetic injection, these salt ions would be favorably injected into the column over the sequencing fragments, thus reducing the signal for longer fragments and hence reducing read length. In two papers, we examined the role of

  2. Amblyopic reading is crowded.

    Science.gov (United States)

    Levi, Dennis M; Song, Shuang; Pelli, Denis G

    2007-10-26

    We measure acuity, crowding, and reading in amblyopic observers to answer four questions. (1) Is reading with the amblyopic eye impaired because of larger required letter size (i.e., worse acuity) or larger required spacing (i.e., worse crowding)? The size or spacing required to read at top speed is called "critical". For each eye of seven amblyopic observers and the preferred eyes of two normal observers, we measure reading rate as a function of the center-to-center spacing of the letters in central and peripheral vision. From these results, we estimate the critical spacing for reading. We also measured traditional acuity for an isolated letter and the critical spacing for identifying a letter among other letters, which is the classic measure of crowding. For both normals and amblyopes, in both central and peripheral vision, we find that the critical spacing for reading equals the critical spacing for crowding. The identical critical spacings, and very different critical sizes, show that crowding, not acuity, limits reading. (2) Does amblyopia affect peripheral reading? No. We find that amblyopes read normally with their amblyopic eye except that abnormal crowding in the fovea prevents them from reading fine print. (3) Is the normal periphery a good model for the amblyopic fovea? No. Reading centrally, the amblyopic eye has an abnormally large critical spacing but reads all larger spacings at normal rates. This is unlike the normal periphery, in which both critical spacing and maximum reading rate are severely impaired relative to the normal fovea. (4) Can the uncrowded-span theory of reading rate explain amblyopic reading? Yes. The case of amblyopia shows that crowding limits reading solely by determining the uncrowded span: the number of characters that are not crowded. Characters are uncrowded if and only if their spacing is more than critical. The text spacing may be uniform, but the observer's critical spacing increases with distance from fixation, so the

  3. Methodological Variables in Choral Reading

    Science.gov (United States)

    Poore, Meredith A.; Ferguson, Sarah Hargus

    2008-01-01

    This preliminary study explored changes in prosodic variability during choral reading and investigated whether these changes are affected by the method of eliciting choral reading. Ten typical adult talkers recorded three reading materials (poetry, fiction and textbook) in three reading conditions: solo (reading aloud alone), track (reading aloud…

  4. Impact of the Reading Buddies Program on Reading Level and Attitude Towards Reading

    National Research Council Canada - National Science Library

    Hayley Dolman; Serena Boyte-Hawryluk

    2013-01-01

    ... children’s reading levels and attitudes towards reading.Methods – During the first and last sessions of the Reading Buddies program, the participants completed the Elementary Reading Attitudes Survey (ERAS...

  5. Automatic sequences

    CERN Document Server

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences that are produced by a finite automaton. Although they are not random, they may appear random. They are complicated in the sense of not being ultimately periodic, and they may also look complicated in the sense that it may not be easy to name the rule by which the sequence is generated; however, a rule that generates the sequence does exist. The concept of automatic sequences has applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular: a general introduction to automatic sequences; the basic (combinatorial) properties of automatic sequences; the algebraic approach to automatic sequences; and geometric objects related to automatic sequences.
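
    A standard example of a sequence produced by a finite automaton is the Thue-Morse sequence: the n-th term is obtained by feeding the base-2 digits of n to a two-state automaton and reading off the final state. The sketch below is our own illustration (the function and table names are hypothetical and not taken from the book).

```python
def automatic_term(n, transitions, outputs, start=0, base=2):
    """n-th term of an automatic sequence: run the automaton on the base-`base`
    digits of n (most significant digit first) and output the final state's label."""
    digits = []
    while n:
        digits.append(n % base)
        n //= base
    state = start
    for digit in reversed(digits):
        state = transitions[state][digit]
    return outputs[state]


# Two-state automaton for Thue-Morse: reading a 1 flips the state, a 0 keeps it.
THUE_MORSE_TRANSITIONS = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
THUE_MORSE_OUTPUTS = {0: 0, 1: 1}

if __name__ == "__main__":
    terms = [
        automatic_term(n, THUE_MORSE_TRANSITIONS, THUE_MORSE_OUTPUTS)
        for n in range(16)
    ]
    print(terms)
```

    The resulting sequence is fully determined by a simple rule, yet it is not ultimately periodic, which matches the description above.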

  6. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    of data generation, new bioinformatics approaches have been developed to cope with the large amount of sequencing reads obtained in these experiments. In this chapter, we first introduce HTS technologies and their usage in molecular biology and discuss the problem of mapping sequencing reads...

  7. Exploring the Relationship between Adolescent's Reading Skills, Reading Motivation and Reading Habits

    Science.gov (United States)

    McGeown, Sarah P.; Duncan, Lynne G.; Griffiths, Yvonne M.; Stothard, Sue E.

    2015-01-01

    The present study examines the extent to which adolescents' reading affect (reading motivation) and behaviour (reading habits) predict different components of reading (word reading, comprehension, summarisation and text reading speed) and also adds to the limited research examining group differences (gender, age, ability) in adolescents' reading…

  9. Lip reading using neural networks

    Science.gov (United States)

    Kalbande, Dhananjay; Mishra, Akassh A.; Patil, Sanjivani; Nirgudkar, Sneha; Patel, Prashant

    2011-10-01

    Computerized lip reading, or speech reading, is concerned with the difficult task of converting a video signal of a speaking person to written text. Its applications include teaching deaf and mute people to speak and communicate effectively with others, crime fighting, and recognition that is invariant to the acoustic environment. We convert the video of the subject speaking vowels into images, and images are then selected manually for processing. However, several factors such as fast speech, bad pronunciation, poor illumination, movement of the face, and moustaches and beards make lip reading difficult. Contour tracking methods and template matching are used for the extraction of lips from the face. A K Nearest Neighbor algorithm is then used to classify the 'speaking' images and the 'silent' images. The sequence of images is then transformed into segments of utterances. A feature vector is calculated on each frame for all the segments and is stored in the database with a properly labeled class. Character recognition is performed using a modified KNN algorithm which assigns more weight to nearer neighbors. This paper reports the recognition of vowels using KNN algorithms.
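
    The abstract does not say how the modified KNN weights its neighbors. The sketch below assumes inverse-distance weighting with Euclidean distance, which is one common way to give nearer neighbors more influence; the function name, the `eps` parameter and the toy feature vectors are all hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Predict a label for `query` from `train`, a list of (features, label).

    Each of the k nearest neighbours votes with weight 1 / (distance + eps),
    so closer neighbours count more. The weighting scheme is an assumption,
    not taken from the paper.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)


# Toy data: one very close 'a' sample and two more distant 'o' samples.
toy_train = [((0.1, 0.0), "a"), ((1.0, 0.0), "o"), ((0.0, 1.1), "o")]
toy_prediction = weighted_knn_predict(toy_train, (0.0, 0.0), k=3)
```

    With plain majority voting the two 'o' samples would win; with distance weighting the single nearby 'a' sample dominates.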

  10. Accelerating metagenomic read classification on CUDA-enabled GPUs.

    Science.gov (United States)

    Kobus, Robin; Hundt, Christian; Müller, André; Schmidt, Bertil

    2017-01-03

    Metagenomic sequencing studies are becoming increasingly popular with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification; i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, software tools for fast and accurate metagenomic read classification are urgently needed. We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the fast and accurate classification of metagenomic sequences using reduced k-mers (CLARK) method. Using the processing power of a single Titan X GPU, cuCLARK can reach classification speeds of up to 50 million reads per minute. Corresponding speedups for species- (genus-)level classification range between 3.2 and 6.6 (3.7 and 6.4) compared to multi-threaded CLARK executed on a 16-core Xeon CPU workstation. cuCLARK can perform metagenomic read classification at superior speeds on CUDA-enabled GPUs. It is free software licensed under GPL and can be downloaded at https://github.com/funatiq/cuclark free of charge.
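
    To make "assignment of each read to a taxonomic label" concrete, here is a small CPU-only sketch of k-mer-based read classification. It is not cuCLARK's implementation: the rule of indexing only k-mers that occur in exactly one reference is an assumption made for illustration, and all names and sequences are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_discriminative_index(references, k):
    """Map each k-mer found in exactly one reference to that reference's label."""
    owner = {}
    shared = set()
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            if kmer in shared:
                continue
            if kmer in owner and owner[kmer] != label:
                # Seen in two references: no longer discriminative.
                del owner[kmer]
                shared.add(kmer)
            else:
                owner[kmer] = label
    return owner


def classify_read(read, index, k):
    """Return the label with the most k-mer hits, or None if there are no hits."""
    hits = Counter(
        index[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in index
    )
    if not hits:
        return None
    return hits.most_common(1)[0][0]


toy_refs = {"taxon_A": "ACGTACGTTTGA", "taxon_B": "GGCATCCGATAG"}
toy_index = build_discriminative_index(toy_refs, k=4)
```

    Each read is classified independently of the others, which is the property that makes this kind of workload a natural fit for GPU parallelism.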

  11. Improve your reading

    CERN Document Server

    Fry, Ron

    2012-01-01

    Help your students discover the practical solution to their reading frustrations, with Improve Your Reading. Written by bestselling author and education advocate Ron Fry, this book avoids gimmicks and tricks in favor of proven strategies that will help your students better retain and comprehend what they've read in any textbook, in any course, at any academic level. Endlessly adaptable to each student's individual learning needs, the text focuses on fundamental skills students can carry beyond the classroom.

  12. Emerging applications of read profiles towards the functional annotation of the genome

    DEFF Research Database (Denmark)

    Pundhir, Sachin; Poirazi, Panayiota; Gorodkin, Jan

    2015-01-01

    is typically a result of the protocol designed to address specific research questions. The sequencing results in reads, which when mapped to a reference genome often leads to the formation of distinct patterns (read profiles). Interpretation of these read profiles is essential for their analysis in relation...... to the research question addressed. Several strategies have been employed at varying levels of abstraction ranging from a somewhat ad hoc to a more systematic analysis of read profiles. These include methods which can compare read profiles, e.g., from direct (non-sequence based) alignments to classification...

  13. Reading disorders and dyslexia.

    Science.gov (United States)

    Hulme, Charles; Snowling, Margaret J

    2016-12-01

    We review current knowledge about the nature of reading development and disorders, distinguishing between the processes involved in learning to decode print, and the processes involved in reading comprehension. Children with decoding difficulties/dyslexia experience deficits in phoneme awareness, letter-sound knowledge and rapid automatized naming in the preschool years and beyond. These phonological/language difficulties appear to be proximal causes of the problems in learning to decode print in dyslexia. We review data from a prospective study of children at high risk of dyslexia to show that being at family risk of dyslexia is a primary risk factor for poor reading and children with persistent language difficulties at school entry are more likely to develop reading problems. Early oral language difficulties are strong predictors of later difficulties in reading comprehension. There are two distinct forms of reading disorder in children: dyslexia (a difficulty in learning to translate print into speech) and reading comprehension impairment. Both forms of reading problem appear to be predominantly caused by deficits in underlying oral language skills. Implications for screening and for the delivery of robust interventions for language and reading are discussed.

  14. On Reading Test

    Institute of Scientific and Technical Information of China (English)

    孙健

    2005-01-01

    There has been a long discussion over the construct validity of reading tests. In China's reading tests, multiple choice is the main test method. In view of the long controversy over the validity of multiple choice, construct validation is called for to empirically test the hypothesized relationships between test scores and abilities. The national CET committee conducted a comprehensive validation study. As part of the project, the specialists studied the reading comprehension test's validity by qualitative means, namely "introspective verbal reports". The analysis revealed that an overwhelming majority of the question items were handled through "expected reading operations".

  15. Reading disorders and dyslexia

    Science.gov (United States)

    Hulme, Charles; Snowling, Margaret J.

    2016-01-01

    Purpose of review We review current knowledge about the nature of reading development and disorders, distinguishing between the processes involved in learning to decode print, and the processes involved in reading comprehension. Recent findings Children with decoding difficulties/dyslexia experience deficits in phoneme awareness, letter-sound knowledge and rapid automatized naming in the preschool years and beyond. These phonological/language difficulties appear to be proximal causes of the problems in learning to decode print in dyslexia. We review data from a prospective study of children at high risk of dyslexia to show that being at family risk of dyslexia is a primary risk factor for poor reading and children with persistent language difficulties at school entry are more likely to develop reading problems. Early oral language difficulties are strong predictors of later difficulties in reading comprehension. Summary There are two distinct forms of reading disorder in children: dyslexia (a difficulty in learning to translate print into speech) and reading comprehension impairment. Both forms of reading problem appear to be predominantly caused by deficits in underlying oral language skills. Implications for screening and for the delivery of robust interventions for language and reading are discussed. PMID:27496059

  16. The Reading Brain

    OpenAIRE

    Kassuba, Tanja; Kastner, Sabine

    2015-01-01

    Do you enjoy reading books? Reading is one of the unique activities that only humans do, and we have not been doing it for that long! Humans have talked to each other using a language system with grammatical rules for at least 100,000 years, but we have been reading and writing only for a few thousand years! What happens in our brain when we read? Our brain has developed a region that is specialized in knowing what written words look like. It closely works together with other parts of the bra...

  17. How to Develop Reading Strategies in Extensive Reading

    Institute of Scientific and Technical Information of China (English)

    陈腊梅

    2016-01-01

    Extensive reading demands readers to read in quantity for general, overall meaning, and for information. Learners are supposed to know how to choose appropriate reading tactics in accordance with different reading purposes. This paper mainly proposes some effective strategies to develop learners' reading skills and competency.

  18. Reading Strategy: Tackling Reading through Topic and Main Ideas

    Science.gov (United States)

    Naidu, Bharathi; Briewin, Marshal; Embi, Mohamed Amin

    2013-01-01

    Reading comprehension is one of the four skills essential in learning English. In a reading class, students tend to read all the information provided in reading materials, but how much do they actually retain? This study explores whether learners use identification of the topic and main ideas as a reading strategy to assist in reading…

  19. Developmental Relations between Reading Comprehension and Reading Strategies

    Science.gov (United States)

    Muijselaar, Marloes M. L.; Swart, Nicole M.; Steenbeek-Planting, Esther G.; Droop, Mienke; Verhoeven, Ludo; de Jong, Peter F.

    2017-01-01

    We examined the developmental relations between knowledge of reading strategies and reading comprehension in a longitudinal study of 312 Dutch children from the beginning of fourth grade to the end of fifth grade. Measures for reading comprehension, reading strategies, reading fluency, vocabulary, and working memory were administered. A structural…

  20. Assisted Reading with Digital Audiobooks for Students with Reading Disabilities

    Science.gov (United States)

    Esteves, Kelli J.; Whitten, Elizabeth

    2011-01-01

    The goal of this study was to compare the efficacy of assisted reading with digital audiobooks with the traditional practice of sustained silent reading (SSR) in terms of reading fluency and reading attitude with upper elementary students with reading disabilities. Treatment group participants selected authentic children's literature and engaged…

  2. Developmental relations between reading comprehension and reading strategies

    NARCIS (Netherlands)

    Muijselaar, M.M.L.; Swart, N.M.; Steenbeek-Planting, E.G.; Droop, W.; Verhoeven, L.T.W.; Jong, P.F. de

    2017-01-01

    We examined the developmental relations between knowledge of reading strategies and reading comprehension in a longitudinal study of 312 Dutch children from the beginning of fourth grade to the end of fifth grade. Measures for reading comprehension, reading strategies, reading fluency, vocabulary,

  3. Separating metagenomic short reads into genomes via clustering

    Directory of Open Access Journals (Sweden)

    Tanaseichuk Olga

    2012-09-01

    Full Text Available Abstract Background The metagenomics approach allows the simultaneous sequencing of all genomes in an environmental sample. This results in high complexity datasets, where in addition to repeats and sequencing errors, the number of genomes and their abundance ratios are unknown. Recently developed next-generation sequencing (NGS) technologies significantly improve the sequencing efficiency and cost. On the other hand, they result in shorter reads, which makes the separation of reads from different species harder. Among the existing computational tools for metagenomic analysis, there are similarity-based methods that use reference databases to align reads and composition-based methods that use composition patterns (i.e., frequencies of short words or l-mers) to cluster reads. Similarity-based methods are unable to classify reads from unknown species without close references (which constitute the majority of reads). Since composition patterns are preserved only in significantly large fragments, composition-based tools cannot be used for very short reads, which becomes a significant limitation with the development of NGS. A recently proposed algorithm, AbundanceBin, introduced another method that bins reads based on predicted abundances of the genomes sequenced. However, it does not separate reads from genomes of similar abundance levels. Results In this work, we present a two-phase heuristic algorithm for separating short paired-end reads from different genomes in a metagenomic dataset. We use the observation that most of the l-mers belong to unique genomes when l is sufficiently large. The first phase of the algorithm results in clusters of l-mers each of which belongs to one genome. During the second phase, clusters are merged based on l-mer repeat information. These final clusters are used to assign reads. The algorithm could handle very short reads and sequencing errors. It is initially designed for genomes with similar abundance levels and then
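
    The observation that most l-mers belong to a single genome once l is large enough can be checked on toy data. The sketch below is only an illustration of that observation, not the authors' two-phase algorithm; the function name and the example sequences are hypothetical.

```python
from collections import defaultdict


def unique_lmer_fraction(genomes, l):
    """Fraction of distinct l-mers that occur in exactly one genome.

    `genomes` maps a genome name to its sequence string.
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - l + 1):
            owners[seq[i:i + l]].add(name)
    if not owners:
        return 0.0
    unique = sum(1 for names in owners.values() if len(names) == 1)
    return unique / len(owners)


toy_genomes = {"g1": "ACGTACGT", "g2": "ACGTTTGA"}
short_fraction = unique_lmer_fraction(toy_genomes, 2)
long_fraction = unique_lmer_fraction(toy_genomes, 5)
```

    On these two toy sequences the fraction of genome-specific l-mers rises as l grows from 2 to 5, which is the property the first clustering phase relies on.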

  4. DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.

    Science.gov (United States)

    Pandey, Ram Vinay; Schlötterer, Christian

    2013-01-01

    With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/

  5. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies...

  6. Recovering Innocence: Growing Up Reading.

    Science.gov (United States)

    Spencer, Patricia Sylvester

    1991-01-01

    Offers a plan for changing the perception among teenagers that reading for pleasure is an activity of their childhood past. Suggests an elective Reading Workshop that allows students to share favorite books and authors, to read aloud and listen to others read aloud, and to discuss their reading processes and decisions. (PRA)

  7. Reading Nature from a "Bottom-Up" Perspective

    Science.gov (United States)

    Magntorn, Ola; Hellden, Gustav

    2007-01-01

    This paper reports on a study of ecology teaching and learning in a Swedish primary school class (age 10-11 yrs). A teaching sequence was designed to help students read nature in a river ecosystem. The teaching sequence had a "bottom up" approach, taking as its starting point a common key organism--the freshwater shrimp. From this species and its…

  8. Anything but Reading

    Science.gov (United States)

    Krashen, Stephen

    2009-01-01

    Both the popular media and professional literature are filled with suggestions on how to improve reading, but the one approach that always works is rarely mentioned: provide readers with a supply of interesting and comprehensible books. Instead, people are given advice that is dead wrong as a means of improving reading (e.g., roller skating and…

  9. LINGUISTICS AND JAPANESE READING.

    Science.gov (United States)

    CROWLEY, DALE P.

    THE PRINCIPLES OF STRUCTURAL LINGUISTICS, THE DEVELOPMENT OF JAPANESE ORTHOGRAPHY, AND THE PSYCHOLOGY OF LEARNING ARE USED AS A BASIS FOR DEVELOPMENT OF A LINGUISTICALLY ORIENTED COURSE IN JAPANESE READING. THE FIRST PART OF THE TEXT IS DEVOTED TO THE RELATION BETWEEN READING AND LINGUISTICS. THE SECOND PART GIVES BACKGROUND MATERIAL ON JAPANESE…

  10. Little Herder Reading Series.

    Science.gov (United States)

    Bureau of Indian Affairs (Dept. of Interior), Washington, DC.

    The Little Herder Reading Series is comprised of 4 volumes based on the life of a Navajo Indian girl. The books are written in English blank verse and describe many facets of Indian life. The volumes contain illustrations by Hoke Denetsosie which give a pictorial representation of the printed verse. The reading level is for the middle and upper…

  11. Books for Summer Reading.

    Science.gov (United States)

    Phi Delta Kappan, 1996

    1996-01-01

    Suggests several novels for educators' summer reading enjoyment, including classics by Robert Pirsig, Robertson Davies, John Steinbeck, Albert Camus, and Charles Dickens. Educators might also read Alex Kotlowitz's "There Are No Children Here" (Doubleday, 1991) and Sharon Quint's "Schooling Homeless Children" (Teachers College Press, 1994) to gain…

  12. Teaching Reading in Homeschool.

    Science.gov (United States)

    Yambo, Idalia

    This paper discusses the home-schooling trend and identifies reading instructional methods used by home-schooling parents. Interviews were conducted with 5 home-schooling families of children ranging in age from 1 to 14 years. Parents reported that they began reading instruction with their child at about age 5 and agreed that instruction in…

  13. Lippincott Basic Reading Program.

    Science.gov (United States)

    Monterey Peninsula Unified School District, Monterey, CA.

    This program, included in "Effective Reading Programs...," serves 459 students in grades 1-3 at 15 elementary schools. The program employs a diagnostic-prescriptive approach to instruction in a nongraded setting through the use of the Lippincott Basic Reading program. When a child enters the program, he is introduced to a decoding…

  14. Uninterrupted Sustained Silent Reading.

    Science.gov (United States)

    Meyers, Rick

    A study investigated the effect Sustained Silent Reading (SSR) has had on literacy at Estancia High School in California which recently implemented an SSR program. It also examined the role SSR has on language development, comprehension, vocabulary, student attitudes, and its corollary consequence on the development of reading habits. A survey of…

  15. Reading Where It Counts

    Science.gov (United States)

    Miller, Harry

    2014-01-01

    In this article, teachers are reminded that their content subject areas require acquainting children with special words or symbols related to that subject area (e.g. mathematics or social studies). That children can read well does not mean they will understand any special reading skill required in a content subject area; that the…

  16. Reading and Perestroika.

    Science.gov (United States)

    Plotnikov, Sergei N.

    1992-01-01

    Presents a short historical and sociological analysis of reading in the Soviet Union from the beginning of the twentieth century to perestroika. Discusses some sociocultural problems associated with reading, including the prevailing social, economic and political crises in all spheres of life, particularly the cultural. (RS)

  17. Reading Patterns Changing

    Institute of Scientific and Technical Information of China (English)

    2011-01-01

    Modern life is changing the way people read. April 23 was the 16th World Book and Copyright Day, also known as the World Book Day. Reading-related problems have once again attracted people’s attention. Today, living a life with an increasingly rapid pace, most people are

  18. Toddler Reading Time

    Science.gov (United States)

    ... Kids make big leaps in vocabulary during this time, and learn about letters, shapes, colors, weather, animals, ...

  19. Improving Attitudes Toward Reading.

    Science.gov (United States)

    Dean, Stephanie J.; Trent, Jane A.

    This report describes a program for improving students' attitudes toward reading. The targeted population consisted of second and third grade students in a growing middle class community. The problem of the lack of interest in reading and the poor quality of classroom work were evident in parent and student surveys, and teacher observations.…

  20. Online Reading Test Evaluation

    Institute of Scientific and Technical Information of China (English)

    雷鸣

    2011-01-01

    Language tests have been used as a scientific assessment tool in providing valuable information for teaching and learning. In fact, lots of online reading tests are not designed with validity. This paper analyzes those online reading tests from the aspects o

  1. Assistive Technologies for Reading

    Science.gov (United States)

    Ruffin, Tiece M.

    2012-01-01

    Twenty-first century teachers working with diverse readers are often faced with the question of how to integrate technology in reading instruction that meets the needs of the techno-generation. Are today's teachers equipped with the knowledge of how to effectively use Assistive Technologies (AT) for reading? This position paper discusses AT for…

  2. Reading: Seven to Eleven.

    Science.gov (United States)

    Merritt, John E.

    This paper focuses on ways of improving reading by developing the intermediate skills and the higher order comprehension skills in reading. The paper consists of four sections: "Intermediate Skills and Context Cues" discusses the use of the cloze procedure for improving comprehension skills and for analyzing words in terms of class membership,…

  3. MORE ABOUT READING.

    Science.gov (United States)

    RASMUSSEN, MARGARET

    FOUR ARTICLES ON INDIVIDUALIZED READING AND SELF-SELECTION REPRINTED FROM "CHILDHOOD EDUCATION" AND "READING," THE JOURNAL AND BULLETIN OF THE ASSOCIATION FOR CHILDHOOD EDUCATION INTERNATIONAL (ACEI), ARE PRESENTED. THE FIRST ARTICLE IS A DISCUSSION OF SELF-SELECTION, OF THE TEACHER'S ROLE IN PROVIDING OPPORTUNITIES FOR…

  4. Reading Rate and Comprehension

    Science.gov (United States)

    Jodai, Hojat

    2011-01-01

    Reading fluency is one of the most important signs of language proficiency both for native and foreign language speakers (Grabe, 2010; Macalister, 2010; Winston, 2010; Hasbrouck, 2008; Rasinski, 2004; Oakley, 2003; Waldman, 1985; Cited in: Sayenko, 2010, Introduction Para 1). This paper is in the area of reading fluency and tries to investigate…

  5. Extending Extensive Reading

    Science.gov (United States)

    Day, Richard R.

    2015-01-01

    The April 2015 issue of "Reading in a Foreign Language" featured a discussion forum on extensive reading (ER). Most of the authors, recognized authorities on ER, discussed their views of the principles of ER, particularly in establishing and conducting ER programs. The purpose of this discussion is to review developments in the practice…

  6. Read Like a Scientist

    Science.gov (United States)

    Mawyer, Kirsten K. N.; Johnson, Heather J.

    2017-01-01

    Scientists read, and so should students. Unfortunately, many high school teachers overlook science texts as a way to engage students in the work of scientists. This article addresses how to help students develop literacy skills by strategically reading a variety of science texts. Unfortunately, most science teachers aren't trained to teach…

  7. DNA Sequencing Using capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    application papers of sequencing up to this level were also published in the mid 1990's. A major interest of the sequencing community has always been read length. The longer the sequence read per run, the more efficient the process and the better the ability to read repeat sequences. We therefore devoted a great deal of time to studying the factors influencing read length in capillary electrophoresis, including polymer type and molecular weight, capillary column temperature, applied electric field, etc. In our initial optimization, we were able to demonstrate, for the first time, the sequencing of over 1000 bases with 90% accuracy. The run required 80 minutes for separation. Sequencing of 1000 bases per column was next demonstrated on a multiple capillary instrument. Our studies revealed that linear polyacrylamide produced the longest read lengths because the hydrophilic single strand DNA had minimal interaction with the very hydrophilic linear polyacrylamide. Any interaction of the DNA with the polymer would lead to broader peaks and lower read length. Another important parameter was the molecular weight of the linear chains. High molecular weight (> 1 MDa) was important to allow the long single strand DNA to reptate through the entangled polymer matrix. In an important paper, we showed an inverse emulsion method to reproducibly prepare linear polyacrylamide polymer with an average MWT of 9 MDa. This approach was used in the polymer for sequencing the human genome. Another critical factor in the successful use of capillary electrophoresis for sequencing was the sample preparation method. In the Sanger sequencing reaction, high concentration of salts and dideoxynucleotide remained. Since the sample was introduced to the capillary column by electrokinetic injection, these salt ions would be favorably injected into the column over the sequencing fragments, thus reducing the signal for longer fragments and hence reducing read length. In two papers, we examined the role of

  8. EXTENSIVE READING IN CHINA

    Institute of Scientific and Technical Information of China (English)

    1996-01-01

    Introduction. Most teachers and students in China are quite familiar with the term ‘extensive reading’, but how it should be taught still remains a problem. This paper covers the aims of extensive reading and the methods and materials used in the course. Then some practical suggestions will be given to make the course more interesting and efficient. According to Dzao (1990), extensive reading is ‘the course where other reading skills - speed, prediction and making inference - can be developed,’ and ‘where there is practice in getting the gist, in summarising main ideas, in understanding the author’s purpose and theme...’. So the aims of this course are to develop general reading skills, the ability to read quickly and to grasp the main ideas of the text. To achieve these, students must enlarge their vocabulary, so this is also regarded as one of the aims.

  9. Science Fiction: Serious Reading, Critical Reading

    Science.gov (United States)

    Zigo, Diane; Moore, Michael T.

    2004-01-01

    Science fiction deserves a greater respect, serious and critical reading and a better place in high school literature classes. Some of the science fiction books by Isaac Asimov, Alfred Bester, Ray Bradbury and Octavia L. Butler and various activities for incorporating science fiction into the English language arts instruction classroom are…

  11. Grasp Reading Skills to Improve Reading Ability?

    Institute of Scientific and Technical Information of China (English)

    CaoJingt

    2004-01-01

    Reading is a kind of communication, and for most people in most situations it is more important than speaking. For college students today, it is more important to obtain the newest information in their own fields through English, rather than showing their English certificates. However, the most common problem students have nowadays, including

  12. Integrating Reading and Writing through Extensive Reading

    Science.gov (United States)

    Park, Jeongyeon

    2016-01-01

    This study explores whether an extensive reading (ER) approach can enhance L2 learners' writing performance in an English for Academic Purposes context. Two classes were compared in terms of writing improvement after one semester: a 'traditional' writing class primarily focused on writing practice and grammar instruction, and an ER class in which…

  13. Next generation sequencing (NGS) technologies and applications

    Energy Technology Data Exchange (ETDEWEB)

    Vuyisich, Momchilo [Los Alamos National Laboratory

    2012-09-11

    NGS technology overview: (1) NGS library preparation - Nucleic acids extraction, Sample quality control, RNA conversion to cDNA, Addition of sequencing adapters, Quality control of library; (2) Sequencing - Clonal amplification of library fragments (except PacBio), Sequencing by synthesis, Data output (reads and quality); and (3) Data analysis - Read mapping, Genome assembly, Gene expression, Operon structure, sRNA discovery, and Epigenetic analyses.

  14. Geoseq: a tool for dissecting deep-sequencing datasets

    OpenAIRE

    Homann Robert; George Ajish; Levovitz Chaya; Shah Hardik; Cancio Anthony; Gurtowski James; Sachidanandam Ravi

    2010-01-01

    Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments...

  15. Forecasting Reading Anxiety for Promoting English-Language Reading Performance Based on Reading Annotation Behavior

    Science.gov (United States)

    Chen, Chih-Ming; Wang, Jung-Ying; Chen, Yong-Ting; Wu, Jhih-Hao

    2016-01-01

    To reduce effectively the reading anxiety of learners while reading English articles, a C4.5 decision tree, a widely used data mining technique, was used to develop a personalized reading anxiety prediction model (PRAPM) based on individual learners' reading annotation behavior in a collaborative digital reading annotation system (CDRAS). In…

  16. Role of Reading Engagement in Mediating Effects of Reading Comprehension Instruction on Reading Outcomes

    Science.gov (United States)

    Wigfield, Allan; Guthrie, John T.; Perencevich, Kathleen C.; Taboada, Ana; Klauda, Susan Lutz; McRae, Angela; Barbosa, Pedro

    2008-01-01

    The engagement model of reading development suggests that instruction improves students' reading comprehension to the extent that it increases students' engagement processes in reading. We compared how Concept-Oriented Reading Instruction (CORI) (support for cognitive and motivational processes in reading), strategy instruction (support for…

  17. The Relationships between Korean University Students' Reading Attitude, Reading Strategy Use, and Reading Proficiency

    Science.gov (United States)

    Kim, Hyangil

    2016-01-01

    This present study investigated the relationships among L2 readers' reading attitude, reading strategy use, and reading proficiency in order to identify patterns caused by individuals' differences. For this study, 153 Korean university students replied to a reading attitude and reading strategy questionnaire. An ANOVA and frequency analysis were…

  18. Student-Centered Reading Activities.

    Science.gov (United States)

    Moffett, James; Wagner, Betty Jane

    1991-01-01

    Offers student-centered reading activities designed to bring students to reading maturity and involvement in literature. Discusses partner reading, dramatizing and performing texts, transforming texts, journal writing, discussion, and writing. (PRA)

  19. Discourse Awareness and Teaching Reading

    Institute of Scientific and Technical Information of China (English)

    FEI Jun

    2014-01-01

    Reading is an interactive process and discourse knowledge facilitates interpretation of the text. This paper discusses the theory of reading from a discourse approach, explores the interactive approach to teaching reading and discusses pedagogical implications with the contexts concerned.

  20. Illuminating the Black Box of Genome Sequence Assembly: A Free Online Tool to Introduce Students to Bioinformatics

    Science.gov (United States)

    Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.

    2013-01-01

    Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…

  1. Metacognition and Reading Comprehension

    Institute of Scientific and Technical Information of China (English)

    Cecilia Chang

    2000-01-01

    Introduction Reading is a mysterious process that has attracted attention from psychologists, reading researchers and educators alike for decades. Currently, reading is viewed as a meaning-constructing process where the reader interacts with the text by simultaneously using information from a variety of sources ranging from one's background knowledge of the content and about the world to the knowledge about the language in which the text is written (Mulling, 1994). Moreover, reading comprehension is achieved only after skillful orchestration of all the resources the reader possesses when engaged in the act of reading. Among the various kinds of strategies a reader needs during the reading process are the monitoring strategies. In order to fully utilize the strategies, the reader needs to possess good metacognition. Metacognition refers to one's deliberate conscious control of one's cognitive actions (Brown, 1980). Since it is crucial for success that we know what we know and what we don't know and, consequently, what to do about what we know and don't know, it is not surprising to find the notion of metacognition being stressed in almost every situation of learning. Given the importance of metacognition in successful learning, the purpose of this paper is to explore the role metacognition plays in reading comprehension, and consequently, identify future research directions.

  2. Reading between eye saccades.

    Directory of Open Access Journals (Sweden)

    Caroline Blais

    Full Text Available BACKGROUND: Skilled adult readers, in contrast to beginners, show no or little increase in reading latencies as a function of the number of letters in words up to seven letters. The information extraction strategy underlying such efficiency in word identification is still largely unknown, and methods that allow tracking of the letter information extraction through time between eye saccades are needed to fully address this question. METHODOLOGY/PRINCIPAL FINDINGS: The present study examined the use of letter information during reading, by means of the Bubbles technique. Ten participants each read 5,000 five-letter French words sampled in space-time within a 200 ms window. On the temporal dimension, our results show that two moments are especially important during the information extraction process. On the spatial dimension, we found a bias for the upper half of words. We also show for the first time that letter positions four, one, and three are particularly important for the identification of five-letter words. CONCLUSIONS/SIGNIFICANCE: Our findings are consistent with either a partially parallel reading strategy or an optimal serial reading strategy. We show using computer simulations that this serial reading strategy predicts an absence of a word-length effect for words from four- to seven letters in length. We believe that the Bubbles technique will play an important role in further examining the nature of reading between eye saccades.

  3. Newspaper Reading and English Teaching

    Institute of Scientific and Technical Information of China (English)

    霍丽蓉

    2014-01-01

    Reading is not only the most effective way to gain knowledge of the language but also the only way to improve the language skills of English learners. In reality, however, the reading ability of the vast majority of students falls far short of the curriculum standards. Reading English newspapers helps to mobilize students' enthusiasm for reading and improve their reading ability. It forms a bridge between learning and real life. It extends English reading beyond the teaching materials and guides students to a wide range of extracurricular reading, creating a new reading space for students.

  4. I read, you read, we read: the history of reading in Slovenia

    Directory of Open Access Journals (Sweden)

    Anja Dular

    2013-03-01

    Full Text Available ABSTRACT Purpose: The aim of the article is to research reading habits in Slovenia in the period between the 16th and 19th centuries and to find similarities with Austria and other European countries of that time. Methodology/approach: For the purpose of the analysis, different resources were used: study books, catechisms, prayer books and manuals. We focused on introductions in which readers are advised how to read, which explain for whom the work is intended and emphasize the importance of meditation on the texts. Results: Historically, reading aloud was preferred, as it continued the folk tradition. However, the 16th century texts were transmitted by women while the folk tradition was narrated by males. In the 18th century, the higher level of literacy and greater book production and availability meant that books were no longer a privilege of the few. At that time more texts were intended for silent, individual reading. Interestingly, the authors emphasized the importance of meditation on the texts, too. It was also advised when to read: it was recommended to read in leisure time on Sundays and on holidays. The role of books was also to break away from reality and to forget everyday problems. Due to the overproduction of books in the 17th century, there was concern that books were misleading the crowds. The church considered the reading of books inappropriate, and criticized fiction, novels and adventure stories, which were mostly read by women. Research limitation: The study is based on Slovenian texts only, although foreign literature, especially in German, was generally available, too. Originality/practical implications: The study fills a gap in the history of reading in Slovenia.

  5. Interruptions disrupt reading comprehension.

    Science.gov (United States)

    Foroughi, Cyrus K; Werner, Nicole E; Barragán, Daniela; Boehm-Davis, Deborah A

    2015-06-01

    Previous research suggests that being interrupted while reading a text does not disrupt the later recognition or recall of information from that text. This research is used as support for Ericsson and Kintsch's (1995) long-term working memory (LT-WM) theory, which posits that disruptions while reading (e.g., interruptions) do not impair subsequent text comprehension. However, to fully comprehend a text, individuals may need to do more than recognize or recall information that has been presented in the text at a later time. Reading comprehension often requires individuals to connect and synthesize information across a text (e.g., successfully identifying complex topics such as themes and tones) and not just make a familiarity-based decision (i.e., recognition). The goal for this study was to determine whether interruptions while reading disrupt reading comprehension when the questions assessing comprehension require participants to connect and synthesize information across the passage. In Experiment 1, interruptions disrupted reading comprehension. In Experiment 2, interruptions disrupted reading comprehension but not recognition of information from the text. In Experiment 3, the addition of a 15-s time-out prior to the interruption successfully removed these negative effects. These data suggest that the time it takes to process the information needed to successfully comprehend text when reading is greater than that required for recognition. Any interference (e.g., an interruption) that occurs during the comprehension process may disrupt reading comprehension. This evidence supports the need for transient activation of information in working memory for successful text comprehension and does not support LT-WM theory. (c) 2015 APA, all rights reserved.

  6. Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.
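
    The read-counting idea in this abstract (expression estimated from the number of reads mapped to each gene) can be illustrated with a minimal Python sketch. The input format, the function names and the reads-per-million normalization are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from collections import Counter


def count_reads_per_gene(alignments):
    """Count mapped reads per gene.

    `alignments` is a hypothetical iterable of (read_id, gene_id) pairs,
    where gene_id is None for reads that did not map to any gene.
    """
    counts = Counter()
    for _read_id, gene_id in alignments:
        if gene_id is not None:
            counts[gene_id] += 1
    return counts


def reads_per_million(counts):
    """Scale raw counts to reads per million mapped reads (one simple normalization)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Toy data: three reads map to genes, one read is unmapped.
alignments = [("r1", "GAPDH"), ("r2", "GAPDH"), ("r3", "ACTB"), ("r4", None)]
counts = count_reads_per_gene(alignments)
rpm = reads_per_million(counts)
```

    Normalizing by the total number of mapped reads makes counts from runs of different depth comparable; length normalization is deliberately left out of this sketch.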

  7. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  8. Mapping RNA-seq Reads with STAR

    Science.gov (United States)

    Dobin, Alexander; Gingeras, Thomas R.

    2015-01-01

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920

  9. The origin of biased sequence depth in sequence-independent nucleic acid amplification and optimization for efficient massive parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Toon Rosseel

    Full Text Available Sequence Independent Single Primer Amplification is one of the most widely used random amplification approaches in virology for sequencing template preparation. This technique relies on oligonucleotides consisting of a 3' random part used to prime complementary DNA synthesis and a 5' defined tag sequence for subsequent amplification. Recently, this amplification method was combined with next generation sequencing to obtain viral sequences. However, these studies showed a biased distribution of the resulting sequence reads over the analyzed genomes. The aim of this study was to elucidate the mechanisms that lead to biased sequence depth when using random amplification. Avian paramyxovirus type 8 was used as a model RNA virus to investigate these mechanisms. We showed, based on in silico analysis of the sequence depth in relation to GC-content, predicted RNA secondary structure and sequence complementarity to the 3' part of the tag sequence, that the tag sequence has the main contribution to the observed bias in sequence depth. We confirmed this finding experimentally using both fragmented and non-fragmented viral RNAs as well as primers differing in random oligomer length (6 or 12 nucleotides) and in the sequence of the amplification tag. The observed oligonucleotide annealing bias can be reduced by extending the random oligomer sequence and by in silico combining sequence data from SISPA experiments using different 5' defined tag sequences. These findings contribute to the optimization of random nucleic acid amplification protocols that are currently required for downstream applications such as viral metagenomics and microarray analysis.

  10. Inferring viral quasispecies spectra from 454 pyrosequencing reads

    Directory of Open Access Journals (Sweden)

    Măndoiu Ion

    2011-07-01

    Full Text Available Abstract Background RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences. Results In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http
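
    The viability criterion mentioned in this abstract (a reconstructed sequence contains no internal stop codons) is simple enough to sketch in Python. The function names and toy sequences below are hypothetical, and the sketch assumes the standard genetic code and a known reading frame; it does not reproduce ViSpA's own code.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(seq, frame=0):
    """Return True if a stop codon occurs before the last complete codon."""
    seq = seq.upper()
    # Split into complete codons starting at the given frame offset.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # The final codon is allowed to be a stop, so it is excluded from the check.
    return any(codon in STOP_CODONS for codon in codons[:-1])


def is_viable(seq, frame=0):
    """A sequence is called viable here if it has no internal stop codon."""
    return not has_internal_stop(seq, frame)


viable_example = is_viable("ATGAAATAA")    # stop codon only at the end
broken_example = is_viable("ATGTAAAAA")    # stop codon in the middle
```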

  11. How to read efficiently

    Institute of Scientific and Technical Information of China (English)

    肖平

    2007-01-01

    At present, university graduates have mountains of books to read. In response, this paper first explains why students must read extensively and then tentatively suggests three measures: establish a new concept of reading, practise reading pragmatically, and review and exchange.

  12. FIRST Reading: Focussed Instruction in Reading for Successful Teaching.

    Science.gov (United States)

    Newman, Anabel P.; Metz, Elizabeth

    This book describes FIRST (Focussed Instruction in Reading for Successful Teaching) Reading, a computer program that takes answers to 20 questions about a learner and matches this profile against profiles in the database. FIRST Reading, formerly called "Consult Reading," can recommend the most-likely-to-succeed teaching focus(es) for…

  13. Reading Acts: An Inquiry into Reading and Teaching

    Science.gov (United States)

    Sams, Brandon L.

    2012-01-01

    This text performs reading for teaching in an audit culture. Two teachers, myself and Steven, read the memoir "Hole in My Life" by Jack Gantos and, while reading, recorded our experiences as readers and planned to teach the book to Steven's English class. This study is an inquiry into the phenomenon of "reading to teach,"…

  14. Developing New Reading Assessments to Promote Beginning Reading in Singapore

    Science.gov (United States)

    Koh, Kim H.; Paris, Scott G.

    2011-01-01

    Effective reading instruction and intervention are rooted in effective assessments of children's developing skills in reading. The article aims to describe the development of new reading assessments to help promote beginning reading in Singapore primary schools. We begin with an introduction to the educational landscape and policies before…

  15. Teen Summer Reading Program, 1999. Read around the World.

    Science.gov (United States)

    Virginia State Library, Richmond.

    This guide for the 1999 Virginia teen summer reading program for public libraries, "Read around the World," includes the following chapters: (1) "Reading and Teens," including serving the underserved, tips for teens, and a recipe for choosing a book to read for fun; (2) "Programming and Teens," including "The Why…

  16. Reading Motivation and Reading Engagement: Clarifying Commingled Conceptions

    Science.gov (United States)

    Unrau, Norman J.; Quirk, Matthew

    2014-01-01

    The constructs of motivation for reading and reading engagement have frequently become blurred and ambiguous in both research and discussions of practice. To address this commingling of constructs, the authors provide a concise review of the literature on motivation for reading and reading engagement and illustrate the blurring of those concepts…

  17. "Read the Text, as if!" The Reading Retention Strategy

    Science.gov (United States)

    Divoll, Kent; Browning, Sandra

    2013-01-01

    Students do not always read what is expected in college courses (Berry, Cook, Hill, & Stevens, 2010; Phillips & Phillips, 2007; Sikorski et al., 2002) or they read to cram for an exam or quiz (Clump, Bauer, & Bradley, 2004). The Reading Retention Strategy (RRS) is designed to motivate students to read and assist students in…

  18. Child-centered reading intervention: See, talk, dictate, read, write!

    Directory of Open Access Journals (Sweden)

    Muhammet BAŞTUĞ

    2016-06-01

    Full Text Available Poor reading achievement of children in elementary schools has been one of the major concerns in education. The aim of this study is to examine the effectiveness of a child-centered reading intervention in eliminating the reading problems of a student with poor reading achievement. The research was conducted with a student having difficulty in reading. A reading intervention was designed that targeted multiple areas of reading and aimed to improve reading skills through the use of multiple strategies. This intervention is child-centered and includes visual aids, talking, dictating, reading and writing stages. The study was performed in 35 sessions consisting of stages of a single sentence (5 sessions), two sentences (5 sessions), three sentences (20 sessions) and the text stage (5 sessions). The intervention sessions were audio-taped. These recordings and the written responses to the reading comprehension questions provided the data for analysis. The findings on the reading intervention revealed positive outcomes. The student exhibited certain improvements at the levels of reading, reading rate and reading comprehension. These results were discussed in the literature and the findings suggest that child-centered reading strategies such as talking, dictating and writing should be the main focus of instruction for students with low reading literacy achievement to enable these students to meet the demands of the curriculum.

  19. Improving EFL Learners' Reading Levels through Extensive Reading

    Science.gov (United States)

    Mermelstein, Aaron David

    2014-01-01

    Today there is an increasing amount of research promoting the effectiveness of extensive reading (ER) towards increasing learners' vocabulary, comprehension, reading speed, and motivation towards reading. However, little has been done to measure the effects of ER on learners' reading levels. This quantitative study examined the effects of ER on…

  20. Experiences in Reading Instruction as the Road to Teaching Reading.

    Science.gov (United States)

    Corlett, Donna Jean

    1988-01-01

    Describes a model self-improvement reading course for teachers incorporating the communications model, the skills model, and the sustained silent reading model. Concludes that basic reading skills instruction led to improvement in reading skills and that lesson plans incorporating course objectives were produced. (RS)

  1. Exploring Students' Reading Profiles to Guide a Reading Intervention Programme

    Science.gov (United States)

    Boakye, Naomi A. N. Y.

    2017-01-01

    There have been a number of studies on reading interventions to improve students' reading proficiency, yet the majority of these interventions are undertaken with the assumption that students' reading challenges are obvious and generic in nature. The interventions do not take into consideration the diversity in students' reading backgrounds and…

  2. Reading the Tourist Guidebook

    DEFF Research Database (Denmark)

    Therkelsen, Anette; Sørensen, Anders

    2005-01-01

    This article investigates tourists’ ways of reading their guidebooks on the basis of qualitative interviews with tourists visiting Copenhagen, Denmark. Tourist guidebooks have only been dealt with sporadically by tourism scholars. The relatively few studies that focus on guidebooks either present a historical perspective on the guidebook or centre on content analyses of place representation, whereas virtually no research exists on the way in which tourists read and use their guidebooks. This study reveals that tourists read the same guidebooks in a number of different ways regarding types of information sought, amount of information read and level of involvement displayed, indicating a three-pronged typology of guidebook readers. The guidebook reader typology thus constructed may be regarded as a first step in understanding the effect of guidebooks on tourists’ behaviour and their experience...

  3. Assessment of Reading Comprehension

    Directory of Open Access Journals (Sweden)

    Madani HABIB

    2016-06-01

    Full Text Available This study attempts to shed light on the concept of assessment as an essential pedagogical practice for the improvement of the teaching-learning process. Particularly, it stresses the strategies and the techniques that should be used in assessing reading comprehension with reference to EFL classrooms. It describes the kinds of tasks that actually reveal students’ reading comprehension abilities and needs. Moreover, this paper aims to illustrate the types and the advantages of assessment for both teachers and learners. More importantly, this study tries to bring equitable evidence of how reading comprehension can be adequately assessed. The findings showed that assessment of reading comprehension is central to English language teaching as it provides teachers with essential information about students’ weaknesses, needs, obstacles, and deficits. Thus, teachers can implement the appropriate techniques and use the assessment results to amend their classroom instruction and enhance the learning abilities.

  4. Art Criticism and Reading.

    Science.gov (United States)

    Feldman, Edmund Burke; Woods, Don

    1981-01-01

    The authors review a body of theory and accumulating evidence which suggests that critical study of the arts facilitates the development of cognitive skills, including those essential to reading. (Author/SJL)

  5. Adult Basic Education: Reading.

    Science.gov (United States)

    Newman, Anabel P.

    This book is designed to provide practical suggestions and teaching approaches for both administrators and instructors involved in teaching reading to adults. The book contains the following chapters: (1) "Overview"; (2) "Diagnosing Learner Characteristics"; (3) "Goals and Objectives"; (4) "Planning…

  6. The cold reading technique.

    Science.gov (United States)

    Dutton, D L

    1988-04-15

    For many people, belief in the paranormal derives from personal experience of face-to-face interviews with astrologers, palm readers, aura and Tarot readers, and spirit mediums. These encounters typically involve cold reading, a process in which a reader makes calculated guesses about a client's background and problems and, depending on the reaction, elaborates a reading which seems to the client so uniquely appropriate that it carries with it the illusion of having been produced by paranormal means. The cold reading process is shown to depend initially on the Barnum effect, the tendency for people to embrace generalized personality descriptions as idiosyncratically their own. Psychological research into the Barnum effect is critically reviewed, and uses of the effect by a professional magician are described. This is followed by detailed analysis of the cold reading performances of a spirit medium. Future research should investigate the degree to which cold readers may have convinced themselves that they actually possess psychic or paranormal abilities.

  7. Reading-Boxing Class

    Science.gov (United States)

    Kravitz, Richard; Shapiro, Marvin

    1969-01-01

    The physical education department of the Pennsylvania Advancement School of Philadelphia has established a reading and communication skill project that uses the appeal of sports to help students improve their basic skills. (Author)

  9. STUDENTS’ READING PRACTICES AND ENVIRONMENTS

    Directory of Open Access Journals (Sweden)

    Aiza Johari

    2013-07-01

    Full Text Available Abstract: The challenges of reading are apparent in most teaching and learning processes in ESL classrooms. This study was therefore conducted to address the issues of students who seem to find reading unbearable. Many of them have limited ability to read well and hence lack the reading habits needed to become competent readers, particularly in the out-of-school context. In addition, poor home literacy environments contribute to their shortcomings in reading. The main objectives of this study are to identify the students’ reasons for reading and to find out about their home reading environments (reading backgrounds and habits; reading attitudes and motivation; reading exposure and supports). To identify these, questionnaires were distributed to 120 secondary school students (Form 4: 16 years old) from one of the urban schools in Sarawak, Malaysia. The findings indicate that the students read to gain information and knowledge, though many ranked reading as a hobby last among their motives for reading. They also preferred non-academic reading materials, mainly lighter forms of reading material such as comics, story books and magazines. Though the students acknowledged the importance of reading in their daily lives, their average reading habits, attitude, motivation, exposure and support within the home domain suggested otherwise. They mainly read for instrumental purposes, while reading for pleasure seemed not to be given priority. The respondents also acknowledged that they and their parents did not read much at home. As an implication, it is vital for students to improve their reading perceptions, abilities and practices to achieve personal, societal and national progress. On a final note, parents’ early and continuous efforts to be involved in their children’s literacy events in an out-of-school context are believed to be vital to inculcate positive reading environments, habits and culture

  10. Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool

    Directory of Open Access Journals (Sweden)

    Klopp Christophe

    2011-05-01

    Full Text Available Abstract Background The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing technology platforms, permitting the sequencing of large genomes, the analysis of variations or the study of transcriptomes. A recently reported bias leads to the production of multiple reads for a unique DNA fragment in a random manner within a run. This bias has a direct impact on the quality of the measurement of the representation of the fragments using the reads. Other cleaning steps are usually performed on the reads before assembly or alignment. Findings PyroCleaner is a software module intended to clean 454 pyrosequencing reads in order to ease the assembly process. This program is free software and is distributed under the terms of the GNU General Public License as published by the Free Software Foundation. It implements several filters using criteria such as read duplication, length, complexity, base-pair quality and number of undetermined bases. It also allows cleaning of flowgram files (.sff) of paired-end sequences, generating on the one hand a validated paired-end file and on the other hand a single-read file. Conclusions Read cleaning has always been an important step in sequence analysis. The pyrocleaner python module is a Swiss Army knife dedicated to cleaning 454 reads. It includes commonly used filters as well as specialised ones such as duplicated read removal and paired-end read verification.
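
    The kinds of filters listed in this abstract (length, undetermined bases, base quality, duplication) can be sketched in a few lines of Python. This is not PyroCleaner's interface: the function names, the default thresholds and the exact-sequence definition of a duplicate are all assumptions made for illustration, and the complexity filter is omitted.

```python
def mean_quality(quals):
    """Average of a list of per-base quality scores (0.0 for an empty list)."""
    return sum(quals) / len(quals) if quals else 0.0


def passes_filters(seq, quals, min_len=40, max_n_fraction=0.04, min_mean_qual=20):
    """Apply length, undetermined-base and mean-quality filters to one read.

    All thresholds are illustrative defaults, not values from the paper.
    """
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False
    return mean_quality(quals) >= min_mean_qual


def clean_reads(reads, **thresholds):
    """Return names of reads that pass all filters, dropping duplicates.

    `reads` is an iterable of (name, sequence, qualities). A duplicate is
    defined here, as a simplification, as an identical sequence seen earlier.
    """
    seen = set()
    kept = []
    for name, seq, quals in reads:
        if not passes_filters(seq, quals, **thresholds):
            continue
        if seq in seen:
            continue
        seen.add(seq)
        kept.append(name)
    return kept


reads = [
    ("a", "ACGT" * 15, [30] * 60),               # passes every filter
    ("b", "ACGT" * 15, [30] * 60),               # exact duplicate of "a"
    ("c", "ACGT" * 5, [30] * 20),                # too short
    ("d", "N" * 10 + "ACGT" * 15, [30] * 70),    # too many undetermined bases
    ("e", "TTGCA" * 12, [10] * 60),              # low mean quality
]
kept = clean_reads(reads)
```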

  11. Metacognition in Reading Comprehension

    OpenAIRE

    Ceylan, Eda; Harputlu, Leyla

    2015-01-01

    Metacognition is defined basically as thinking about thinking. It is a significant factor that affects many activities related to language use. Reading comprehension, which is an indispensable part of daily life and language classrooms, is affected by metacognition, too. Hence, this paper aims to present an overview of the recent theoretical and empirical studies about metacognition and reading comprehension. Firstly, it provides the definitions and the importance of metacognition. Secondly, ...

  12. Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products

    NARCIS (Netherlands)

    G.B. Gloor (Gregory); R.B.S. Hummelen (Ruben); J.M. Macklaim (Jean); R.J. Dickson (Russell); A.D. Fernandes (Andrew); R.A. MacPhee (Roderick); G. Reid (Gregor)

    2010-01-01

    We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads.

  13. NGS-based deep bisulfite sequencing.

    Science.gov (United States)

    Lee, Suman; Kim, Joomyeong

    2016-01-01

    We have developed an NGS-based deep bisulfite sequencing protocol for the DNA methylation analysis of genomes. This approach allows the rapid and efficient construction of NGS-ready libraries with a large number of PCR products that have been individually amplified from bisulfite-converted DNA. This approach also employs a bioinformatics strategy to sort the raw sequence reads generated from NGS platforms and subsequently to derive DNA methylation levels for individual loci. The results demonstrated that this NGS-based deep bisulfite sequencing approach provides not only DNA methylation levels but also informative DNA methylation patterns that have not been seen through other existing methods.
    • This protocol provides an efficient method for generating NGS-ready libraries from individually amplified PCR products.
    • This protocol provides a bioinformatics strategy for sorting NGS-derived raw sequence reads.
    • This protocol provides deep bisulfite sequencing results that can measure DNA methylation levels and patterns of individual loci.
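
    As a rough illustration of how methylation levels can be derived from bisulfite reads: unmethylated cytosines are read as T after conversion and methylated ones remain C, so the level at a CpG site is commonly estimated as C / (C + T). The sketch below is not the authors' pipeline; the function names are hypothetical, and it assumes reads that are already aligned to the reference amplicon without indels.

```python
def cpg_positions(reference):
    """Return 0-based indices of the C in every CpG dinucleotide."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_levels(reference, reads):
    """Per-CpG methylation estimated as C / (C + T) across bisulfite-converted reads.

    Assumes every read is pre-aligned to `reference`, has the same length
    and contains no indels (a simplification for illustration).
    """
    levels = {}
    for pos in cpg_positions(reference):
        c = sum(1 for r in reads if r[pos].upper() == "C")  # methylated calls
        t = sum(1 for r in reads if r[pos].upper() == "T")  # unmethylated calls
        levels[pos] = c / (c + t) if (c + t) else None
    return levels
```

    Reading the bases at all CpG positions of a single read, rather than summing across reads, gives that read's methylation pattern.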

  14. Strategies for complete plastid genome sequencing.

    Science.gov (United States)

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  15. FastUniq: a fast de novo duplicates removal tool for paired short reads.

    Directory of Open Access Journals (Sweden)

    Haibin Xu

    Full Text Available The presence of duplicates introduced by PCR amplification is a major issue in paired short reads from next-generation sequencing platforms. These duplicates might have a serious impact on research applications, such as scaffolding in whole-genome sequencing and discovering large-scale genome variations, and are usually removed. We present FastUniq as a fast de novo tool for removal of duplicates in paired short reads. FastUniq identifies duplicates by comparing sequences between read pairs and does not require complete genome sequences as prerequisites. FastUniq is capable of simultaneously handling reads with different lengths and results in highly efficient running time, which increases linearly at an average speed of 87 million reads per 10 minutes. FastUniq is freely available at http://sourceforge.net/projects/fastuniq/.
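
    The idea of identifying duplicates by comparing sequences between read pairs, without a reference genome, can be sketched in a few lines of Python. This is not FastUniq's algorithm or interface: the function name is hypothetical, and the sketch uses a hash set and treats only exact, equal-length pairs as duplicates.

```python
def remove_duplicate_pairs(pairs):
    """Keep the first occurrence of each (read1, read2) sequence pair.

    `pairs` is an iterable of (read1_sequence, read2_sequence) tuples.
    Two pairs count as duplicates only when both mates match exactly,
    which is a simplifying assumption of this sketch.
    """
    seen = set()
    unique = []
    for read1, read2 in pairs:
        key = (read1.upper(), read2.upper())
        if key not in seen:
            seen.add(key)
            unique.append((read1, read2))
    return unique
```

    Comparing both mates together matters: two pairs that share read 1 but differ in read 2 come from different fragments and are both kept.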

  16. When Do Children Read Books?

    Science.gov (United States)

    van Ours, Jan C.

    2008-01-01

    This paper investigates the reading of fiction books by 15 year olds in 18 OECD countries. It appears that girls read fiction books more often than boys, whereas boys read comic books more often than girls. Parental education, family structure, and the number of books and televisions at home influence the intensity with which children read fiction…

  17. In Defense of Reading Quizzes

    Science.gov (United States)

    Tropman, Elizabeth

    2014-01-01

    Many students fail to read the assigned material before class. A failure to read is detrimental to both student learning and course engagement. This paper considers the often-neglected teaching technique of giving frequent quizzes on the reading. Drawing on the author's experiences assigning reading quizzes, together with student opinions…

  18. How to Improve Reading Speed

    Institute of Scientific and Technical Information of China (English)

    王仲亮

    2008-01-01

    It is well known that reading plays an important role not only in our daily life but also in learning a foreign language. And, of course, reading depends on reading speed to some degree. So it is natural that it is important to improve reading speed.

  19. Obstacles to Effective Fast Reading

    Institute of Scientific and Technical Information of China (English)

    赵振国

    2016-01-01

    Reading is a visual as well as a mental understanding process. The reading process can be divided into a visual input process and a mental process of understanding information. Based on experts’ research and theories, the author has analyzed the two reading processes and found that the main factors limiting the reading rate of Chinese learners of English are slow visual input and inefficient understanding of information.

  20. Dyslexia: Problems of Reading Disabilities.

    Science.gov (United States)

    Goldberg, Herman K.; Schiffman, Gilbert B.

    The purpose of this book is to provide an understanding of both the educational and medical aspects of reading and to show how they are interrelated in reading disabilities. The various aspects of reading disabilities are presented in the following chapters: Introduction to the Reading Problem; Early Predictive Studies; Psychological Evaluation;…

  1. Literature in the Reading Curriculum.

    Science.gov (United States)

    Johnson, Nancy J.; Giorgis, Cyndi

    2003-01-01

    Presents annotations of 32 works of children's literature that invite consideration of the wondrous possibilities of literature in the reading curriculum--from reading aloud to time set aside for independent reading, from focused instruction using paired and shared reading to engagement through book discussions, and from using literature to learn…

  2. Peer Tutors Improve Reading Comprehension

    Science.gov (United States)

    LaGue, Kristina M.; Wilson, Katrina

    2011-01-01

    The influential report "Teaching Children to Read: An Evidence-Based Assessment of the Scientific Research Literature on Reading and Its Implications for Reading Instruction," published by the National Reading Panel in 2000, presented recommendations for daily literacy instruction in five key areas: phonemic awareness, phonics, fluency,…

  3. Focus on Reading. New Edition.

    Science.gov (United States)

    Hood, Susan; Solomon, Nicky; Burns, Anne

    The handbook is designed as an introductory text on reading instruction for teachers of English as a Second Language. The first chapter explores the nature of reading through a series of activities that help identify the kind of knowledge one draws on and the strategies one uses in reading. Chapter 2 reviews key theories of reading that have…

  4. The diploid genome sequence of an Asian individual

    DEFF Research Database (Denmark)

    Wang, Jun; Wang, Wei; Li, Ruiqiang

    2008-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...

  5. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
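
    Read depth is conventionally computed as (number of reads × read length) / genome size, so the figures in this abstract translate directly into a read count for experiment planning. The Python sketch below applies that standard relation; the function names and the 100 bp read length are assumptions for illustration and do not come from the study.

```python
from math import ceil


def sequencing_depth(num_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_for_depth(target_depth, read_length, genome_size):
    """Smallest number of reads that reaches `target_depth` on average."""
    return ceil(target_depth * genome_size / read_length)


# Example: reads needed for 50X on a 4.6 Mb genome, assuming 100 bp reads.
ecoli_reads = reads_for_depth(50, 100, 4_600_000)
```

    Under these assumptions, 50X on a 4.6 Mb genome needs 2.3 million reads, and 100X needs twice as many.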

  6. DNA Sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  7. A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays

    OpenAIRE

    Khan, Zia; Bloom, Joshua S.; Kruglyak, Leonid; Singh, Mona

    2009-01-01

    Motivation: High-throughput sequencing technologies place ever increasing demands on existing algorithms for sequence analysis. Algorithms for computing maximal exact matches (MEMs) between sequences appear in two contexts where high-throughput sequencing will vastly increase the volume of sequence data: (i) seeding alignments of high-throughput reads for genome assembly and (ii) designating anchor points for genome–genome comparisons.
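
    To make the notion of a maximal exact match concrete: a MEM is an exact match between two sequences that cannot be extended by one more matching base on either side. The brute-force Python sketch below only illustrates that definition. It runs in quadratic time or worse and is not the sparse suffix array method the paper describes; the function name and the minimum-length parameter are assumptions.

```python
def maximal_exact_matches(ref, query, min_len):
    """Return (ref_start, query_start, length) for every MEM of at least `min_len`.

    Brute-force illustration of the definition only; real tools use
    suffix-array or similar index structures for large inputs.
    """
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Left-maximality: skip starts that could be extended to the left.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Extend to the right as far as the bases keep matching.
            length = 0
            while (i + length < len(ref) and j + length < len(query)
                   and ref[i + length] == query[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems
```

    Raising `min_len` discards short chance matches, which is typically what alignment seeding and anchor selection need.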

  8. Open Reading Frame Phylogenetic Analysis on the Cloud

    Directory of Open Access Journals (Sweden)

    Che-Lun Hung

    2013-01-01

    Full Text Available Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus.

  9. Open Reading Frame Phylogenetic Analysis on the Cloud

    Science.gov (United States)

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843

  10. Accelerating the Next Generation Long Read Mapping with the FPGA-Based System.

    Science.gov (United States)

    Chen, Peng; Wang, Chao; Li, Xi; Zhou, Xuehai

    2014-01-01

    Comparing newly determined sequences against the subject sequences stored in databases is a critical task in bioinformatics. Fortunately, a recent survey reports that state-of-the-art aligners are already fast enough to handle very large numbers of short sequence reads in a reasonable time. However, aligning the long sequence reads (>400 bp) generated by next generation sequencing (NGS) technology is still quite inefficient with present aligners. Furthermore, the challenge becomes more serious as both the lengths and the numbers of sequence reads keep increasing with improvements in sequencing technology. Thus, it is urgent for researchers to enhance the performance of long read alignment. In this paper, we propose a novel FPGA-based system to improve the efficiency of long read mapping. Compared to the state-of-the-art long read aligner BWA-SW, our accelerating platform could achieve a high performance with almost the same sensitivity. Experiments demonstrate that, for reads with lengths ranging from 512 up to 4,096 base pairs, the described system obtains a 10x-48x speedup for the bottleneck of the software. As to the whole mapping procedure, the FPGA-based platform could achieve a 1.8x-3.3x speedup versus the BWA-SW aligner, reducing the alignment cycles from weeks to days.

  11. Student Aptitudes and Methods of Teaching Beginning Reading: A Predictive Instrument for Determining Interaction Patterns. Final Report.

    Science.gov (United States)

    Stallings, Jane A.; Keepes, Bruce D.

    The question of whether reading methods interact differentially with student sequencing abilities was investigated. One hundred and thirty-one children from three schools in Palo Alto, California, were given reading instruction using a linguistic approach (Palo Alto Reading Program), and 115 children from three Palo Alto schools used a whole-word…

  12. Reading Laboratories: The Conversion of the Speed Reading Lab into an ESL Reading Lab.

    Science.gov (United States)

    Novak, Sigrid Scholtz

    It is proposed that the reading-machine laboratory provides a means for the classroom ESL instructor to continue using his present method in the classroom (intensive, theoretical-grammatical instruction) while providing additional extensive reading and learning practice with the machines in the reading laboratory. Two speed reading systems…

  13. What Are Teenagers Reading? Adolescent Fiction Reading Habits and Reading Choices

    Science.gov (United States)

    Hopper, Rosemary

    2005-01-01

    What are adolescents choosing to read? This is an important question because of potential divergence between school students' reading interests and reading expectations in school. This article considers the findings from a study of the reading over one week in May 2002 of 707 school students aged between 11 and 15, undertaken in 30 schools in the…

  14. Normal and compound Poisson approximations for pattern occurrences in NGS reads.

    Science.gov (United States)

    Zhai, Zhiyuan; Reinert, Gesine; Song, Kai; Waterman, Michael S; Luan, Yihui; Sun, Fengzhu

    2012-06-01

    Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped to the genomes even if the genome sequences are known, alternative analytical methods are needed for the study of NGS data. Here we suggest using word patterns to analyze NGS data. Word pattern counting (the study of the probabilistic distribution of the number of occurrences of word patterns in one or multiple long sequences) has played an important role in molecular sequence analysis. However, no studies are available on the distribution of the number of occurrences of word patterns in NGS reads. In this article, we build probabilistic models for the background sequence and the sampling process of the sequence reads from the genome. Based on the models, we provide normal and compound Poisson approximations for the number of occurrences of word patterns from the sequence reads, with bounds on the approximation error. The main challenge is to consider the randomness in generating the long background sequence, as well as in the sampling of the reads using NGS. We show the accuracy of these approximations under a variety of conditions for different patterns with various characteristics. Under realistic assumptions, the compound Poisson approximation seems to outperform the normal approximation in most situations. These approximate distributions can be used to evaluate the statistical significance of the occurrence of patterns from NGS data. The theory and the computational algorithm for calculating the approximate distributions are then used to analyze ChIP-Seq data using transcription factor GABP. Software is available online (www-rcf.usc.edu/
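
    The quantity these approximations describe, the number of occurrences of a word pattern across a set of reads, is simple to compute directly. The Python sketch below counts overlapping occurrences on the forward strand only. The function names are hypothetical, and this is not the authors' software, which computes the approximate distributions rather than the raw counts.

```python
def count_word(read, word):
    """Count overlapping occurrences of `word` in a single read."""
    k = len(word)
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] == word)


def count_word_in_reads(reads, word):
    """Total number of occurrences of `word` across all reads."""
    return sum(count_word(read, word) for read in reads)
```

    Overlapping occurrences are counted on purpose: self-overlapping words such as AA occur in clumps, which is the kind of dependence a compound Poisson model is designed to capture.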

  15. DNA Sequencing Sensors: An Overview

    Directory of Open Access Journals (Sweden)

    Jose Antonio Garrido-Cardenas

    2017-03-01

    Full Text Available The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small sized genome of a bacteriophage, but since then there have been many complex organisms whose DNA has been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS), meant a quantitative leap both in the volume of genetic material that was able to be sequenced in each trial, as well as in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview on sensors for DNA sequencing developed within the last 40 years.

  16. The psychophysiology of reading.

    Science.gov (United States)

    Chiarenza, Giuseppe A; Di Pietro, Sara F; Casarotto, Silvia

    2014-11-01

    Early identification of dyslexia would be fundamental to prevent the negative consequences of delayed treatment in the social, psychological and occupational domains. Movement-related potentials of dyslexic children are characterized by inadequate ability to program movements and reduced capacity to evaluate their performance and to correct their errors. Reading-related potentials recorded during different reading conditions elicit a series of positive and negative components with specific functional meaning and with a characteristic spatial-temporal pattern. These reading-related potentials, when analyzed with sLORETA, show significantly different patterns of activation when comparing self-paced reading aloud to passive viewing of single letters. Comparison of fMRI and sLORETA during both tasks showed that the cortical region with the widest inter-modality similarities is the middle-superior temporal lobe during self-paced reading aloud. Neuropsychological studies have shown the existence of clinical subtypes of dyslexia; these studies have been confirmed by the results of ICA applied to the EEG. Dyslexia can be defined as a disorder of programming and integrating ideokinetic elements, associated with a deficiency in the fast processing and integration of sensory information, with reduced efficiency of error systems analysis. Each of these phenomena occurs at different levels of the central nervous system and at different times.

  17. Reading through Films

    Directory of Open Access Journals (Sweden)

    Madhavi Gayathri Raman

    2016-02-01

    Full Text Available This paper captures the design of a comprehensive curriculum incorporating the four skills based exclusively on the use of parallel audio-visual and written texts. We discuss the use of authentic materials to teach English to Indian undergraduates aged 18 to 20 years. Specifically, we talk about the use of parallel reading (screenplay) and audio-visual texts (Shawshank Redemption, Life is Beautiful, A Few Good Men and Lion King) drawn from popular culture in the classroom as an effective teaching medium. Students were gradually introduced to films based on novels with extracts from the original texts (Schindler’s List, Beautiful Mind) for extended reading and writing practice. We found that students began to pay more attention to aspects such as pronunciation, intonational variations, discourse markers and vocabulary items (phrasal verbs, synonyms, homophones, and puns). Keywords: Reading, films, popular culture, ESL classroom, language skills

  18. [Reading research articles].

    Science.gov (United States)

    van der Graaf, Yolanda; Zaat, Joost

    2015-01-01

    Keeping up with the latest developments is not easy, but neither is reading articles on research. There are too many medical journals that contain information that is irrelevant to clinical practice. From this mass of articles you have to decide which are important for your own clinical practice and which are not. Most articles naturally fall into the latter category as spectacular findings with important consequences for medical practice do not occur every week. The most important thing in a research article is the research question. If you begin with this, then you can put aside much scientific literature. The methodology section is essential; reading this can save you a lot of time. In this article we take you step-by-step through the process of reading research articles. The articles in our Methodology series can be used as background information. These articles have been combined in a tablet app, which is available via www.ntvg.nl/methodologie.

  19. Saccades and fixations in children with delayed reading skills.

    Science.gov (United States)

    Vinuela-Navarro, Valldeflors; Erichsen, Jonathan T; Williams, Cathy; Woodhouse, J Margaret

    2017-07-01

    Previous studies have reported that eye movements differ between good/average and poor readers. However, these studies have been limited to investigating eye movements during reading related tasks, and thus, the differences found could arise from deficits in higher cognitive processes involved in reading rather than oculomotor performance. The purpose of the study is to determine the extent to which eye movements in children with delayed reading skills are different to those obtained from children with good/average reading skills in non-reading related tasks. After a screening optometric assessment, eye movement recordings were obtained from 120 children without delayed reading skills and 43 children with delayed reading skills (4 to 11 years) using a Tobii TX300 eye tracker. Cartoon characters were presented horizontally from -20° to +20° in steps of 5° to study saccades. An animated stimulus in the centre of the screen was presented for 8 seconds to study fixation stability. Saccadic main sequences, and the number and amplitude of the saccades during fixation were obtained for each participant. Children with delayed reading skills (n = 43) were unmasked after data collection was completed. Medians and quartiles were calculated for each eye movement parameter for children without (n = 120) and with (n = 43) delayed reading skills. Independent t-tests with Bonferroni correction showed no significant differences in any of the saccadic main sequence parameters (Slope, Intercept, A, n and Q ratio) between children without and with delayed reading (p > 0.01). Similarly, no significant differences were found in the number of saccades and their amplitude during the fixation task between the two groups (p > 0.05). Further, none of the gross optometric parameters assessed (visual acuity, refractive error, ocular alignment, convergence, stereopsis and accommodation accuracy) were found to be associated with delayed reading skills (p > 0.05). Eye movements in

  20. Effects of Short Read Quality and Quantity on a de novo Vertebrate Transcriptome Assembly

    Science.gov (United States)

    Garcia, T.I.; Shen, Y.; Catchen, J.; Amores, A.; Schartl, M.; Postlethwait, J.; Walter, R. B.

    2011-01-01

    For many researchers, next generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set. PMID:21651990

  1. Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly.

    Science.gov (United States)

    Garcia, T I; Shen, Y; Catchen, J; Amores, A; Schartl, M; Postlethwait, J; Walter, R B

    2012-01-01

    For many researchers, next generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set.

  2. READING STATISTICS AND RESEARCH

    Directory of Open Access Journals (Sweden)

    Reviewed by Yavuz Akbulut

    2008-10-01

    Full Text Available The book demonstrates the best and most conservative ways to decipher and critique research reports, particularly for social science researchers. In addition, new editions of the book are always better organized, effectively structured and meticulously updated in line with the developments in the field of research statistics. Even the most trivial issues are revisited and updated in new editions. For instance, purchasers of the previous editions might check the interpretation of skewness and kurtosis indices in the third edition (p. 34) and in the fifth edition (p. 29) to see how the author revisits every single detail. Theory and practice always go hand in hand in all editions of the book. Re-reading previous editions (e.g. the third edition) before reading the fifth edition gives the impression that the author never stops improving his instructional text writing methods. In brief, “Reading Statistics and Research” is among the best sources showing research consumers how to understand and critically assess the statistical information and research results contained in technical research reports. In this respect, the review written by Mirko Savić in Panoeconomicus (2008, 2, pp. 249-252) will help readers to get a more detailed overview of each chapter. I cordially urge beginning researchers to pick up a highlighter and read the book in detail. A thorough reading of the source will make researchers quite selective in appreciating the harmony between the data analysis, results and discussion sections of typical journal articles. If interested, beginning researchers might begin with this book to grasp the basics of research statistics, and prop up their critical research reading skills with some statistics package applications through the help of Dr. Andy Field’s book, Discovering Statistics using SPSS (second edition, published by Sage in 2005).

  3. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background: Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding, and whether barcoding of BACs is dispensable. Results: Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion: Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  4. Going beyond intensive reading

    Institute of Scientific and Technical Information of China (English)

    George Murdoch

    2010-01-01

    For ELT teachers worldwide, the most convenient way to teach reading is to use the materials provided in the course books selected for the classes they teach. However, these published texts and materials rarely satisfy the motivational and reading development needs of our students. One common problem is that texts in global textbooks often lack relevance to local contexts. They may also seem rather bland because of the writers' concern to make them culturally acceptable in a wide range of countries.

  5. RAMBO-K: Rapid and Sensitive Removal of Background Sequences from Next Generation Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Simon H Tausch

    Full Text Available The assembly of viral or endosymbiont genomes from Next Generation Sequencing (NGS) data is often hampered by the predominant abundance of reads originating from the host organism. These reads increase the memory and CPU time usage of the assembler and can lead to misassemblies. We developed RAMBO-K (Read Assignment Method Based On K-mers), a tool which allows rapid and sensitive removal of unwanted host sequences from NGS datasets. Reaching a speed of 10 Megabases/s on 4 CPU cores and a standard hard drive, RAMBO-K is faster than any tool we tested, while showing a consistently high sensitivity and specificity across different datasets. RAMBO-K rapidly and reliably separates reads from different species without data preprocessing. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets. Binaries and source code (Java and Python) are available from http://sourceforge.net/projects/rambok/.

  6. SchemaOnRead Manual

    Energy Technology Data Exchange (ETDEWEB)

    North, Michael J. [Argonne National Lab. (ANL), Argonne, IL (United States)

    2015-09-30

    SchemaOnRead provides tools for implementing schema-on-read including a single function call (e.g., schemaOnRead("filename")) that reads text (TXT), comma separated value (CSV), raster image (BMP, PNG, GIF, TIFF, and JPG), R data (RDS), HDF5, NetCDF, spreadsheet (XLS, XLSX, ODS, and DIF), Weka Attribute-Relation File Format (ARFF), Epi Info (REC), Pajek network (PAJ), R network (NET), Hypertext Markup Language (HTML), SPSS (SAV), Systat (SYS), and Stata (DTA) files. It also recursively reads folders (e.g., schemaOnRead("folder")), returning a nested list of the contained elements.

  7. A Read-Aloud for Science (Read It Aloud).

    Science.gov (United States)

    Richardson, Judy S.; Breen, Margaret

    1996-01-01

    Recommends a young adult read-aloud selection for science classes on endangered species. Describes listening, writing, discussing, investigating, and debating activities capitalizing on this read-aloud. (SR)

  8. A Special Chinese Reading Acceleration Training Paradigm: To Enhance the Reading Fluency and Comprehension of Chinese Children with Reading Disabilities

    Directory of Open Access Journals (Sweden)

    Li Dai

    2016-12-01

    Full Text Available According to a number of studies, use of a Reading Acceleration Program as reading intervention training has been demonstrated to improve reading speed and comprehension level effectively in most languages and countries. The objective of the current study was to provide further evidence of the effectiveness of a Reading Acceleration Program for Chinese children with reading disabilities using a distinctive Chinese reading acceleration training paradigm. The reading acceleration training paradigm is divided into a non-accelerated reading paradigm, a Character-accelerated reading paradigm and a Words-accelerated reading paradigm. The results of training Chinese children with reading disabilities indicate that the acceleration reading paradigm applies to children with Chinese-reading disabilities. In addition, compared with other reading acceleration paradigms, Words-accelerated reading training is more effective in helping children with reading disabilities read at a high speed while maintaining superior comprehension levels.

  9. Developmental, Component-Based Model of Reading Fluency: An Investigation of Predictors of Word-Reading Fluency, Text-Reading Fluency, and Reading Comprehension

    OpenAIRE

    Kim, Young-Suk Grace

    2015-01-01

    The primary goal was to expand our understanding of text reading fluency (efficiency or automaticity)—how its relation to other constructs (e.g., word reading fluency and reading comprehension) changes over time and how it is different from word reading fluency and reading comprehension. We examined (1) developmentally changing relations among word reading fluency, listening comprehension, text reading fluency, and reading comprehension; (2) the relation of reading comprehension to text readi...

  10. Sequencing technologies for animal cell culture research.

    Science.gov (United States)

    Kremkow, Benjamin G; Lee, Kelvin H

    2015-01-01

    Over the last 10 years, 2nd and 3rd generation sequencing technologies have made the use of genomic sequencing within the animal cell culture community increasingly commonplace. Each technology's defining characteristics are unique, including the cost, time, sequence read length, daily throughput, and occurrence of sequence errors. Given each sequencing technology's intrinsic advantages and disadvantages, the optimal technology for a given experiment depends on the particular experiment's objective. This review discusses the current characteristics of six next-generation sequencing technologies, compares the differences between them, and characterizes their relevance to the animal cell culture community. These technologies are continually improving, as evidenced by the recent achievement of the field's benchmark goal: sequencing a human genome for less than $1,000.

  11. Single-primer fluorescent sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Ruth, J.L.; Morgan, C.A.; Middendorf, L.R.; Grone, D.L.; Brumbaugh, J.A.

    1987-05-01

    Modified linker arm oligonucleotides complementary to standard M13 priming sites were synthesized, labelled with either one, two, or three fluoresceins, and purified by reverse-phase HPLC. When used as primers in standard dideoxy M13 sequencing with ³²P-dNTPs, normal autoradiographic patterns were obtained. To eliminate the radioactivity, direct on-line fluorescence detection was achieved by the use of a scanning 10 mW argon laser emitting 488 nm light. Fluorescent bands were detected directly in standard 0.2 or 0.35 mm thick polyacrylamide gels at a distance of 24 cm from the loading wells by a photomultiplier tube filtered at 520 nm. Horizontal and temporal location of each band was displayed by computer as a band in real time, providing visual appearance similar to normal 4-lane autoradiograms. Using a single primer labelled with two fluoresceins, sequences of between 500 and 600 bases have been read in a single loading with better than 98% accuracy; up to 400 bases can be read reproducibly with no errors. More than 50 sequences have been determined by this method. This approach requires only 1-2 µg of cloned template, and produces continuous sequence data at about one band per minute.

  12. TEACHERS’ BELIEFS ABOUT READING AND USE OF READING STRATEGIES

    OpenAIRE

    2014-01-01

    The aim of this article is to place the focus on teachers’ beliefs about reading and reading strategies, with the purpose of emphasizing the importance of reading strategies in the reading process. The method of study is an analysis of teachers’ beliefs obtained through questionnaires delivered to 18 English language teachers of elementary, secondary and high level education in the region of Saranda in Albania. The results of the study pointed to a great concordance between teachers’ bel...

  13. Reading the Right Way.

    Science.gov (United States)

    Honig, Bill

    1997-01-01

    Extensive research and practical experience demonstrate that learning to read comes less naturally than learning to speak. Although half of all children intuit the alphabetic system from exposure to print and context-driven activities, many (particularly dyslexic, low-socioeconomic, and second-language kids) need an organized program that teaches…

  14. Rules or Reading?

    Science.gov (United States)

    Ruefle, Anne E.

    2011-01-01

    Research shows that an important step in the development of readers is having access to books. Stephen Krashen, in "The Power of Reading," cites multiple studies that demonstrate access to books is crucial in developing strong readers: "the richer the print environment, the better the literacy development." Limiting access runs counter to research…

  15. Personal Achievement Reading: Business.

    Science.gov (United States)

    Swinton, Janet R.

    Exercises are provided in this set of four workbooks designed to aid students in business programs in building vocabulary and reading skills. Each workbook borrows from business terminology to provide explanations and exercises for a sequential series of instructional objectives. One workbook concentrates on developing the ability to determine…

  16. Teaching reading comprehension strategies

    Directory of Open Access Journals (Sweden)

    Majlinda Lika

    2017-03-01

    Full Text Available The academic debate nowadays is focused on producing an applied science of learning, aiming to teach students how to learn and be strategic in their acquisition. The aim of the study is to identify and discuss the reading comprehension instruction approach applied in the Albanian system of education. Ten classes of Albanian language and literature with third-grade students were directly observed and analyzed in order to gather evidence based on indicators and instruments that assess the approach to reading comprehension. Findings were categorized according to strategy use; the frequency of their application in different classes was counted and represented in percentages. In this paper we will try to respond to questions like: What are students' main barriers to comprehension? Does the instructional approach respond to students’ needs and level of comprehension? Are teachers prepared to teach comprehension strategies? Furthermore, examples of procedures on how to deliver instruction of comprehension strategies in natural contexts will be presented. Results from teacher practices during lessons of reading comprehension confirmed that teachers use limited teaching strategies to deliver lessons. They mainly use strategies to test comprehension, while the approach of teaching students to read independently and strategically is an unknown practice.

  17. Reading Angles in Maps

    Science.gov (United States)

    Izard, Véronique; O'Donnell, Evan; Spelke, Elizabeth S.

    2014-01-01

    Preschool children can navigate by simple geometric maps of the environment, but the nature of the geometric relations they use in map reading remains unclear. Here, children were tested specifically on their sensitivity to angle. Forty-eight children (age 47:15-53:30 months) were presented with fragments of geometric maps, in which angle sections…

  18. Readings in Cooperative Education.

    Science.gov (United States)

    Leventhal, Jerome I.

    Twenty-three journal articles on cooperative education were selected in a review of the literature by two Temple University graduate classes in the fall of 1975 and the spring of 1976 for those interested in the role of coordinating cooperative education programs. The journal readings consist of articles on theory/planning (6), implementation…

  19. Speaking of Reading.

    Science.gov (United States)

    Rosenthal, Nadine

    Written in the tradition of Studs Terkel, this book presents oral histories of 77 diverse readers (from avid to infrequent readers) about how reading affects their lives. Sprinkled throughout the book are narratives of nationally recognized personalities, such as Maxine Hong Kingston, Robert MacNeil, Gloria Steinem, Linus Pauling, Julie Harris,…

  20. Scaffolding Reading Comprehension Skills

    Science.gov (United States)

    Salem, Ashraf Atta Mohamed Safein

    2017-01-01

    The current study investigates whether English language teachers use scaffolding strategies for developing their students' reading comprehension skills or just for assessing their comprehension. It also tries to demonstrate whether teachers are aware of these strategies or they use them as a matter of habit. A questionnaire as well as structured…

  1. Painless reading comprehension

    CERN Document Server

    Jones, EdD, Darolyn "Lyn"

    2016-01-01

    Reading comprehension gets easier as students learn what kind of reader they are, discover how to keep facts in their head, and much more. Bonus Online Component: includes additional games, including Beat the Clock, a line match game, and a word scramble.

  2. Books for Summer Reading.

    Science.gov (United States)

    Phi Delta Kappan, 2000

    2000-01-01

    Recommends leisurely reading for teachers: biographies on St. Augustine and Charles Lindbergh; novels by Edwidge Danticat, Kate Chopin, and Velma Allis; Edward Tufte's three volumes on the visual presentation of information; Jean Vanier's "Becoming Human;" the Harry Potter series, and Michael Tolkin's novel "The Player." (MLH)

  3. Developing Critical Reading Practices.

    Science.gov (United States)

    Clark, Romy J.

    1994-01-01

    Discusses the limitations of dominant notions of background knowledge (BGK) as neutral information and objective knowledge and of the reading techniques that accompany these notions. The article argues that the notion of BGK should be broadened to include awareness of the social processes of production and interpretation of text. (26 references)…

  4. Books for Summer Reading.

    Science.gov (United States)

    Phi Delta Kappan, 1995

    1995-01-01

    Recommends many books for summer reading enjoyment, concentrating on historical and contemporary fiction. Different cultures clash in William T. Vollmann's "Fathers and Crows" and John Demos's adventuresome "The Unredeemed Captive." Other suggestions: "Snow Falling on Cedars" (David Guterson) and "Foxfire" (Joyce Carol Oates). For professional…

  5. Literature and Reading.

    Science.gov (United States)

    French, Michael P., Ed.; Elford, Shirley J., Ed.

    1986-01-01

    The articles in this themed issue focus on the use of real literature in reading instruction. Following a comment from the editor, the titles of the articles and their authors are as follows: (1) "Understanding Poetry: Questions to Consider" (Nancy Wiseman Seminoff); (2) "Experiencing a Novel: The Short and Long of It" (David M. Bishop and…

  6. Books for Summer Reading.

    Science.gov (United States)

    Phi Delta Kappan, 1993

    1993-01-01

    Recommends fine fiction for summer reading, including Nadine Gordimer's "My Son's Story" (1991), Lillian Smith's "Strange Fruit" (1944), Josephine Hart's "Damage" (1991), Jane Smiley's "A Thousand Acres" (1991), and George Eliot's "Middlemarch" (1874). Nonfiction suggestions include Harlan Lane's "Mask of Benevolence" (1992), Diane Ackerman's "A…

  7. Time for Reading

    Science.gov (United States)

    Waters, Lindsay

    2007-01-01

    Over the last 50 years, certain ideas have become dominant that make learning to read different than it once was, notably the idea that children are neurologically "wired" to use language "competently" in certain ways. Noam Chomsky has promoted the idea that there are certain "syntactic structures" hard-wired in the human brain. That view, the author…

  9. Teaching Reading through Writing

    Science.gov (United States)

    Takala, Marjatta

    2013-01-01

    This article discusses a teaching method called reading through writing (RtW), based on the use of computers rather than handwriting. The pupils use the computers in pairs and decide themselves what they will write about. The use of this method is studied via a questionnaire to 22 teachers and via seven Master's and two Bachelor's theses,…

  10. Books for Summer Reading.

    Science.gov (United States)

    [Editors]

    2001-01-01

    Teachers and education professors suggest various nonfiction and fiction books for summer reading enjoyment, from Robert Putnam's "Bowling Alone," C.A. Bowers's "Let Them Eat Data," and Larry McMurtry's "Roads: Driving America's Great Highways" to Kent Haruf's "Plainsong," J.M. Coetzee's "Disgrace," and Michael Cunningham's "The Hours." (MLH)

  11. Impact of the Reading Buddies Program on Reading Level and Attitude Towards Reading

    Directory of Open Access Journals (Sweden)

    Hayley Dolman

    2013-03-01

    Full Text Available Objective – This research examines the Reading Buddies program at the Grande Prairie Public Library, which took place in July and August of 2011 and 2012. The Reading Buddies program pairs lower elementary students with teen volunteers for reading practice over the summer. The aim of the study was to discover how much impact the program would have on participating children’s reading levels and attitudes towards reading. Methods – During the first and last sessions of the Reading Buddies program, the participants completed the Elementary Reading Attitudes Survey (ERAS) and the Graded Word Recognition Lists from the Bader Reading and Language Inventory (6th ed., 2008). Participants were also asked for their grade and sex, and the program coordinator kept track of attendance. Results – There were 37 Reading Buddies participants who completed both the pre- and post-tests for the study. On average, the program had a small positive effect on participants’ reading levels and a small negative effect on their attitudes towards reading. There was a larger range of changes to the ERAS scores than to the reading test scores, but most participants’ scores did not change dramatically on either measure. Conclusions – Although findings are limited by the small size of the data-set, results indicate that many of the Reading Buddies participants maintained their reading level over the summer and had a similar attitude towards reading at the end of the program. On average, reading levels increased slightly and attitudes towards reading were slightly more negative. Many factors could not be taken into account during the study (e.g., the amount of reading done at home). A study with a control group that did not participate in the program could help to assess whether the program helped to combat summer learning loss.

  12. Whole genome complete resequencing of Bacillus subtilis natto by combining long reads with high-quality short reads.

    Directory of Open Access Journals (Sweden)

    Mayumi Kamada

    Full Text Available De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food "natto." The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome.

  13. Emerging applications of read profiles towards the functional annotation of the genome

    DEFF Research Database (Denmark)

    Pundhir, Sachin; Poirazi, Panayiota; Gorodkin, Jan

    2015-01-01

    ... is typically a result of the protocol designed to address specific research questions. The sequencing results in reads, which when mapped to a reference genome often lead to the formation of distinct patterns (read profiles). Interpretation of these read profiles is essential for their analysis in relation to the research question addressed. Several strategies have been employed at varying levels of abstraction ranging from a somewhat ad hoc to a more systematic analysis of read profiles. These include methods which can compare read profiles, e.g., from direct (non-sequence based) alignments to classification of patterns into functional groups. In this review, we highlight the emerging applications of read profiles for the annotation of non-coding RNA and cis-regulatory elements (CREs) such as enhancers and promoters. We also discuss the biological rationale behind their formation.

  14. Verbal Coding Efficiency, Conceptually Guided Reading, and Reading Failure.

    Science.gov (United States)

    Perfetti, Charles A.

    Word recognition and reading failure are examined in this report using an interactive framework of the reading process based on the premise that reading is both a top-down and a bottom-up process, both conceptually guided and graphically based. Experiments are discussed that show that less-skilled readers are affected by anomalous contexts and…

  15. Early Reading Intervention by Means of a Multicomponent Reading Game

    Science.gov (United States)

    van de Ven, M.; de Leeuw, L.; van Weerdenburg, M.; Steenbeek-Planting, E. G.

    2017-01-01

    This study examined the effects of an intervention with a multicomponent reading game on the development of reading skills in 60 Dutch primary school children with special educational needs. The game contains evidence-based reading exercises and is based on principles of applied gaming. Using a multiple baseline approach, we tested children's…

  16. Early reading intervention by means of a multicomponent reading game

    NARCIS (Netherlands)

    Ven, M.A.M. van de; Leeuw, L.C. de; Weerdenburg, M.W.C. van; Steenbeek-Planting, E.G.

    2017-01-01

    This study examined the effects of an intervention with a multicomponent reading game on the development of reading skills in 60 Dutch primary school children with special educational needs. The game contains evidence-based reading exercises and is based on principles of applied gaming. Using a

  17. The Importance of Metacognitive Reading Strategy Awareness in Reading Comprehension

    Science.gov (United States)

    Ahmadi, Mohammad Reza; Ismail, Hairul Nizam; Abdullah, Muhammad Kamarul Kabilan

    2013-01-01

    Metacognitive reading strategy awareness plays a significant role in reading comprehension and educational process. In spite of its importance, metacognitive strategy has long been the ignored skill in English language teaching, research, learning, and assessment. This lack of good metacognitive reading strategy skill is exacerbated by the central…

  18. How do children read words? A focus on reading processes

    NARCIS (Netherlands)

    M. van den Boer

    2014-01-01

    Being able to read is very important in our literate society. Many studies, therefore, have examined children’s reading skills to improve our understanding of reading development. In general, there have been two types of studies. On the one hand, there is a line of research that focuses on the devel

  19. The Effects of Oral and Silent Reading on Reading Comprehension

    Science.gov (United States)

    Schimmel, Naomi; Ness, Molly

    2017-01-01

    This study examined the effects of reading mode (oral and silent) and text genre (narrative and expository) on fourth graders' reading comprehension. While controlling for prior reading ability of 48 participants, we measured comprehension. Using a repeated measures design, data were analyzed using analysis of covariance, paired t-tests, and…

  20. Fluency and reading comprehension in students with reading difficulties.

    Science.gov (United States)

    Nascimento, Tânia Augusto; Carvalho, Carolina Alves Ferreira de; Kida, Adriana de Souza Batista; Avila, Clara Regina Brandão de

    2011-12-01

    To characterize the performance of students with reading difficulties in decoding and reading comprehension tasks as well as to investigate the possible correlations between them. Sixty students (29 girls) from 3rd to 5th grades of public Elementary Schools were evaluated. Thirty students (Research Group - RG), ten from each grade, were nominated by their teachers as presenting evidence of learning disabilities. The other thirty students were indicated as good readers, and were matched by gender, age and grade to the RG, composing the Comparison Group (CG). All subjects were assessed regarding the parameters of reading fluency (rate and accuracy in words, pseudowords and text reading) and reading comprehension (reading level, number and type of ideas identified, and correct responses on multiple choice questions). The RG presented significantly lower scores than the CG in fluency and reading comprehension. Different patterns of positive and negative correlations, from weak to excellent, among the decoding and comprehension parameters were found in both groups. In the RG, low values of reading rate and accuracy were observed, which were correlated to low scores in comprehension and improvement in decoding, but not in comprehension, with grade increase. In the CG, correlation was found between different fluency parameters, but none of them was correlated to the reading comprehension variables. Students with reading and writing difficulties show lower values of reading fluency and comprehension than good readers. Fluency and comprehension are correlated in the group with difficulties, showing that deficits in decoding influence reading comprehension, which does not improve with age increase.

  1. Clarifying Differences between Reading Skills and Reading Strategies

    Science.gov (United States)

    Afflerbach, Peter; Pearson, P. David; Paris, Scott G.

    2008-01-01

    The terms "reading skill" and "reading strategy" are central to how we conceptualize and teach reading. Despite their importance and widespread use, the terms are not consistently used or understood. This article examines the current and historical uses of the terms, defines them, and describes their differences, similarities, and relationships.…

  2. Reading Proficiency and a Psycholinguistic Approach to Second Language Reading.

    Science.gov (United States)

    Woytak, Lidia

    1984-01-01

    Presents psycholinguistic views of second language reading which see reading comprehension as a result of an interaction among three factors: higher-level conceptual abilities, background knowledge, and process strategies. Discusses kinds of reading to teach and kinds of texts and materials to select for different proficiency levels and given…

  3. Does Early Reading Failure Decrease Children's Reading Motivation?

    Science.gov (United States)

    Morgan, Paul L.; Fuchs, Douglas; Compton, Donald L.; Cordray, David S.; Fuchs, Lynn S.

    2008-01-01

    The authors used a pretest-posttest control group design with random assignment to evaluate whether early reading failure decreases children's motivation to practice reading. First, they investigated whether 60 first-grade children would report substantially different levels of interest in reading as a function of their relative success or failure…

  4. College-Adult Reading Instruction. Perspectives in Reading, No. 1.

    Science.gov (United States)

    Leedy, Paul D., Ed.

    Papers dealing with topics relating to college and adult reading instruction and discussions of these papers by reading authorities who offer differing viewpoints are presented. Subjects treated include humanistic aspects of reading; materials and methods in use; current and future programs; programs operated by business and industry; illiteracy;…

  5. Read-Alouds: Do They Enhance Students' Ability To Read?

    Science.gov (United States)

    Terblanche, Leezill

    Teachers can greatly extend a child's literacy development through the use of interactive read-alouds. When a story is read aloud to children a number of opportunities arise for extended activities that are related to the story and further literacy support. Children are able to learn about literacy through an adult modeling good reading behavior.…

  6. Pleasure Reading Cures Readicide and Facilitates Academic Reading

    Science.gov (United States)

    Jennifer, J. Mary; Ponniah, R. Joseph

    2015-01-01

    Pleasure reading is an absolute choice to eradicate readicide, a systematic killing of the love for reading. This paper encompasses the different forms and consequences of readicide which will have negative impact not only on comprehension but also on the prior knowledge of a reader. Reading to score well on tests impedes the desire for reading…

  7. Reading by Design: Two Case Studies of Digital Reading Practices

    Science.gov (United States)

    Rowsell, Jennifer; Burke, Anne

    2009-01-01

    The digital reading practices of two middle school students in US and Canadian contexts are examined. Using a multimodal discourse framework, the authors contemplate what digital reading practice is and distinctive practices of reading texts online compared with printed, school-based literacy practices. By focusing on two different genres of…

  8. Predicting FCAT Reading Scores Using the Reading-Level Indicator

    Science.gov (United States)

    Stanley, Nile; Stanley, Laurel

    2011-01-01

    Multiple regression analysis indicates that the Reading-Level Indicator, a paper-and-pencil test, is a moderately strong predictor for the high-stakes standardized test, the Florida Comprehensive Achievement Test in Reading. Classroom teachers can administer the inexpensive Reading-Level Indicator in a short period of time and use the results as a…

  9. Reading for Real: Our Year with Reading Buddies

    Science.gov (United States)

    Ross, Patricia

    2014-01-01

    When Patricia Ross' high school students at the Phoenix Day School for the Deaf buddied up with elementary school students to improve their reading skills, amazing things happened. As they read to them, Ross' students, who were part of the 2012-2013 Integrated Language Arts and Social Studies program, increased their reading scores and forged…

  10. How do children read words? A focus on reading processes

    NARCIS (Netherlands)

    van den Boer, M.

    2014-01-01

    Being able to read is very important in our literate society. Many studies, therefore, have examined children’s reading skills to improve our understanding of reading development. In general, there have been two types of studies. On the one hand, there is a line of research that focuses on the

  11. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    using the probabilistic logic programming language and machine learning system PRISM - a fast and efficient model prototyping environment, using bacterial gene finding performance as a benchmark of signal strength. The model is used to prune a set of gene predictions from an underlying gene finder...

  12. Problems Brought about by "Reading" a Sequence of Pictures.

    Science.gov (United States)

    Bornens, Marie-Therese

    1990-01-01

    A study investigated 4 problems of children between 3 and 7.5 years of age: difficulty in seeing the same character in different representations; the process of linking several pictures into 1 story; the correlation between the temporal order and spatial disposition of pictures; and the tendency to consider the setting of pictures as a puzzle to…

  13. Cultural Schema and Reading Comprehension

    Institute of Scientific and Technical Information of China (English)

    TanFuhong

    2004-01-01

    This paper is mainly focused on the examination of the role of cultural schema in reading comprehension, in particular, how the cultural schema helps or impedes reading comprehension; most important of all, the implications for teaching reading in China.

  14. General ideas on English reading

    Institute of Scientific and Technical Information of China (English)

    祝文瑛

    2015-01-01

    This paper discusses some general ideas on English reading to help teachers and learners foster reading skills by understanding its nature, with the aim of effective and productive English teaching and learning.

  15. Selected Readings in Genetic Engineering

    Science.gov (United States)

    Mertens, Thomas R.; Robinson, Sandra K.

    1973-01-01

    Describes different sources of readings for understanding issues and concepts of genetic engineering. Broad categories of reading materials are: concerns about genetic engineering; its background; procedures; and social, ethical and legal issues. References are listed. (PS)

  16. Reading: Tired of Round Robin?

    Science.gov (United States)

    Indrisano, Roselmina

    1975-01-01

    "Oral reading is a process abundant in potential for teacher and learner" says the author, in response to a reader's question. Here she sets down some strategies to help you with a more effective reading program. (Editor/RK)

  17. Main: Sequences [KOME]

    Lifescience Database Archive (English)

    Full Text Available Sequences Nucleotide Sequence Nucleotide sequence of full length cDNA (trimmed sequence) kome_ine_full_seq...uence_db.fasta.zip kome_ine_full_sequence_db.zip kome_ine_full_sequence_db ...

  18. Objective and Comprehensive Evaluation of Bisulfite Short Read Mapping Tools

    Directory of Open Access Journals (Sweden)

    Hong Tran

    2014-01-01

    Full Text Available Background. Large-scale bisulfite treatment and short reads sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene and environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite treated short reads. Data quality greatly affects mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of increasing false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data.
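
    The abstract defines mapping efficiency as the percentage of reads that can be mapped to the genome. A minimal sketch of that calculation follows; the function name and the read counts are hypothetical illustrations, not taken from the study or from any of the five tools.

```python
def mapping_efficiency(mapped_reads: int, total_reads: int) -> float:
    """Return the percentage of reads that were mapped to the genome.

    Both arguments are plain counts supplied by the caller; this helper
    is an illustrative assumption, not part of any mapping tool.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


# Hypothetical counts: 750 of 1000 reads mapped.
print(mapping_efficiency(750, 1000))
```

    Comparing this value across tools on the same read set is one way to reproduce the kind of ranking the abstract reports.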

  19. Reading Speed as a Constraint of Accuracy of Self-Perception of Reading Skill

    Science.gov (United States)

    Kwon, Heekyung; Linderholm, Tracy

    2015-01-01

    We hypothesised that college students take reading speed into consideration when evaluating their own reading skill, even if reading speed does not reliably predict actual reading skill. To test this hypothesis, we measured self-perception of reading skill, self-perception of reading speed, actual reading skill and actual reading speed to…

  1. So Much to Read, So Little Time: How Do We Read, and Can Speed Reading Help?

    Science.gov (United States)

    Rayner, Keith; Schotter, Elizabeth R; Masson, Michael E J; Potter, Mary C; Treiman, Rebecca

    2016-05-01

    The prospect of speed reading--reading at an increased speed without any loss of comprehension--has undeniable appeal. Speed reading has been an intriguing concept for decades, at least since Evelyn Wood introduced her Reading Dynamics training program in 1959. It has recently increased in popularity, with speed-reading apps and technologies being introduced for smartphones and digital devices. The current article reviews what the scientific community knows about the reading process--a great deal--and discusses the implications of the research findings for potential students of speed-reading training programs or purchasers of speed-reading apps. The research shows that there is a trade-off between speed and accuracy. It is unlikely that readers will be able to double or triple their reading speeds (e.g., from around 250 to 500-750 words per minute) while still being able to understand the text as well as if they read at normal speed. If a thorough understanding of the text is not the reader's goal, then speed reading or skimming the text will allow the reader to get through it faster with moderate comprehension. The way to maintain high comprehension and get through text faster is to practice reading and to become a more skilled language user (e.g., through increased vocabulary). This is because language skill is at the heart of reading speed.

  2. The Explicit Instruction of Reading Strategies: Directed Reading Thinking Activity vs. Guided Reading Strategies

    Directory of Open Access Journals (Sweden)

    Mohammad Mehdi Yazdani

    2015-05-01

    Full Text Available Investigating the efficiencies and deficiencies of reading strategies is one of the noticeable issues in the related theory and research in reading comprehension instruction. This study examined the impact of Directed Reading Thinking Activity (DRTA) and Guided Reading (GR) on reading comprehension. Sixty-three Iranian grade-one students at Shahed high school in the city of Bojnourd took part in the study. They were assigned to three groups, one control and two experimental. The instruction lasted for ten weeks. The study used a quantitative quasi-experimental pretest-posttest control group design. The same reading comprehension test was administered as pre-test and post-test. The results were twofold: First, the instruction of learning strategies could foster reading comprehension skill. Second, while the explicit instruction of both strategies could improve the students' reading comprehension skill, Directed Reading Thinking Activity had a more significant positive effect than Guided Reading. Keywords: reading strategy, explicit, directed reading thinking activity (DRTA), guided reading (GR)

  3. Improving Students' English Reading Skills

    Institute of Scientific and Technical Information of China (English)

    段士爱

    2009-01-01

    This paper aims to help students who study English as a second language improve their reading skills. Based on the theoretical foundations of reading, some problems that are often encountered during the process of reading are discussed. Finally, a conclusion is drawn as to how to solve these problems.

  4. Reading Processes and Parenting Styles

    Science.gov (United States)

    Carreteiro, Rui Manuel; Justo, João Manuel; Figueira, Ana Paula

    2016-01-01

    Home literacy environment explains between 12 and 18.5% of the variance of children's language skills. Although most authors agree that children whose parents encourage them to read tend to develop better and earlier reading skills, some authors consider that the impact of family environment in reading skills is overvalued. Probably, other…

  6. Investigating Gender Differences in Reading

    Science.gov (United States)

    Logan, Sarah; Johnston, Rhona

    2010-01-01

    Girls consistently outperform boys on tests of reading comprehension, although the reason for this is not clear. In this review, differences between boys and girls in areas relating to reading will be investigated as possible explanations for consistent gender differences in reading attainment. The review will examine gender differences within the…

  7. Reading comprehension in Parkinson's disease.

    Science.gov (United States)

    Murray, Laura L; Rutledge, Stefanie

    2014-05-01

    Although individuals with Parkinson's disease (PD) self-report reading problems and experience difficulties in cognitive-linguistic functions that support discourse-level reading, prior research has primarily focused on sentence-level processing and auditory comprehension. Accordingly, the authors investigated the presence and nature of reading comprehension in PD, hypothesizing that (a) individuals with PD would display impaired accuracy and/or speed on reading comprehension tests and (b) reading performances would be correlated with cognitive test results. Eleven adults with PD and 9 age- and education-matched control participants completed tests that evaluated reading comprehension; general language and cognitive abilities; and aspects of attention, memory, and executive functioning. The PD group obtained significantly lower scores on several, but not all, reading comprehension, language, and cognitive measures. Memory, language, and disease severity were significantly correlated with reading comprehension for the PD group. Individuals in the early stages of PD without dementia or broad cognitive deficits can display reading comprehension difficulties, particularly for high- versus basic-level reading tasks. These reading difficulties are most closely related to memory, high-level language, and PD symptom severity status. The findings warrant additional research to delineate further the types and nature of reading comprehension impairments experienced by individuals with PD.

  8. Neurological Implications of Reading Disability.

    Science.gov (United States)

    Richards, Edith G.

    1981-01-01

    A review of studies into the neurological aspects of reading disabilities indicates that two positions have been taken with regard to the brain and reading: (1) language skills are generally considered to be the function of the left hemisphere of the brain; and (2) very poor reading may be related to bilateral spatial processing for both boys and…

  9. Content Schemata and Reading Comprehension

    Institute of Scientific and Technical Information of China (English)

    LiKe

    2004-01-01

    For students of non-English majors in China, reading ability has been considered as one of the most important skills that they should acquire. However, teachers of English often complain that students reading in English seem to read with less comprehension and slower speed than expected. It is true that their failure is due to inadequate knowledge of vocabulary and

  10. Teaching Reading: Research into Practice

    Science.gov (United States)

    Macalister, John

    2014-01-01

    In pre-service and in-service language teacher education, and in curriculum-related projects in second and foreign language settings, a recurrent issue is the failure to relate the teaching of reading to reading as a meaning-making activity. In this paper, I will consider what current research on second language (L2) reading has actually succeeded…

  11. Awareness Development for Online Reading

    Science.gov (United States)

    Zenotz, Victoria

    2012-01-01

    In a world in which online reading is becoming increasingly common and, as a consequence, online literacy more and more necessary, this paper focuses on the possibility of training L2 (second language) readers to bridge the gap between paper reading and online reading. Many researchers believe metacognitive awareness to be the most important ability…

  12. The Unexplained Nature of Reading

    Science.gov (United States)

    Adelman, James S.; Marquis, Suzanne J.; Sabatos-DeVito, Maura G.; Estes, Zachary

    2013-01-01

    The effects of properties of words on their reading aloud response times (RTs) are 1 major source of evidence about the reading process. The precision with which such RTs could potentially be predicted by word properties is critical to evaluate our understanding of reading but is often underestimated due to contamination from individual…

  13. Improve Reading with Complex Texts

    Science.gov (United States)

    Fisher, Douglas; Frey, Nancy

    2015-01-01

    The Common Core State Standards have cast a renewed light on reading instruction, presenting teachers with the new requirements to teach close reading of complex texts. Teachers and administrators should consider a number of essential features of close reading: They are short, complex texts; rich discussions based on worthy questions; revisiting…

  14. Neurological Aspects of Reading Disability.

    Science.gov (United States)

    Nelson, Louis R.

    The author, a neurologist, looks at the nature of reading disabilities. He suggests that many reading disabilities are the result of normal constitutional differences and that the term "minimal brain dysfunction" is rarely appropriate and does not help the remediation process. Noted are various theories which relate neurology and reading ability.…

  15. The "RAP" on Reading Comprehension

    Science.gov (United States)

    Hagaman, Jessica L.; Luschen, Kati; Reid, Robert

    2010-01-01

    Reading problems are one of the most frequent reasons students are referred for special education services and the disparity between students with reading difficulties and those who read successfully appears to be increasing. As a result, there is now an emphasis on early intervention programs such as RTI. In many cases, early intervention in…

  16. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  17. Probabilistic error correction for RNA sequencing.

    Science.gov (United States)

    Le, Hai-Son; Schulz, Marcel H; McCauley, Brenna M; Hinman, Veronica F; Bar-Joseph, Ziv

    2013-05-01

    Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.

  19. Relations Among Oral Reading Fluency, Silent Reading Fluency, and Reading Comprehension: A Latent Variable Study of First-Grade Readers

    OpenAIRE

    Y. S. Kim; Wagner, Richard K.; Foster, E.

    2011-01-01

    The present study examined oral and silent reading fluency and their relations with reading comprehension. In a series of structural equation models (SEM) with latent variables using data from 316 first-grade students, (1) silent and oral reading fluency were found to be related yet distinct forms of reading fluency; (2) silent reading fluency predicted reading comprehension better for skilled readers than for average readers; (3) list reading fluency predicted reading compr...

  20. Read length versus depth of coverage for viral quasispecies reconstruction.

    Directory of Open Access Journals (Sweden)

    Osvaldo Zagordi

    Full Text Available Recent advancements of sequencing technology have opened up unprecedented opportunities in many application areas. Virus samples can now be sequenced efficiently with very deep coverage to infer the genetic diversity of the underlying virus populations. Several sequencing platforms with different underlying technologies and performance characteristics are available for viral diversity studies. Here, we investigate how the differences between two common platforms provided by 454/Roche and Illumina affect viral diversity estimation and the reconstruction of viral haplotypes. Using a mixture of ten HIV clones sequenced with both platforms and additional simulation experiments, we assessed the trade-off between sequencing coverage, read length, and error rate. For fixed costs, short Illumina reads can be generated at higher coverage and allow for detecting variants at lower frequencies. They can also be sufficient to assess the diversity of the sample if sequences are dissimilar enough, but, in general, assembly of full-length haplotypes is feasible only with the longer 454/Roche reads. The quantitative comparison highlights the advantages and disadvantages of both platforms and provides guidance for the design of viral diversity studies.

  1. Read length versus depth of coverage for viral quasispecies reconstruction.

    Science.gov (United States)

    Zagordi, Osvaldo; Däumer, Martin; Beisel, Christian; Beerenwinkel, Niko

    2012-01-01

    Recent advancements of sequencing technology have opened up unprecedented opportunities in many application areas. Virus samples can now be sequenced efficiently with very deep coverage to infer the genetic diversity of the underlying virus populations. Several sequencing platforms with different underlying technologies and performance characteristics are available for viral diversity studies. Here, we investigate how the differences between two common platforms provided by 454/Roche and Illumina affect viral diversity estimation and the reconstruction of viral haplotypes. Using a mixture of ten HIV clones sequenced with both platforms and additional simulation experiments, we assessed the trade-off between sequencing coverage, read length, and error rate. For fixed costs, short Illumina reads can be generated at higher coverage and allow for detecting variants at lower frequencies. They can also be sufficient to assess the diversity of the sample if sequences are dissimilar enough, but, in general, assembly of full-length haplotypes is feasible only with the longer 454/Roche reads. The quantitative comparison highlights the advantages and disadvantages of both platforms and provides guidance for the design of viral diversity studies.
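
    The trade-off described above between coverage and read length can be illustrated with the standard mean-coverage relation (number of reads times read length, divided by genome length). The sketch below is only an illustration under assumed numbers: the genome length, read counts, and read lengths are hypothetical and do not come from the study.

```python
def mean_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


def expected_variant_reads(coverage: float, variant_frequency: float) -> float:
    """Expected number of reads at a position that carry a variant
    present at the given population frequency (errors ignored)."""
    return coverage * variant_frequency


# Hypothetical runs on an assumed 10,000 bp viral genome.
GENOME_LENGTH = 10_000
short_cov = mean_coverage(1_000_000, 100, GENOME_LENGTH)  # many short reads
long_cov = mean_coverage(50_000, 400, GENOME_LENGTH)      # fewer long reads

for label, cov in (("short reads", short_cov), ("long reads", long_cov)):
    hits = expected_variant_reads(cov, 0.001)  # variant at 0.1% frequency
    print(f"{label}: coverage {cov:.0f}x, about {hits:.0f} reads carry the variant")
```

    With these assumed inputs the short-read run gives deeper coverage and therefore more reads supporting a rare variant, while the longer reads span more of each haplotype; sequencing error, which the abstract also weighs, is left out of the sketch.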

  2. Reading Strategy Guides to Assist Middle School Educators of Students with Dyslexia

    Science.gov (United States)

    Nichols-Yehling, M.; Strohl, C.

    2014-07-01

    According to the 2010 International Dyslexia Association publication, “Knowledge and Practice Standards for Teachers of Reading,” effective instruction is the key to addressing students' reading difficulties associated with dyslexia, a language-based disorder of learning to read and write. “Informed and effective classroom instruction. . . can prevent or at least effectively address and limit the severity of reading and writing problems.” The Interstellar Boundary Explorer (IBEX) mission Education and Public Outreach program recently funded the development of six strategy guides for teachers of middle school students with reading difficulties, especially dyslexia. These guides utilize space science-themed reading materials developed by the Great Exploration in Math and Science (GEMS), including the IBEX-funded GEMS Space Science Sequence (Grades 6-8). The aforementioned reading strategy guides are now available on the IBEX mission website.

  3. An examination of the rapid automatized naming-reading relationship using functional magnetic resonance imaging.

    Science.gov (United States)

    Cummine, J; Chouinard, B; Szepesvari, E; Georgiou, G K

    2015-10-01

    Rapid automatized naming (RAN) has been established to be a strong predictor of reading. Yet, the neural correlates underlying the RAN-reading relationship remain unknown. Thus, the purpose of this study was to determine: (a) the extent to which RAN and reading activate similar brain regions (within subjects), (b) whether RAN and reading are directly related in the shared activity network outlined in (a), and (c) to what extent RAN neural activation predicts behavioral reading performance. Using functional magnetic resonance imaging (fMRI), university students (N=15; Mean age=20.6 years) were assessed on RAN (letters and digits) and single-word reading (words and non-words). The results revealed a common RAN-reading network that included regions associated with motor planning (cerebellum), semantic access (middle temporal gyrus), articulation (supplementary motor area, pre-motor), and grapheme-phoneme translation (supramarginal gyrus). We found differences between RAN and reading with respect to percent signal change (PSC) in phonological and orthographic regions, but not in articulatory regions. Significant correlations between the neural RAN and reading parameters were found primarily in motor/articulatory regions. Further, we found a unique relationship between in-scanner reading response time and RAN PSC in the left inferior frontal gyrus. Taken together, these findings support the notion that RAN and reading activate similar neural networks. However, the relationship between RAN and reading is primarily driven by commonalities in the motor-sequencing/articulatory processes.

  4. Reading, Perceptual Strategies and Contrastive Analysis

    Science.gov (United States)

    Cowan, J. Ronayne

    1976-01-01

    Analyzes syntactic processes in three learning situations, Japanese reading English, Persians reading English and English speakers reading Hindi; discussed in terms of reading process and second language learning models. (Author/RM)

  5. The Relationship of Print Reading in Tier I Instruction and Reading Achievement for Kindergarten Students at Risk of Reading Difficulties

    Science.gov (United States)

    Wanzek, Jeanne; Roberts, Greg; Al Otaiba, Stephanie; Kent, Shawn C.

    2014-01-01

    For many students at risk of reading difficulties, effective, early reading instruction can improve reading outcomes and set them on a positive reading trajectory. Thus, response-to-intervention models include a focus on a student's Tier I reading instruction as one element for preventing reading difficulties and identifying students with a…

  6. Davies, Florence (1995). Introducing Reading.

    Directory of Open Access Journals (Sweden)

    Sonia Maria Gomes Ferreira

    2008-04-01

    Full Text Available Arising at a time of unprecedented growth of interest in fostering critical thinking, Introducing Reading offers a clear introduction and thorough account of contemporary developments in the field of reading. While overtly focusing on the special demands of social and human aspects of the reading practice, the issues raised have crucial resonance in the sphere of critical reading. Explicitly addressed to teachers of mother tongue and foreign language contexts, the book claims to elaborate on aspects of reading which have received meager attention to date: individual readers engaged in different real-world reading tasks, the social contexts where such readers engage and interact with texts, and the nature and variety of texts, here regarded as “participants” in the interaction between reader and writer. To this extent, the book successfully reaches the ambitious aim of “socializing and humanizing reading and the teaching of reading” (p. xi).

  7. Read again or read anew? Children’s reading practices in a public library

    OpenAIRE

    2016-01-01

    This study discusses repeated reading as a recurring reading practice in childhood. Why do children so often wish to read stories already known to them? What is the meaning and the importance of reading again? We present the results of a master’s research project conducted in a public library with children aged 4 to 10 years. Our methodological strategies consisted of participant observations, photographic records, semi-structured interviews and informal conversation with children who regular...

  8. Ideogram reading in alexia.

    Science.gov (United States)

    Yamadori, A

    1975-06-01

    A case of alexia with agraphia in a Japanese patient is presented. Reading difficulty was severe in words composed of phonograms (Kana), while reading of words composed of ideograms (Kanji) was better preserved. Writing was severely impaired in both types of characters. Occlusion of the angular branch of the left middle cerebral artery was demonstrated by carotid arteriography and was considered responsible for the symptoms. Two additional cases of alexia with agraphia from the Japanese literature are reviewed. Their linguistic features were similar to the present case. A hypothesis of a functional disconnexion between visual and auditory-oral systems is proposed to explain why Kana processing was more severely affected than Kanji processing.

  9. Reading and My Life

    Institute of Scientific and Technical Information of China (English)

    徐娟

    2005-01-01

    Reading has accompanied me for many years, from a pupil to an undergraduate. It enriches my life, and the most significant is that it teaches me how to live. Reading instructs me to treat life unperturbedly(泰然自若地). There is such a story. An angler(钓鱼者) went fishing in the early morning and came back at dusk without any fish. He spent the whole day but gained nothing. However, he was blissful(有福的). To the angler, it's none of his business for fish doesn't bite the baits, but what he has angled is happiness. We live calmly and confidently, so we can get delightful experiences.

  10. Do Knowledge Arrangements Affect Student Reading Comprehension of Genetics?

    Science.gov (United States)

    Wu, Jen-Yi; Tung, Yu-Neng; Hwang, Bi-Chi; Lin, Chen-Yung; Che-Di, Lee; Chang, Yung-Ta

    2014-01-01

    Various sequences for teaching genetics have been proposed. Three seventh-grade biology textbooks in Taiwan share similar key knowledge assemblages but have different knowledge arrangements. To investigate the influence of knowledge arrangements on student understanding of genetics, we compared students' reading comprehension of the three texts…

  13. Music Reading Expertise Selectively Improves Categorical Judgment with Musical Notation

    Directory of Open Access Journals (Sweden)

    Yetta Kwailing Wong

    2011-05-01

    Different domains of perceptual expertise often lead to different hemispheric engagement (e.g. Kanwisher et al., 1997). Recent work suggests that the neural substrates engaged in musical reading are shifted from left hemisphere novice processing to bilateral processing in experts (Wong & Gauthier, 2010). To relate this shift to behavior, we tested whether music-reading training improves categorical and coordinate perceptual judgments, which are argued to rely on the left and right hemisphere respectively (Kosslyn et al., 1989). Music-reading experts and novices judged whether two sequentially presented music sequences were identical. The notes were either on a staff (categorical) or without a staff (coordinate) in either trained or untrained (90° rotated) orientations. Experts performed better than novices for categorical judgments, and the advantage was larger for the trained than untrained orientation. The two groups performed similarly for coordinate judgments. Music-reading fluency predicted performance in categorical judgments in the trained orientation in experts, while it predicted performance in all conditions in novices. This suggests that music-reading training selectively improves categorical judgments in the trained orientation, while music-reading ability in novices reflects general perceptual ability with notes. Future studies will clarify how these findings are related to the hemispheric shift in music-reading expertise.

  14. Accurate genome relative abundance estimation based on shotgun metagenomic reads.

    Directory of Open Access Journals (Sweden)

    Li C Xia

    Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or variants thereof, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
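
    The mixture-model idea described in this abstract can be illustrated with a small expectation-maximization loop. The sketch below is not the GRAMMy implementation: the function name, the input layout (one set of candidate genome indices per read) and the uniform per-base read probability 1/length are all assumptions made for illustration. It splits each ambiguous read across its candidate genomes, re-estimates the read fractions, and finally divides by genome length to correct for genome size bias.

```python
from typing import List, Set


def estimate_relative_abundance(
    read_hits: List[Set[int]],
    genome_lengths: List[int],
    max_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Illustrative EM for a read-to-genome mixture model (hypothetical helper).

    read_hits[i] holds the indices of the genomes that read i maps to.
    Returns length-corrected relative abundances that sum to 1.
    """
    reads = [hits for hits in read_hits if hits]  # ignore unmapped reads
    if not reads:
        raise ValueError("no mapped reads")
    n_genomes = len(genome_lengths)
    pi = [1.0 / n_genomes] * n_genomes  # fraction of reads drawn from each genome

    for _ in range(max_iter):
        expected = [0.0] * n_genomes
        for hits in reads:
            # E-step: share the read among candidate genomes in proportion
            # to pi_j times an assumed uniform per-base probability 1/L_j.
            weights = {j: pi[j] / genome_lengths[j] for j in hits}
            total = sum(weights.values())
            for j, w in weights.items():
                expected[j] += w / total
        # M-step: new read fractions are the normalized expected counts.
        new_pi = [e / len(reads) for e in expected]
        change = max(abs(a - b) for a, b in zip(pi, new_pi))
        pi = new_pi
        if change < tol:
            break

    # Longer genomes yield more reads per cell, so divide by length.
    per_base = [p / length for p, length in zip(pi, genome_lengths)]
    norm = sum(per_base)
    return [x / norm for x in per_base]
```

    With 10 reads unique to a 1,000 bp genome and 20 reads unique to a 2,000 bp genome, the read fractions are 1/3 and 2/3, but the length-corrected abundances come out equal, which is the genome size correction the abstract refers to.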

  15. Tired of Reading

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    China has a long tradition of highlighting reading for children, but many of today's youngsters seem not to have developed a love of books. Shoved into Cheng Qidi's room is a bulky bookcase, all three of its shelves full of books sorted by subject. Dozens of books occupy most of the desk beside the bookcase, while on the wall is a poster of U.S. basketball star Allen Iverson.

  16. Reading Heidegger after Derrida

    Directory of Open Access Journals (Sweden)

    Michael J. Strawser

    2013-11-01

    This essay attempts to broach the complex difference between Martin Heidegger and Jacques Derrida. It focuses on the fundamental assumptions involved in the reading of Heidegger's Being and Time and Derrida's early "noted" attention to this text. Is Heidegger's early work essentially tainted by "the metaphysics of presence," as Derrida wishes to suggest? After sketching Derrida's interpretation, the author attempts to show how readers of Being and Time need not succumb to Derrida's criticism.

  17. Reading to Learn or Learning to Read? Engaging College Students in Course Readings

    Science.gov (United States)

    Kerr, Mary Margaret; Frese, Kristen M.

    2017-01-01

    Despite instructors' belief that class readings are integral to the learning process, only 20-30% of undergraduate students complete required readings. Failure to complete course reading has been associated with declines in exam and research performance. This article first offers a brief review of the literature on why students do not complete…

  18. Program Evaluation of the Direct Instruction Reading Interventions: Reading Mastery and Corrective Reading

    Science.gov (United States)

    Jarvis, Nita M.

    2016-01-01

    The purpose of this program evaluation was to evaluate the Direct Instruction programs, Reading Mastery and Corrective Reading, from SRA McGraw-Hill Publishing Company, which were being used as a school-wide reading intervention. These programs were implemented at a small elementary school in the Piedmont area of North Carolina beginning in the…

  19. Word Reading Efficiency, Text Reading Fluency, and Reading Comprehension among Chinese Learners of English

    Science.gov (United States)

    Jiang, Xiangying; Sawaki, Yasuyo; Sabatini, John

    2012-01-01

    This study examined the relationship among word reading efficiency, text reading fluency, and reading comprehension for adult English as a Foreign Language (EFL) learners. Data from 185 adult Chinese EFL learners preparing to take the Test-of-English-as-a-Foreign-Language[TM] (TOEFL[R]) were analyzed in this study. The participants completed a…

  1. Repeated Reading for Developing Reading Fluency and Reading Comprehension: The Case of EFL Learners in Vietnam

    Science.gov (United States)

    Gorsuch, Greta; Taguchi, Etsuo

    2008-01-01

    Reading in a foreign or second language is often a laborious process, often caused by underdeveloped word recognition skills, among other things, of second and foreign language readers. Developing fluency in L2/FL reading has become an important pedagogical issue in L2 settings and one major component of reading fluency is fast and accurate word…

  2. The Effects of Extensive Reading on Reading Comprehension, Reading Rate, and Vocabulary Acquisition

    Science.gov (United States)

    Suk, Namhee

    2017-01-01

    Several empirical studies and syntheses of extensive reading have concluded that extensive reading has positive impacts on language learning in second- and foreign-language settings. However, many of the studies contained methodological or curricular limitations, raising questions about the asserted positive effects of extensive reading. The…

  3. Archaebacterial rhodopsin sequences: Implications for evolution

    Science.gov (United States)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  4. XII Elkonin Readings

    Directory of Open Access Journals (Sweden)

    Bugrimenko E.A.,

    2016-07-01

    This article presents the main content of the reports given at the XII Elkonin Readings (4 March 2016). The Elkonin Readings take place every two years at the Psychological Institute of the Russian Academy of Education. This year they focused on the problem of "Mediation and development". Speakers from different institutions presented their approaches to this problem. The theoretical foundations of a new understanding of the relationship between functional genesis and ontogenesis, of the structure of mediating activity, and of zones of proximal development were set out in the articles of V.V. Rubtsov, D.B. Elkonin, P.G. Nezhnov. New conditions for mediation were created on the basis of different materials (games, reading, and a spatial image of the "self") in the experimental practice of teaching and of correcting self-development. The development of creative ways of mediation that address current problems of modern children was discussed in the articles of L.I. Elkoninov, E.O. Smirnov, E.A. Abdulaeva, E.A. Bugrimenko, N.U. Mangutov.

  5. Sequence characteristics of T4-like bacteriophage IME08 genome termini revealed by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    An Xiaoping

    2011-04-01

    BACKGROUND: T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, and are generated by a DNA terminase in a sequence-independent manner. METHODS: Genomic DNA of T4-like bacteriophage IME08 was subjected to high throughput sequencing, and the read sequences with extraordinarily high occurrences were analyzed. RESULTS: We demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA|G around the breakpoint of the high frequency read sequences suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence, and the other end generated randomly. The sequence-preferred cleavage may produce sticky-ends, but with each end being packaged with different efficiencies. CONCLUSIONS: This study illustrates how high throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.
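
    The step of picking out read sequences with extraordinarily high occurrences can be sketched as a simple frequency count. This is only an illustration under stated assumptions, not the authors' pipeline: the function name and the `fold` threshold (a read is flagged when its count is at least `fold` times the mean count per distinct read) are hypothetical choices.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def high_frequency_reads(reads: Iterable[str], fold: float = 10.0) -> List[Tuple[str, int]]:
    """Return (sequence, count) pairs whose count is >= fold * mean count.

    Hypothetical helper: the threshold rule is an assumption for illustration.
    """
    counts = Counter(reads)
    if not counts:
        return []
    mean_count = sum(counts.values()) / len(counts)
    # most_common() yields pairs sorted from most to least frequent.
    return [(seq, n) for seq, n in counts.most_common() if n >= fold * mean_count]
```

    Reads that pile up at a fixed genome end appear many times with the same sequence, whereas reads from randomly sheared fragments are mostly seen once, so the flagged sequences are candidates for termini.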

  6. On the Practice Teaching of English Reading

    Science.gov (United States)

    Gao, Yonghong

    2009-01-01

    The main task of practice teaching of English Reading is to train students' independent reading ability and good reading habits. Extra-curricular reading of English literature and English newspapers and magazines plays an active role in improving English reading ability. The principle of selecting reading materials, the scope of selection and the…

  7. Why Reading Fluency Should Be Hot

    Science.gov (United States)

    Rasinski, Timothy V.

    2012-01-01

    This article explores problems that have surfaced in the teaching of reading fluency and how teachers and reading coaches can resolve those problems. Specific issues addressed include reading fluency being defined as reading fast and instruction that is focused on having students read fast, reading fluency viewed as solely and oral reading…

  8. Effective Ways of Extensive Reading Teaching

    Institute of Scientific and Technical Information of China (English)

    杨柳

    2012-01-01

    Compared with intensive reading, extensive reading can arouse students' interest in reading and enrich their background knowledge about western countries. Therefore, we should improve both and pay special attention to extensive reading so that students' reading ability can be enhanced. The following points should be noticed when teaching extensive reading.

  9. Reading Disability and the Elementary School Counselor.

    Science.gov (United States)

    Martinez, David H.; Phelps, R. Neal

    1980-01-01

    Provides the elementary school counselor with a knowledge base in the reading and reading disability areas. The discussion on reading highlights four major areas with which the elementary school counselor should be familiar: definition of reading, proliferation of terms, reading skills assessment, and reading disability. (Author)

  10. Underlying skills of oral and silent reading

    NARCIS (Netherlands)

    van den Boer, M.; van Bergen, E.; de Jong, P.F.

    2014-01-01

    Many studies have examined reading and reading development. The majority of these studies, however, focused on oral reading rather than on the more dominant silent reading mode. Similarly, it is common practice to assess oral reading abilities rather than silent reading abilities in schools and in d

  11. Long Read Alignment with Parallel MapReduce Cloud Platform.

    Science.gov (United States)

    Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.
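
    For readers unfamiliar with the Smith-Waterman step that BWA-SW builds on, the following is a minimal sketch of the textbook local-alignment recurrence with a linear gap penalty. It is not the BWASW-PMR or BWA-SW code; the function name and the scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (textbook dynamic programming).

    Scoring parameters are illustrative defaults, not values from the paper.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

    The nested loops take time proportional to the product of the two sequence lengths, which is why long-read alignment is costly and why the abstract reports both algorithmic optimization and parallel execution.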

  12. Long Read Alignment with Parallel MapReduce Cloud Platform

    Directory of Open Access Journals (Sweden)

    Ahmed Abdulhakim Al-Absi

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner’s Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.

  13. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Directory of Open Access Journals (Sweden)

    Leho Tedersoo

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  14. Why should I read? - A cross-cultural investigation into adolescents' reading socialisation and reading attitude

    Science.gov (United States)

    Broeder, Peter; Stokmans, Mia

    2013-06-01

    While reading behaviour of adolescents is a frequent object of research, most studies in this field are restricted to a single country. This study investigates reading as a leisure-time activity across social groups from three regions differing in reading tradition as well as in the facilities available for reading. The authors analyse the reading behaviour of a total of 2,173 adolescents in the Netherlands, in Beijing (China), and in Cape Town (South Africa). Taking Icek Ajzen's Theory of Planned Behaviour as a starting point, the authors adjusted it to model the three most important determinants of reading behaviour, namely (1) reading attitude; (2) subjective norms (implicit and explicit social pressure to read); and (3) perceived behavioural control, which includes reading proficiency and appropriateness of the available books (book supply). While they found the adjusted model to fit the Dutch and Beijing situation quite well, it appeared to be inappropriate for the Cape Town situation. Despite considerable cultural and situational differences between the Netherlands and Beijing, the results show a similar pattern for these two environments. The most important determinants turn out to be: the hedonic reading attitude, the implicit norm of family and friends, the attractiveness of the available choice of books, and the perceived reading proficiency.

  15. SMRT sequencing only de novo assembly of the sugar beet (Beta vulgaris) chloroplast genome.

    Science.gov (United States)

    Stadermann, Kai Bernd; Weisshaar, Bernd; Holtgräwe, Daniela

    2015-09-16

    Third generation sequencing methods, like SMRT (Single Molecule, Real-Time) sequencing developed by Pacific Biosciences, offer much longer read length in comparison to Next Generation Sequencing (NGS) methods. Hence, they are well suited for de novo- or re-sequencing projects. Sequences generated for these purposes will not only contain reads originating from the nuclear genome, but also a significant amount of reads originating from the organelles of the target organism. These reads are usually discarded but they can also be used for an assembly of organellar replicons. The long read length supports resolution of repetitive regions and repeats within the organellar genome which might be problematic when just using short read data. Additionally, SMRT sequencing is less influenced by GC rich areas and by long stretches of the same base. We describe a workflow for a de novo assembly of the sugar beet (Beta vulgaris ssp. vulgaris) chloroplast genome sequence only based on data originating from a SMRT sequencing dataset targeted on its nuclear genome. We show that the data obtained from such an experiment are sufficient to create a high quality assembly with a higher reliability than assemblies derived from e.g. Illumina reads only. The chloroplast genome is especially challenging for de novo assembling as it contains two large inverted repeat (IR) regions. We also describe some limitations that still apply even though long reads are used for the assembly. SMRT sequencing reads extracted from a dataset created for nuclear genome (re)sequencing can be used to obtain a high quality de novo assembly of the chloroplast of the sequenced organism. Even with a relatively small overall coverage for the nuclear genome it is possible to collect more than enough reads to generate a high quality assembly that outperforms short read based assemblies. However, even with long reads it is not always possible to clarify the order of elements of a chloroplast genome sequence reliably.

  16. Underlying skills of oral and silent reading.

    Science.gov (United States)

    van den Boer, Madelon; van Bergen, Elsje; de Jong, Peter F

    2014-12-01

    Many studies have examined reading and reading development. The majority of these studies, however, focused on oral reading rather than on the more dominant silent reading mode. Similarly, it is common practice to assess oral reading abilities rather than silent reading abilities in schools and in diagnosis of reading impairments. More important, insights gained through examinations of oral reading tend to be generalized to silent reading. In the current study, we examined whether such generalizations are justified. We directly compared oral and silent reading fluency by examining whether these reading modes relate to the same underlying skills. In total, 132 fourth graders read words, sentences, and text orally, and 123 classmates read the same material silently. As underlying skills, we considered phonological awareness, rapid naming, and visual attention span. All skills correlated significantly with both reading modes. Phonological awareness contributed equally to oral and silent reading. Rapid naming, however, correlated more strongly with oral reading than with silent reading. Visual attention span correlated equally strongly with both reading modes but showed a significant unique contribution only to silent reading. In short, we showed that oral and silent reading indeed are fairly similar reading modes, based on the relations with reading-related cognitive skills. However, we also found differences that warrant caution in generalizing findings across reading modes. Copyright © 2014 Elsevier Inc. All rights reserved.

  17. Genome sequence of Stachybotrys chartarum Strain 51-11

    Science.gov (United States)

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  19. Complete genome sequence of a new maize-associated cytorhabdovirus

    Science.gov (United States)

    A new 11,877 nt cytorhabdovirus sequence with 6 open reading frames has been identified in a maize sample. It shares 50 and 51% genome-wide nucleotide sequence identity with northern cereal mosaic cytorhabdovirus (NCMV) and barley yellow striate mosaic cytorhabdovirus (BYSMV), respectively....

  20. Digital reading: An overview

    Institute of Scientific and Technical Information of China (English)

    Ziming LIU

    2012-01-01

    Purpose: Digital reading is an important research topic in contemporary information science research. This paper aims to provide a snapshot of major studies on digital reading over the past few years. Design/methodology/approach: This paper begins by introducing the background in digital reading, then outlines major research findings. Findings: The paper demonstrates the growth of interest in information science and other disciplines in digital reading behavior. Five areas are highlighted: digital reading behavior, print vs. digital, preference for reading medium, multi-tasking and learning, and technological advancement and traditional attachment. Research limitations: Only major studies in the North American and European countries are covered. Practical implications: Understanding reading behavior in the digital environment would help develop more effective reading devices and empower readers in the online environment. Originality/value: The paper represents a first attempt to compare, evaluate, and synthesize recent studies on digital reading. Implications for the changes in reading behavior are discussed, and directions for future research are suggested.