WorldWideScience

Sample records for total sequence length

  1. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Full Text Available Several existing direct-sequence spread spectrum (DSSS) detection and estimation algorithms assume prior knowledge of the symbol period or sequence length, although very few sequence-length estimation techniques are available in the literature...

  2. The relationship of protein conservation and sequence length

    Directory of Open Access Journals (Sweden)

    Panchenko Anna R

    2002-11-01

    Full Text Available Abstract Background In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. Results We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. Conclusions There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.

  3. Noise Attenuation Estimation for Maximum Length Sequences in Deconvolution Process of Auditory Evoked Potentials

    Directory of Open Access Journals (Sweden)

    Xian Peng

    2017-01-01

    Full Text Available The use of maximum length sequence (m-sequence) has been found beneficial for recovering both linear and nonlinear components at rapid stimulation. Since m-sequence is fully characterized by a primitive polynomial of different orders, the selection of polynomial order can be problematic in practice. Usually, the m-sequence is repetitively delivered in a looped fashion. Ensemble averaging is carried out as the first step and followed by the cross-correlation analysis to deconvolve linear/nonlinear responses. According to the classical noise reduction property based on additive noise model, theoretical equations have been derived in measuring noise attenuation ratios (NARs) after the averaging and correlation processes in the present study. A computer simulation experiment was conducted to test the derived equations, and a nonlinear deconvolution experiment was also conducted using order 7 and 9 m-sequences to address this issue with real data. Both theoretical and experimental results show that the NAR is essentially independent of the m-sequence order and is decided by the total length of valid data, as well as stimulation rate. The present study offers a guideline for m-sequence selections, which can be used to estimate required recording time and signal-to-noise ratio in designing m-sequence experiments.
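
    The link between m-sequence order and sequence length mentioned in this abstract can be illustrated with a minimal sketch, assuming a Fibonacci linear-feedback shift register. The tap sets (7, 6) and (9, 5) are commonly tabulated maximal-length choices picked here for illustration only; the abstract does not say which primitive polynomials the authors used, and the function names are hypothetical.

```python
def m_sequence(order, taps, seed=1):
    """Return one period (2**order - 1 bits) of an LFSR output sequence.

    `taps` uses the usual 1-based convention, e.g. (7, 6) for order 7.
    The taps must correspond to a primitive polynomial for the output
    to be a maximum length sequence; that is assumed, not checked.
    """
    state = seed
    period = (1 << order) - 1
    bits = []
    for _ in range(period):
        bits.append(state & 1)              # output the lowest register bit
        feedback = 0
        for t in taps:                      # XOR of the tapped register bits
            feedback ^= (state >> (order - t)) & 1
        state = (state >> 1) | (feedback << (order - 1))
    return bits


def periodic_autocorr(bits, shift):
    """Periodic autocorrelation of the sequence mapped to +1/-1 values."""
    s = [1 - 2 * b for b in bits]
    n = len(s)
    return sum(s[i] * s[(i + shift) % n] for i in range(n))


seq7 = m_sequence(7, (7, 6))   # order 7, as in the abstract
seq9 = m_sequence(9, (9, 5))   # order 9, as in the abstract
```

    An order-n register gives a period of 2^n - 1 samples (127 and 511 here), so covering the same total recording length takes roughly four times as many loops of the order 7 sequence as of the order 9 sequence.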

  4. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    Full Text Available Abstract Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but

  5. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  6. Detecting Scareware by Mining Variable Length Instruction Sequences

    OpenAIRE

    Shahzad, Raja Khurram; Lavesson, Niklas

    2011-01-01

    Scareware is a recent type of malicious software that may pose financial and privacy-related threats to novice users. Traditional countermeasures, such as anti-virus software, require regular updates and often lack the capability of detecting novel (unseen) instances. This paper presents a scareware detection method that is based on the application of machine learning algorithms to learn patterns in extracted variable length opcode sequences derived from instruction sequences of binary files....

  7. Genotypic characterization of Salmonella by multilocus sequence typing, pulsed-field gel electrophoresis and amplified fragment length polymorphism

    DEFF Research Database (Denmark)

    Torpdahl, Mia; Skov, Marianne N.; Sandvang, Dorthe

    2005-01-01

    subspecies enterica isolates. A total of 25 serotypes were investigated that had been isolated from humans or veterinary sources in Denmark between 1995 and 2001. All isolates were genotyped by multilocus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE) and amplified fragment length...

  8. The algorithm of random length sequences synthesis for frame synchronization of digital television systems

    Directory of Open Access Journals (Sweden)

    Аndriy V. Sadchenko

    2015-12-01

    Full Text Available Digital television systems need to ensure that all digital signal processing operations are performed simultaneously and consistently. Frame synchronization is dictated by the need to match the phases of the transmitter and receiver so that the start of a frame can be identified. Long binary sequences with a good aperiodic autocorrelation function are often used as frame synchronization signals. Aim: This work is dedicated to the development of an algorithm for the synthesis of random-length sequences. Materials and Methods: The paper provides a comparative analysis of the known sequences that can currently be used for synchronization and identifies their advantages and disadvantages. This work proposes an algorithm for the synthesis of binary synchronization sequences of random length with good autocorrelation properties, based on a noise generator with a uniform probability distribution. A "white noise" semiconductor generator is proposed as the source material for the synthesis of binary sequences with the desired properties. Results: A statistical analysis of the initial "white noise" realizations and of the synthesized sequences for frame synchronization of digital television was conducted. The synthesized sequences were compared with known ones. The results show the benefits of the obtained sequences in comparison with known ones. The performed simulations confirm the obtained results. Conclusions: Thus, a search algorithm for binary synchronization sequences with the desired autocorrelation properties was obtained. With this algorithm, the sequence can be made longer, without length limitations. The resulting sync sequence can be used for frame synchronization in modern digital communication systems, which will increase their efficiency and noise immunity.
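
    The general idea of selecting binary sequences by their aperiodic autocorrelation can be sketched as follows. This is not the authors' algorithm: a seeded software pseudo-random generator stands in for the "white noise" semiconductor generator, the selection criterion (lowest peak sidelobe) is an assumption, and all function names are hypothetical.

```python
import random


def aperiodic_autocorr(seq):
    """Aperiodic autocorrelation of a +1/-1 sequence for lags 0..n-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]


def peak_sidelobe(seq):
    """Largest absolute autocorrelation value away from lag 0 (needs n >= 2)."""
    return max(abs(v) for v in aperiodic_autocorr(seq)[1:])


def search_sync_sequence(length, trials, seed=0):
    """Draw random +1/-1 candidates and keep the one with the lowest sidelobe."""
    rng = random.Random(seed)  # software stand-in for a noise source
    best, best_psl = None, None
    for _ in range(trials):
        candidate = [rng.choice((-1, 1)) for _ in range(length)]
        psl = peak_sidelobe(candidate)
        if best_psl is None or psl < best_psl:
            best, best_psl = candidate, psl
    return best, best_psl


# Reference point: the length-13 Barker code, a classic sync sequence.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
```

    Because candidates are drawn at random, any length can be requested, which is the property the abstract emphasizes; known families such as Barker codes exist only for a few short lengths.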

  9. On the normalization of the minimum free energy of RNAs by sequence length.

    Science.gov (United States)

    Trotta, Edoardo

    2014-01-01

    The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.
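
    One way to see how dividing by length can by itself introduce a hyperbolic length dependence is a toy model in which the MFE is linear in length with a nonzero offset, so that MFE/L = slope + offset/L. The sketch below assumes such a model purely for illustration; the slope and offset values are invented, and the corrected index proposed by the author is not reproduced here because the abstract does not state it.

```python
def normalized_mfe(mfe, length):
    """Simple index from the abstract: MFE divided by number of nucleotides."""
    return mfe / length


def toy_mfe(length, slope=-0.3, offset=8.0):
    """Hypothetical linear-with-offset MFE model (kcal/mol); values invented."""
    return slope * length + offset


lengths = (40, 100, 400, 1600)
indices = [normalized_mfe(toy_mfe(n), n) for n in lengths]
for n, idx in zip(lengths, indices):
    print(n, round(idx, 4))
```

    Under this assumed model the index keeps drifting toward the slope as length grows, even though the per-nucleotide contribution is constant, which is why short and long sequences are not directly comparable with the simple ratio.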

  10. On the normalization of the minimum free energy of RNAs by sequence length.

    Directory of Open Access Journals (Sweden)

    Edoardo Trotta

    Full Text Available The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.

  11. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
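
    The sliding window analysis described in this abstract can be sketched as a simple coordinate generator. The step size and alignment length below are hypothetical: the abstract reports only the window sizes (1,000 bp and 2,000 bp) and the number of windows, not how far each window was shifted.

```python
def sliding_windows(seq_len, window, step):
    """Return (start, end) pairs, 0-based and end-exclusive, for full windows."""
    return [(s, s + window) for s in range(0, seq_len - window + 1, step)]


def slice_alignment(sequences, window, step):
    """Cut every aligned sequence into the same set of windows."""
    seq_len = len(sequences[0])
    return [
        [seq[a:b] for seq in sequences]
        for a, b in sliding_windows(seq_len, window, step)
    ]


# Hypothetical example: a 9,000-column alignment, 1,000 bp windows, 500 bp step.
example_windows = sliding_windows(9000, 1000, 500)
```

    Each window's sub-alignment would then be passed to the same tree-building and cluster-detection steps as the full-length genomes, so that the proportion of sequences in clusters can be compared across windows.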

  12. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    The aim of this work was to sequence the entire coding region of ACACA gene in Valle del Belice sheep breed to identify polymorphic sites. A total of 51 coding exons of ACACA gene were sequenced in 32 individuals of Valle del Belice sheep breed. Sequencing analysis and alignment of obtained sequences showed the ...

  13. Length and sequence dependence in the association of Huntingtin protein with lipid membranes

    Science.gov (United States)

    Jawahery, Sudi; Nagarajan, Anu; Matysiak, Silvina

    2013-03-01

    There is a fundamental gap in our understanding of how aggregates of mutant Huntingtin protein (htt) with overextended polyglutamine (polyQ) sequences gain the toxic properties that cause Huntington's disease (HD). Experimental studies have shown that the most important step associated with toxicity is the binding of mutant htt aggregates to lipid membranes. Studies have also shown that flanking amino acid sequences around the polyQ sequence directly affect interactions with the lipid bilayer, and that polyQ sequences of greater than 35 glutamine repeats in htt are a characteristic of HD. The key steps that determine how flanking sequences and polyQ length affect the structure of lipid bilayers remain unknown. In this study, we use atomistic molecular dynamics simulations to study the interactions between lipid membranes of varying compositions and polyQ peptides of varying lengths and flanking sequences. We find that overextended polyQ interactions do cause deformation in model membranes, and that the flanking sequences do play a role in intensifying this deformation by altering the shape of the affected regions.

  14. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, M; Jensen, L J; Brunak, S

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.

  15. Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

    Directory of Open Access Journals (Sweden)

    Lee Hae-Young

    2006-02-01

    Full Text Available Abstract Background Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. Results We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singleton (65.54%) ESTs. From a total of 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (Sus scrofa). Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). Conclusion The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences. The isolation of genes expressed in backfat tissue is the

  16. Full-length cDNA sequences from Rhesus monkey placenta tissue: analysis and utility for comparative mapping

    Directory of Open Access Journals (Sweden)

    Lee Sang-Rae

    2010-07-01

    Full Text Available Abstract Background Rhesus monkeys (Macaca mulatta) are widely used as experimental animals in biomedical research and are closely related to other laboratory macaques, such as cynomolgus monkeys (Macaca fascicularis), and to humans, sharing a last common ancestor from about 25 million years ago. Although rhesus monkeys have been studied extensively under field and laboratory conditions, research has been limited by the lack of genetic resources. The present study generated placenta full-length cDNA libraries, characterized the resulting expressed sequence tags, and described their utility for comparative mapping with human RefSeq mRNA transcripts. Results From rhesus monkey placenta full-length cDNA libraries, 2000 full-length cDNA sequences were determined and 1835 rhesus placenta cDNA sequences longer than 100 bp were collected. These sequences were annotated based on homology to human genes. Homology search against human RefSeq mRNAs revealed that our collection included the sequences of 1462 putative rhesus monkey genes. Moreover, we identified 207 genes containing exon alterations in the coding region and the untranslated region of rhesus monkey transcripts, despite the highly conserved structure of the coding regions. Approximately 10% (187) of all full-length cDNA sequences did not represent any public human RefSeq mRNAs. Intriguingly, two rhesus monkey specific exons derived from the transposable elements of AluYRa2 (SINE family) and MER11B (LTR family) were also identified. Conclusion The 1835 rhesus monkey placenta full-length cDNA sequences described here could expand genomic resources and information of rhesus monkeys. This increased genomic information will greatly contribute to the development of evolutionary biology and biomedical research.

  17. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney

    2012-10-08

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for

  18. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney; Kodzius, Rimantas; Tan, Yue Ying; Tay, Alice; Tay, Boon-Hui; Venkatesh, Byrappa

    2012-01-01

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole

  19. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.
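
    The intuition that short reads cannot resolve repeats at least as long as the read can be illustrated with a toy sketch. This is not the authors' repeat-assembly model: it simply counts exact substrings of a chosen length that occur more than once in a genome string, as a crude proxy for repeats that reads of about that length could not span. The function name and the example string are hypothetical.

```python
from collections import Counter


def repeated_kmers(genome, k):
    """Return the exact k-length substrings that occur more than once."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


toy_genome = "ACGTACGTTT"
short_read_repeats = repeated_kmers(toy_genome, 4)   # repeats seen at k = 4
longer_read_repeats = repeated_kmers(toy_genome, 5)  # repeats seen at k = 5
```

    In the toy string the 4-mer ACGT occurs twice, while no 5-mer is repeated, so the ambiguity disappears once the substring length grows by one base. Real genomes would also need reverse complements and inexact repeats to be considered, which this sketch ignores.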

  20. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    Full Text Available BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

  1. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.; Köser, Claudio U.; Ross, Nicholas E.; Archer, John A.C.

    2010-01-01

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  2. Effects of the Length of Stay on the Cost of Total Knee and Total Hip Arthroplasty from 2002 to 2013.

    Science.gov (United States)

    Molloy, Ilda B; Martin, Brook I; Moschetti, Wayne E; Jevsevar, David S

    2017-03-01

    Utilization of total knee and hip arthroplasty has greatly increased in the past decade in the United States; these are among the most expensive procedures in patients with Medicare. Advances in surgical techniques, anesthesia, and care pathways decrease hospital length of stay. We examined how trends in hospital cost were altered by decreases in length of stay. Procedure, demographic, and economic data were collected on 6.4 million admissions for total knee arthroplasty and 2.8 million admissions for total hip arthroplasty from 2002 to 2013 using the National (Nationwide) Inpatient Sample, a component of the Healthcare Cost and Utilization Project. Trends in mean hospital costs and their association with length of stay were estimated using inflation-adjusted, survey-weighted generalized linear regression models, controlling for patient demographic characteristics and comorbidity. From 2002 to 2013, the length of stay decreased from a mean time of 4.06 to 2.97 days for total knee arthroplasty and from 4.06 to 2.75 days for total hip arthroplasty. During the same time period, the mean hospital cost for total knee arthroplasty increased from $14,988 (95% confidence interval [CI], $14,927 to $15,049) in 2002 to $22,837 (95% CI, $22,765 to $22,910) in 2013 (an overall increase of $7,849 or 52.4%). The mean hospital cost for total hip arthroplasty increased from $15,792 (95% CI, $15,706 to $15,878) in 2002 to $23,650 (95% CI, $23,544 to $23,755) in 2013 (an increase of $7,858 or 49.8%). If length of stay were set at the 2002 mean, the growth in cost for total knee arthroplasty would have been 70.8% instead of 52.4% as observed, and the growth in cost for total hip arthroplasty would have been 67.4% instead of 49.8% as observed. Hospital costs for joint replacement increased from 2002 to 2013, but were attenuated by reducing inpatient length of stay. With demographic characteristics showing an upward trend in the utilization of joint arthroplasty, including a shift

  3. Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    2009-11-01

    Full Text Available Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).

  4. On Sequence Lengths of Some Special External Exclusive OR Type LFSR Structures – Study and Analysis

    Directory of Open Access Journals (Sweden)

    A Ahmad

    2014-12-01

    Full Text Available The study of the length of pseudo-random binary sequences generated by Linear-Feedback Shift Registers (LFSRs) plays an important role in the design of built-in self-test, cryptosystems, and other applications. However, certain LFSR structures might not be appropriate in some situations. Because determining the length of a generated pseudo-random binary sequence is a complex task, it is essential to investigate the length and the properties of the sequence before using an LFSR structure. This paper investigates some conditions and LFSR structures that restrict the generation of pseudo-random binary sequences to a certain fixed length. The outcomes of this paper are presented in the form of theorems, simulations, and analyses. We believe that these outcomes are of great importance to the designers of built-in self-test equipment, cryptosystems, and other applications such as radar, CDMA, error correction, and Monte Carlo simulation.
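The dependence of sequence length on LFSR structure can be illustrated with a small simulation. The following is a minimal sketch, not the method of the paper: it assumes a right-shifting external-XOR (Fibonacci) register, and the function name `lfsr_period`, the tap numbering, and the example tap sets are all hypothetical choices made for illustration.

```python
def lfsr_period(taps, nbits, seed=1):
    """Return the cycle length of an external-XOR (Fibonacci) LFSR.

    Assumed convention (illustrative only): the register shifts right,
    tap t reads the bit at shift (nbits - t), so tap == nbits is the
    output bit, and the XOR of all tapped bits enters at the top bit.
    """
    mask = (1 << nbits) - 1
    state = seed & mask
    seen = {}  # state -> step at which it was first visited
    step = 0
    while state not in seen:
        seen[state] = step
        feedback = 0
        for t in taps:
            feedback ^= (state >> (nbits - t)) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
        step += 1
    # Length of the cycle that the register eventually enters.
    return step - seen[state]


# A primitive feedback polynomial gives the maximal length 2**n - 1,
# while a non-primitive one restricts the sequence to a shorter length.
print(lfsr_period([4, 3], nbits=4))  # maximal-length 4-bit example
print(lfsr_period([4, 2], nbits=4))  # restricted-length 4-bit example
```

Tracking visited states in a dictionary keeps the sketch correct even for tap sets whose state transition is not invertible, at the cost of memory that grows with the number of states, so it is only practical for small registers.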

  5. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  6. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  7. The function analysis of full-length cDNA sequence from IRM-2 mouse cDNA library

    International Nuclear Information System (INIS)

    Wang Qin; Liu Xiaoqiu; Xu Chang; Du Liqing; Sun Zhijuan; Wang Yan; Liu Qiang; Song Li; Li Jin; Fan Feiyue

    2013-01-01

    Objective: To identify the function of full-length cDNA sequence from IRM-2 mouse cDNA library. Methods: Full-length cDNA products were amplified by PCR from IRM-2 mouse cDNA library according to twenty-one pieces of expressed sequence tag. The expression of full-length cDNAs were detected after mouse embryonic fibroblasts were exposed to 6.5 Gy γ-ray radiation. And the effect on the growth of radiosensitivity cells AT5B1VA transfected with full-length cDNAs was investigated. Results: The expression of No.4, 5 and 2 full-length cDNAs from IRM-2 mouse were higher than that of parental ICR and 615 mouse after mouse embryonic fibroblasts irradiated with γ-ray radiation. And the survival rate of AT5B1VA cells transfected with No.4, 5 and 2 full-length cDNAs was high. Conclusion: No.4, 5 and 2 full-length cDNAs of IRM-2 mouse are of high radioresistance. (authors)

  8. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Science.gov (United States)

    Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

    2009-01-01

    Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to being helpful in distinguishing between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining c

  9. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Directory of Open Access Journals (Sweden)

    Lunner Sigbjørn

    2009-10-01

    Full Text Available Abstract Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to being helpful in distinguishing between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This

  10. Reducing Length of Stay in Total Joint Arthroplasty Care.

    Science.gov (United States)

    Walters, Megan; Chambers, Monique C; Sayeed, Zain; Anoushiravani, Afshin A; El-Othmani, Mouhanad M; Saleh, Khaled J

    2016-10-01

    As health care reforms continue to improve quality of care, significant emphasis will be placed on evaluation of orthopedic patient outcomes. Total joint arthroplasty (TJA) procedures have a proven track record of enhancing patient quality of life and are easily replicable. The outcomes of these procedures serve as a measure of health care initiative success. Specifically, length of stay will be targeted as a marker of quality of surgical care delivered to TJA patients. Within this review, we will discuss preoperative and postoperative methods by which orthopedic surgeons may enhance TJA outcomes and effectively reduce length of stay. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  12. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    Directory of Open Access Journals (Sweden)

    Rachel Caldwell

    2015-01-01

    Full Text Available There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.

  13. Optimal choice of word length when comparing two Markov sequences using a $\chi^2$-statistic.

    Science.gov (United States)

    Bai, Xin; Tang, Kujin; Ren, Jie; Waterman, Michael; Sun, Fengzhu

    2017-10-03

    Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate $\chi^2$-statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length $k$ in such studies. We develop an optimal strategy to choose $k$ by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be $r_1$ and $r_2$, respectively. We show through both simulations and theoretical studies that the optimal $k = \max(r_1, r_2) + 1$ for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains. Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.
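The word-length rule stated in this abstract is simple enough to sketch directly. The snippet below is only an illustration under stated assumptions: the function names `optimal_word_length` and `word_counts` and the toy sequence are hypothetical, the Markov orders are taken as already known, and the statistic itself is not implemented.

```python
from collections import Counter


def optimal_word_length(r1, r2):
    """Word length suggested by the abstract: k = max(r1, r2) + 1."""
    return max(r1, r2) + 1


def word_counts(seq, k):
    """Count all overlapping words (k-tuples) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical example: Markov orders 2 and 1 give word length 3.
k = optimal_word_length(2, 1)
print(k, word_counts("ACGTACG", k))
```

In practice the orders would themselves have to be estimated from the data, which is the source of the power loss the abstract discusses.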

  14. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, Marie; Jensen, L.J.; Brunak, Søren

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length...... distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300...... genes, we show that it probably has only ~3800 genes, and that a similar discrepancy exists for almost all published genomes....

  15. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-09-13

    This paper presents a new tool for the study of relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of optimization of decision trees. In this particular case of total path length and number of terminal nodes, the relationships between these two cost functions are closely related to the space-time trade-off. In addition to an algorithm to compute the relationships, the paper also presents results of experiments with datasets from the UCI ML Repository. These experiments show how the two cost functions behave for a given decision table, and the resulting plots show the Pareto frontier or Pareto set of optimal points. Furthermore, in some cases this Pareto frontier is a singleton, showing the total optimality of decision trees for the given decision table.
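The notion of a Pareto frontier over two cost functions can be made concrete with a short sketch. This is not the algorithm from the paper: it simply filters an already enumerated list of (total path length, number of terminal nodes) pairs, both to be minimized, and the function name `pareto_front` and the sample values are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated (cost_a, cost_b) pairs, minimizing both.

    After sorting by the first cost (ties broken by the second), a point
    is on the frontier exactly when its second cost is strictly smaller
    than that of every point kept so far.
    """
    front = []
    best_second = float("inf")
    for point in sorted(set(points)):
        if point[1] < best_second:
            front.append(point)
            best_second = point[1]
    return front


# Hypothetical (total path length, terminal nodes) pairs for candidate trees.
trees = [(10, 4), (8, 6), (12, 3), (9, 6), (11, 5)]
print(pareto_front(trees))

# A singleton frontier means one tree is optimal for both cost functions.
print(pareto_front([(5, 5), (6, 5), (5, 7)]))
```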

  16. Genome survey sequencing and genetic background characterization of Gracilariopsis lemaneiformis (Rhodophyta) based on next-generation sequencing.

    Science.gov (United States)

    Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon.
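The N50 length quoted for the contigs is a standard assembly summary statistic. As a minimal sketch using the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length), it can be computed as follows; the function name `n50` and the sample lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in kb.
print(n50([2, 3, 4, 5, 6]))
```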

  17. Genome Survey Sequencing and Genetic Background Characterization of Gracilariopsis lemaneiformis (Rhodophyta) Based on Next-Generation Sequencing

    Science.gov (United States)

    Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008

  18. WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data.

    Science.gov (United States)

    Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

    2010-07-01

    High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.

  19. Effect of temperature and cycle length on microbial competition in PHB-producing sequencing batch reactor

    NARCIS (Netherlands)

    Jiang, Y.; Marang, L.; Kleerebezem, R.; Muyzer, G.; van Loosdrecht, M.C.M.

    2011-01-01

    The impact of temperature and cycle length on microbial competition between polyhydroxybutyrate (PHB)-producing populations enriched in feast-famine sequencing batch reactors (SBRs) was investigated at temperatures of 20 °C and 30 °C, and in a cycle length range of 1-18 h. In this study, the

  20. Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

    DEFF Research Database (Denmark)

    Karst, Soeren M; Dueholm, Morten S; McIlroy, Simon J

    2016-01-01

    Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies...... (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human...... gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity...

  1. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Directory of Open Access Journals (Sweden)

    Sugano Sumio

    2009-07-01

    Full Text Available Abstract Background Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexa parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance for physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexa parasites. Results In this study, we used a total of 61,056 5'-end-single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of transcripts that had been previously unidentified. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified and confirmed by RT-PCR 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion Our data indicates that the current gene models are likely to still be incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of

  2. Total Path Length and Number of Terminal Nodes for Decision Trees

    KAUST Repository

    Hussain, Shahid

    2014-01-01

    This paper presents a new tool for the study of relationships between total path length (average depth) and number of terminal nodes for decision trees. These relationships are important from the point of view of optimization of decision trees

  3. First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

    Science.gov (United States)

    Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

    2015-10-01

    Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.

  4. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and are therefore an important group for understanding the evolution of vertebrate genomes, including the human genome. Our laboratory has proposed elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (the first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  5. Evidence for a Complex Mosaic Genome Pattern in a Full-length Hepatitis C Virus Sequence

    Directory of Open Access Journals (Sweden)

    R.S. Ross

    2008-01-01

    Full Text Available The genome of the hepatitis C virus (HCV) exhibits a high genetic variability. This remarkable heterogeneity is mainly attributed to the gradual accumulation of mutational changes, whereas the contribution of recombination events to the evolution of HCV remains controversial so far. While performing phylogenetic analyses including a large number of sequences deposited in the GenBank, we encountered a full-length HCV sequence (AY651061) that showed evidence for inter-subtype recombination and was, therefore, subjected to a detailed analysis of its molecular structure. The obtained results indicated that AY651061 does not represent a “simple” HCV 1c isolate, but a complex 1a/1c mosaic genome, showing five putative breakpoints in the core to NS3 regions. To our knowledge, this is the first report on a mosaic HCV full-length sequence with multiple breakpoints. The molecular structure of AY651061 is reminiscent of complex homologous recombinant variants occurring among other members of the Flaviviridae family, e.g. GB virus C, dengue virus, and Japanese encephalitis virus. Our finding of a mosaic HCV sequence may have important implications for many fields of current HCV research which merit careful consideration.

  6. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

    Science.gov (United States)

    Anvar, Seyed Yahya; Allard, Guy; Tseng, Elizabeth; Sheynkman, Gloria M; de Klerk, Eleonora; Vermaat, Martijn; Yin, Raymund H; Johansson, Hans E; Ariyurek, Yavuz; den Dunnen, Johan T; Turner, Stephen W; 't Hoen, Peter A C

    2018-03-29

    The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtain evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provide a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.

  7. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    Science.gov (United States)

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  8. Relationships between total length and otolith size of bluefish Pomatomus saltatrix (Linnaeus, 1766) in the Marmara Sea of Turkey

    Directory of Open Access Journals (Sweden)

    Habib Bal

    2018-01-01

    Full Text Available In the present study, bluefish, Pomatomus saltatrix, were collected monthly from commercial fishing boats operating in the Marmara Sea between January and December 2014. The relationships between total length and otolith size of 346 bluefish samples were examined. Total lengths of females, males and unidentified samples ranged from 13.2-37.0, 12.3-34.8 and 13.0-31.6 cm, respectively. Otolith lengths were between 3.82-12.60 mm and otolith widths were between 1.59-4.34 mm for all samples. Strong correlations were found between otolith length and total length (r2 = 0.88) and between otolith width and total length (r2 = 0.81).
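
    The r2 values quoted above are coefficients of determination for the relationship between an otolith dimension and total length. As a minimal sketch of how such a value is computed for a simple linear relationship, the following uses invented measurements; the data and the function name `r_squared` are hypothetical and are not taken from the study.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return (s_xy ** 2) / (s_xx * s_yy)


# Hypothetical measurements, chosen only to lie within the ranges reported above.
total_length_cm = [13.2, 18.5, 22.0, 27.4, 31.6, 37.0]
otolith_length_mm = [3.9, 5.6, 6.8, 8.9, 10.1, 12.5]

r2 = r_squared(total_length_cm, otolith_length_mm)
print(f"r2 = {r2:.3f}")
```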

  9. DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

    Science.gov (United States)

    Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

    2010-06-01

    A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short, 20-nt-long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seems to be electrochemically mute due to the long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.

  10. Construction of an SNP-based high-density linkage map for flax (Linum usitatissimum L.) using specific length amplified fragment sequencing (SLAF-seq) technology.

    Science.gov (United States)

    Yi, Liuxi; Gao, Fengyun; Siqin, Bateer; Zhou, Yu; Li, Qiang; Zhao, Xiaoqing; Jia, Xiaoyun; Zhang, Hui

    2017-01-01

    Flax is an important crop for oil and fiber; however, no high-density genetic maps have been reported for this species. Specific length amplified fragment sequencing (SLAF-seq) is a high-resolution strategy for large scale de novo discovery and genotyping of single nucleotide polymorphisms. In this study, SLAF-seq was employed to develop SNP markers in an F2 population to construct a high-density genetic map for flax. In total, 196.29 million paired-end reads were obtained. The average sequencing depth was 25.08 in the male parent, 32.17 in the female parent, and 9.64 in each F2 progeny. In total, 389,288 polymorphic SLAFs were detected, from which 260,380 polymorphic SNPs were developed. After filtering, 4,638 SNPs were found suitable for genetic map construction. The final genetic map included 4,145 SNP markers on 15 linkage groups and was 2,632.94 cM in length, with an average distance of 0.64 cM between adjacent markers. To our knowledge, this map is the densest SNP-based genetic map for flax. The SNP markers and genetic map reported here will serve as a foundation for the fine mapping of quantitative trait loci (QTLs), map-based gene cloning and marker assisted selection (MAS) for flax.
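
    The reported average marker spacing can be checked from the other figures in the abstract. The short calculation below is a sketch: the abstract does not state which denominator the authors used, so both the number of markers and the number of marker intervals (markers minus linkage groups) are tried as assumptions. The variable names are illustrative.

```python
# Figures taken from the abstract above.
map_length_cm = 2632.94      # total map length in centimorgans
markers_on_map = 4145        # SNP markers placed on the final map
linkage_groups = 15
snps_after_filtering = 4638  # SNPs judged suitable for map construction

# Assumption 1: spacing = map length / number of markers.
spacing_per_marker = map_length_cm / markers_on_map

# Assumption 2: spacing = map length / number of adjacent-marker intervals,
# where each linkage group with m markers contributes m - 1 intervals.
intervals = markers_on_map - linkage_groups
spacing_per_interval = map_length_cm / intervals

# Share of the filtered SNPs that ended up on the map.
mapped_fraction = markers_on_map / snps_after_filtering

print(round(spacing_per_marker, 2), round(spacing_per_interval, 2))
print(f"{mapped_fraction:.1%} of filtered SNPs were mapped")
```

    Both conventions round to the 0.64 cM quoted in the abstract, so the reported figure is consistent with either reading.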

  11. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    Science.gov (United States)

    2017-09-01

    AWARD NUMBER: W81XWH-14-1-0080 TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR...TITLE AND SUBTITLE Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. 5a. CONTRACT NUMBER 5b. GRANT NUMBER GRANT11489...institutional, NIH-funded study of genetic and epigenetic alterations of pre-invasive DCIS that did or did not progress to invasive breast cancer, with an

  12. No change in total length of white matter fibers in Alzheimer's disease

    DEFF Research Database (Denmark)

    Jorgensen, A.M.; Marner, L.; Pakkenberg, B.

    2008-01-01

    White matter changes have been reported as part of Alzheimer dementia. To investigate this, the total subcortical myelinated nerve fiber length was estimated in postmortem brains from eight females (age 79-88 years) with severe Alzheimer's disease (AD) and compared with brains from 10 female...

  13. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Full Text Available Abstract Background Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds. Results Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and the FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion Our results suggest that transcriptional regulation of FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at

  14. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, which coded for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  15. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Directory of Open Access Journals (Sweden)

    Changqing Liu

    2013-05-01

    Full Text Available In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, which coded for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  16. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    Science.gov (United States)

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
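
    The exponential growth in parameters mentioned above follows from a standard counting argument: an order-k Markov chain over an alphabet of size A has A^k possible contexts, each with A - 1 free transition probabilities. The sketch below illustrates this for a four-letter nucleotide alphabet; the function name is illustrative and the code is not part of the authors' pipeline.

```python
def fomc_free_parameters(order, alphabet_size=4):
    """Number of free parameters of a fixed-order Markov chain.

    Each of the alphabet_size ** order contexts has a probability vector of
    length alphabet_size that sums to 1, leaving alphabet_size - 1 free values.
    """
    return (alphabet_size ** order) * (alphabet_size - 1)


# Parameter counts for a nucleotide alphabet at a few chain orders.
counts = {k: fomc_free_parameters(k) for k in (0, 2, 4, 6, 8, 10)}
for k, n_params in counts.items():
    print(f"order {k:2d}: {n_params:,} free parameters")
```

    At order 10 the count already exceeds three million, which shows why estimating a high fixed order is difficult under limited sequencing depth and why a variable-length model with fewer contexts can help.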

  17. RT-PCR and sequence analysis of the full-length fusion protein of Canine Distemper Virus from domestic dogs.

    Science.gov (United States)

    Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José

    2016-02-01

    During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as analyzed by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the F gene full-length sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal-peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of CDV Argentinean strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and contributes to the knowledge of molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.
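
    Identity percentages such as those above are computed from aligned sequences. The following is a minimal sketch of one common convention (matches divided by aligned positions where neither sequence has a gap); the abstract does not say which convention or software the authors used, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the comparison.
    Other conventions (e.g. dividing by the full alignment length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Hypothetical aligned peptide fragments (not real CDV sequences).
identity = percent_identity("MKTLLV-AG", "MKSLLVQAG")
print(f"{identity:.1f}% identity")
```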

  18. Giardia telomeric sequence d(TAGGG)4 forms two intramolecular G-quadruplexes in K+ solution: effect of loop length and sequence on the folding topology.

    Science.gov (United States)

    Hu, Lanying; Lim, Kah Wai; Bouaziz, Serge; Phan, Anh Tuân

    2009-11-25

    Recently, it has been shown that in K+ solution the human telomeric sequence d[TAGGG(TTAGGG)3] forms a (3 + 1) intramolecular G-quadruplex, while the Bombyx mori telomeric sequence d[TAGG(TTAGG)3], which differs from the human counterpart only by one G deletion in each repeat, forms a chair-type intramolecular G-quadruplex, indicating an effect of G-tract length on the folding topology of G-quadruplexes. To explore the effect of loop length and sequence on the folding topology of G-quadruplexes, here we examine the structure of the four-repeat Giardia telomeric sequence d[TAGGG(TAGGG)3], which differs from the human counterpart only by one T deletion within the non-G linker in each repeat. We show by NMR that this sequence forms two different intramolecular G-quadruplexes in K+ solution. The first one is a novel basket-type antiparallel-stranded G-quadruplex containing two G-tetrads, a G·(A-G) triad, and two A·T base pairs; the three loops are consecutively edgewise-diagonal-edgewise. The second one is a propeller-type parallel-stranded G-quadruplex involving three G-tetrads; the three loops are all double-chain-reversal. Recurrence of several structural elements in the observed structures suggests a "cut and paste" principle for the design and prediction of G-quadruplex topologies, for which different elements could be extracted from one G-quadruplex and inserted into another.
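
    The three telomeric sequences compared in this abstract differ only in G-tract length or linker length, which is easy to see once the repeat notation is expanded. The sketch below expands each sequence and lists its G-tracts and the linkers between them; the helper names are illustrative, and the code describes only the linear sequences, not the folding topologies determined by NMR.

```python
import re


def expand(prefix, repeat, copies):
    """Expand repeat notation such as d[TAGGG(TTAGGG)3] into a plain string."""
    return prefix + repeat * copies


def g_tracts_and_linkers(sequence):
    """Return (lengths of the G-tracts, non-G linkers between consecutive G-tracts)."""
    tract_lengths = [len(t) for t in re.findall(r"G+", sequence)]
    linkers = re.split(r"G+", sequence)[1:-1]  # drop the 5' flank and the 3' remainder
    return tract_lengths, linkers


sequences = {
    "human": expand("TAGGG", "TTAGGG", 3),
    "Bombyx mori": expand("TAGG", "TTAGG", 3),
    "Giardia": expand("TAGGG", "TAGGG", 3),
}

for name, seq in sequences.items():
    tracts, linkers = g_tracts_and_linkers(seq)
    print(f"{name:12s} {seq} ({len(seq)} nt) G-tracts={tracts} linkers={linkers}")
```

    The output makes the two comparisons in the abstract explicit: Bombyx mori has shorter G-tracts than human with the same TTA linkers, while Giardia keeps the human G-tract length but shortens each linker to TA.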

  19. Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

    Directory of Open Access Journals (Sweden)

    Vijay P Bondre

    2016-01-01

    Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared a close genetic relationship with the American KOS and Chinese CR38 strains, which belonged to the Asian genetic lineage. Recombination analysis of the Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.

  20. A leaf sequencing algorithm to enlarge treatment field length in IMRT

    International Nuclear Information System (INIS)

    Xia Ping; Hwang, Andrew B.; Verhey, Lynn J.

    2002-01-01

    With MLC-based IMRT, the maximum usable field size is often smaller than the maximum field size for conventional treatments. This is due to the constraints of the overtravel distances of MLC leaves and/or jaws. Using a new leaf sequencing algorithm, the usable IMRT field length (perpendicular to the MLC motion) can be mostly made equal to the full length of the MLC field without violating the upper jaw overtravel limit. For any given intensity pattern, a criterion was proposed to assess whether an intensity pattern can be delivered without violation of the jaw position constraints. If the criterion is met, the new algorithm will consider the jaw position constraints during the segmentation for the step and shoot delivery method. The strategy employed by the algorithm is to connect the intensity elements outside the jaw overtravel limits with those inside the jaw overtravel limits. Several methods were used to establish these connections during segmentation by modifying a previously published algorithm (areal algorithm), including changing the intensity level, alternating the leaf-sequencing direction, or limiting the segment field size. The algorithm was tested with 1000 random intensity patterns with dimensions of 21 × 27 cm2, 800 intensity patterns with higher intensity outside the jaw overtravel limit, and three different types of clinical treatment plans that were undeliverable using a segmentation method from a commercial treatment planning system. The new algorithm achieved a success rate of 100% with these test patterns. For the 1000 random patterns, the new algorithm yields a similar average number of segments of 36.9±2.9 in comparison to 36.6±1.3 when using the areal algorithm. For the 800 patterns with higher intensities outside the jaw overtravel limits, the new algorithm results in an increase of 25% in the average number of segments compared to the areal algorithm. However, the areal algorithm fails to create deliverable segments for 90% of these

  1. Do climate variables and human density affect Achatina fulica (Bowditch) (Gastropoda: Pulmonata) shell length, total weight and condition factor?

    Science.gov (United States)

    Albuquerque, F S; Peso-Aguiar, M C; Assunção-Albuquerque, M J T; Gálvez, L

    2009-08-01

    The length-weight relationship and condition factor have been broadly investigated in snails to obtain the index of physical condition of populations and evaluate habitat quality. Herein, our goal was to describe the best predictors that explain Achatina fulica biometrical parameters and well being in a recently introduced population. From November 2001 to November 2002, monthly snail samples were collected in Lauro de Freitas City, Bahia, Brazil. Shell length and total weight were measured in the laboratory and the potential curve and condition factor were calculated. Five environmental variables were considered: temperature range, mean temperature, humidity, precipitation and human density. Multiple regressions were used to generate models including multiple predictors, via model selection approach, and then ranked with AIC criteria. Partial regressions were used to obtain the separated coefficients of determination of climate and human density models. A total of 1,460 individuals were collected, with shell lengths ranging from 4.8 to 102.5 mm (mean: 42.18 mm). The relationship between total length and total weight revealed that Achatina fulica presented a negative allometric growth. Simple regression indicated that humidity has a significant influence on A. fulica total length and weight. Temperature range was the main variable that influenced the condition factor. Multiple regressions showed that climatic and human variables explain a small proportion of the variance in shell length and total weight, but may explain up to 55.7% of the condition factor variance. Consequently, we believe that the well being and biometric parameters of A. fulica can be influenced by climatic and human density factors.
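
    A length-weight relationship of this kind is commonly modelled as a power curve W = a·L^b, with growth called negatively allometric when the exponent b is below 3. The sketch below fits a and b by least squares on log-transformed values. It assumes this standard model rather than reproducing the authors' procedure, and the data are synthetic, generated from invented coefficients.

```python
import math


def fit_power_law(lengths, weights):
    """Fit W = a * L**b by ordinary least squares on log(L) and log(W)."""
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in weights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from invented coefficients (a = 0.0002, b = 2.7).
lengths_mm = [10.0, 20.0, 40.0, 80.0]
weights_g = [0.0002 * length ** 2.7 for length in lengths_mm]

a, b = fit_power_law(lengths_mm, weights_g)
is_negative_allometric = b < 3.0
print(f"a = {a:.6f}, b = {b:.3f}, negative allometry: {is_negative_allometric}")
```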

  2. Do climate variables and human density affect Achatina fulica (Bowditch) (Gastropoda: Pulmonata) shell length, total weight and condition factor?

    Directory of Open Access Journals (Sweden)

    FS. Albuquerque

    Full Text Available The length-weight relationship and condition factor have been broadly investigated in snails to obtain the index of physical condition of populations and evaluate habitat quality. Herein, our goal was to describe the best predictors that explain Achatina fulica biometrical parameters and well being in a recently introduced population. From November 2001 to November 2002, monthly snail samples were collected in Lauro de Freitas City, Bahia, Brazil. Shell length and total weight were measured in the laboratory and the potential curve and condition factor were calculated. Five environmental variables were considered: temperature range, mean temperature, humidity, precipitation and human density. Multiple regressions were used to generate models including multiple predictors, via model selection approach, and then ranked with AIC criteria. Partial regressions were used to obtain the separated coefficients of determination of climate and human density models. A total of 1,460 individuals were collected, with shell lengths ranging from 4.8 to 102.5 mm (mean: 42.18 mm). The relationship between total length and total weight revealed that Achatina fulica presented a negative allometric growth. Simple regression indicated that humidity has a significant influence on A. fulica total length and weight. Temperature range was the main variable that influenced the condition factor. Multiple regressions showed that climatic and human variables explain a small proportion of the variance in shell length and total weight, but may explain up to 55.7% of the condition factor variance. Consequently, we believe that the well being and biometric parameters of A. fulica can be influenced by climatic and human density factors.

  3. High-throughput sequencing of nematode communities from total soil DNA extractions

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    nematodes without the need for enrichment was developed. Using this strategy on DNA templates from a set of 22 agricultural soils, we obtained 64.4% sequences of nematode origin in total, whereas the remaining sequences were almost entirely from other metazoans. The nematode sequences were derived from...... in previous sequence-based studies are not nematode specific but also amplify other groups of organisms such as fungi and plantae, and thus require a nematode enrichment step that may introduce biases. Results: In this study an amplification strategy which selectively amplifies a fragment of the SSU from...... a broad taxonomic range and most sequences were from nematode taxa that have previously been found to be abundant in soil such as Tylenchida, Rhabditida, Dorylaimida, Triplonchida and Araeolaimida. Conclusions: Our amplification and sequencing strategy for assessing nematode diversity was able to collect...

  4. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  5. Importance of length and sequence order on magnesium binding to surface-bound oligonucleotides studied by second harmonic generation and atomic force microscopy.

    Science.gov (United States)

    Holland, Joseph G; Geiger, Franz M

    2012-06-07

    The binding of magnesium ions to surface-bound single-stranded oligonucleotides was studied under aqueous conditions using second harmonic generation (SHG) and atomic force microscopy (AFM). The effect of strand length on the number of Mg(II) ions bound and their free binding energy was examined for 5-, 10-, 15-, and 20-mers of adenine and guanine at pH 7, 298 K, and 10 mM NaCl. The binding free energies for adenine and guanine sequences were calculated to be -32.1(4) and -35.6(2) kJ/mol, respectively, and invariant with strand length. Furthermore, the ion density for adenine oligonucleotides did not change as strand length increased, with an average value of 2(1) ions/strand. In sharp contrast, guanine oligonucleotides displayed a linear relationship between strand length and ion density, suggesting that cooperativity is important. These data give predictive capabilities for mixed strands of various lengths, which we exploit for 20-mers of adenines and guanines. In addition, the role sequence order plays in strands of hetero-oligonucleotides was examined for 5'-A(10)G(10)-3', 5'-(AG)(10)-3', and 5'-G(10)A(10)-3' (here the -3' end is chemically modified to bind to the surface). Although the free energy of binding is the same for these three strands (averaged to be -33.3(4) kJ/mol), the total ion density increases when several guanine residues are close to the 3' end (and thus close to the solid support substrate). To further understand these results, we analyzed the height profiles of the functionalized surfaces with tapping-mode atomic force microscopy (AFM). When comparing the average surface height profiles of the oligonucleotide surfaces pre- and post-Mg(II) binding, a positive correlation was found between ion density and the subsequent height decrease following Mg(II) binding, which we attribute to reductions in Coulomb repulsion and strand collapse once a critical number of Mg(II) ions are bound to the strand.

  6. Relationship between the total length of the stents and patients' quality of life after percutaneous coronary intervention.

    Science.gov (United States)

    Liu, Wei; Yang, Xuming; Dong, Pingshuan; Li, Zhijuan

    2015-01-01

    The aim of this study was to examine the relationship between the total length of the stents and the postoperative life quality of patients with multi-vessel coronary artery disease who undergo percutaneous coronary intervention (PCI). Using the short-form health survey (SF-36) items, we analyzed the data on the postoperative life quality of 166 patients with multi-vessel coronary artery disease who underwent percutaneous transluminal coronary intervention in the Department of Cardiology of the First Affiliated Hospital of Henan University of Science and Technology from September 2011 to September 2013. Follow-up was performed 6 months later. All of the dimensionalities, except general health and mental health, showed significantly higher scores after PCI. No significant relationships were observed between the total length of the stents and the postoperative life quality of patients with multi-vessel coronary artery disease who underwent PCI. PCI can effectively improve the postoperative life quality of patients; however, there was no significant relationship between the total length of the stents and postoperative life quality of patients.

  7. Study of canine parvovirus evolution: comparative analysis of full-length VP2 gene sequences from Argentina and international field strains.

    Science.gov (United States)

    Gallo Calderón, Marina; Wilda, Maximiliano; Boado, Lorena; Keller, Leticia; Malirat, Viviana; Iglesias, Marcela; Mattion, Nora; La Torre, Jose

    2012-02-01

    The continuous emergence of new strains of canine parvovirus (CPV), poorly protected by current vaccination, is a concern among breeders, veterinarians, and dog owners around the world. Therefore, the understanding of the genetic variation in emerging CPV strains is crucial for the design of disease control strategies, including vaccines. In this paper, we obtained the sequences of the full-length gene encoding for the main capsid protein (VP2) of 11 canine parvovirus type 2 (CPV-2) Argentine representative field strains, selected from a total of 75 positive samples studied in our laboratory in the last 9 years. A comparative sequence analysis was performed on 9 CPV-2c, one CPV-2a, and one CPV-2b Argentine strains with respect to international strains reported in the GenBank database. In agreement with previous reports, a high degree of identity was found among CPV-2c Argentine strains (99.6-100% and 99.7-100% at nucleotide and amino acid levels, respectively). However, the appearance of a new substitution in the 440 position (T440A) in four CPV-2c Argentine strains obtained after the year 2009 gives support to the variability observed for this position located within the VP2, three-fold spike. This is the first report on the genetic characterization of the full-length VP2 gene of emerging CPV strains in South America and shows that all the Argentine CPV-2c isolates cluster together with European and North American CPV-2c strains.

  8. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with an objective of isolating as many functional genes as possible from J. curcas by large scale sequencing of expressed sequence tags (ESTs). Results A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced to an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23 and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes which did not show similarity with any genes in the public database may encode for unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
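
    The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names and the "redundancy" measure are assumptions introduced here, not quantities defined by the authors.

```python
# Counts quoted in the abstract above.
total_ests = 12_084
contigs = 2_258
singletons = 4_751
ests_in_contigs = 7_333

# Annotation classes reported for the unigenes.
known_hits = 3_982
hypothetical_hits = 2_836
no_hits = 191

# Each contig and each singleton counts as one unigene.
unigenes = contigs + singletons

# Every EST is either assembled into a contig or left as a singleton.
accounted_ests = ests_in_contigs + singletons

# The three annotation classes should partition the unigene set.
annotated_total = known_hits + hypothetical_hits + no_hits

# Hypothetical summary measure: share of ESTs that did not add a new unigene.
redundancy = 1 - unigenes / total_ests

print(unigenes, accounted_ests, annotated_total)
print(f"redundancy: {redundancy:.1%}")
```

    Under these assumptions the reported figures are mutually consistent: contigs plus singletons give the 7009 unigenes, and the ESTs in contigs plus the singletons give back the 12,084 sequenced ESTs.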

  9. Mining candidate genes associated with powdery mildew resistance in cucumber via super-BSA by specific length amplified fragment (SLAF) sequencing.

    Science.gov (United States)

    Zhang, Peng; Zhu, Yuqiang; Wang, Lili; Chen, Liping; Zhou, Shengjun

    2015-12-14

    Powdery mildew (PM) is the most common fungal disease of cucumber and other cucurbit crops. Breeding PM-resistant materials is an effective way to defend against this disease, and recent developments in modern genetics and genomics have made clear that studying resistance genes is essential for breeding highly PM-resistant plants. With the ever increasing throughput of next-generation sequencing (NGS), specific length amplified fragment sequencing (SLAF-seq), a high-resolution strategy for large-scale de novo SNP discovery, is gradually being applied to functional gene mining. Here we combined bulked segregant analysis (BSA) with SLAF-seq to identify candidate genes associated with PM resistance in cucumber. A segregating population comprising 251 F2 individuals was developed using H136 (female parent) as the susceptible parent and BK2 (male parent) as the resistance donor. After the PMR test, total genomic DNA was prepared from each plant. Systematic genomic analysis of the GC content, repeat sequence, etc. was carried out with the prediction software SLAF_Predict to establish conditions that ensure the uniformity and density of the molecular markers. After samples were gel purified, SLAFs were generated at Biomarker Technologies Corporation in Beijing. Based on SLAF tags and the PMR test result, the hot regions were annotated. A total of 73,100 high-quality SLAF tags with an average depth of 99.11× were sequenced. Among these, 5,355 polymorphic tags were identified with a polymorphism rate of 7.34 %, including 7.09 % SNPs and other polymorphism types. Finally, 140 associated SLAFs were identified, and two main hot regions were detected on chromosomes 1 and 6, which contained five genes involved in defense response, toxin metabolism, cell stress response, and injury response in cucumber. Associated markers identified by super-BSA in this study could not only speed up the study of the PMR genes, but also provide a feasible solution for breeding the

  10. Effect of temperature and cycle length on microbial competition in PHB-producing sequencing batch reactor.

    Science.gov (United States)

    Jiang, Yang; Marang, Leonie; Kleerebezem, Robbert; Muyzer, Gerard; van Loosdrecht, Mark C M

    2011-05-01

    The impact of temperature and cycle length on microbial competition between polyhydroxybutyrate (PHB)-producing populations enriched in feast-famine sequencing batch reactors (SBRs) was investigated at temperatures of 20 °C and 30 °C, and in a cycle length range of 1-18 h. In this study, the microbial community structure of the PHB-producing enrichments was found to be strongly dependent on temperature, but not on cycle length. Zoogloea and Plasticicumulans acidivorans dominated the SBRs operated at 20 °C and 30 °C, respectively. Both enrichments accumulated PHB more than 75% of cell dry weight. Short-term temperature change experiments revealed that P. acidivorans was more temperature sensitive as compared with Zoogloea. This is particularly true for the PHB degradation, resulting in incomplete PHB degradation in P. acidivorans at 20 °C. Incomplete PHB degradation limited biomass growth and allowed Zoogloea to outcompete P. acidivorans. The PHB content at the end of the feast phase correlated well with the cycle length at a constant solid retention time (SRT). These results suggest that to establish enrichment with the capacity to store a high fraction of PHB, the number of cycles per SRT should be minimized independent of the temperature.

  11. Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.

    Science.gov (United States)

    Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng

    2009-10-01

    The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty eight C-stretch haplotypes were identified, and some novel and population specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of individuals studied. There was a significant correlation (r=-0.961, Ppopulations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that C-stretch could be used as a reliable genetic marker in population genetics.

  12. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    Full Text Available BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-fold coverage of the chloroplast from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  13. Do illness rating systems predict discharge location, length of stay, and cost after total hip arthroplasty?

    Directory of Open Access Journals (Sweden)

    Sarah E. Rudasill, BA

    2018-06-01

    Conclusions: These findings suggest that although ASA classifications predict discharge location and SOI scores predict length of stay and total costs, other factors beyond illness rating systems remain stronger predictors of discharge for THA patients.

  14. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 × 10^6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3 kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria.
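
    The lengths reported for clone pUROS-2 fit together arithmetically. The following sketch checks this under one labeled assumption that the abstract does not state explicitly: that the 798 bp open reading frame includes the stop codon. All variable names are illustrative.

```python
# Lengths quoted in the abstract above (base pairs).
utr5_bp = 196
orf_bp = 798
utr3_bp = 284
reported_mass_da = 28_607

# An open reading frame is read in triplets.
codons = orf_bp // 3

# Assumption: the 798 bp count includes the stop codon,
# which does not encode an amino acid.
protein_aa = codons - 1

# Untranslated regions plus the ORF give the sequenced cDNA length,
# which should be close to the reported 1.3 kilobase insert.
insert_bp = utr5_bp + orf_bp + utr3_bp

# Mean residue mass implied by the predicted molecular mass.
mean_residue_mass = reported_mass_da / protein_aa

print(codons, protein_aa, insert_bp)
print(f"mean residue mass: {mean_residue_mass:.1f} Da")
```

    With that assumption, 266 codons correspond to the 265 amino acids reported, and the three segments sum to 1278 bp, in line with the insert size of about 1.3 kilobases.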

  15. Cloning and sequencing of full-length cDNAs of RNA1 and RNA2 of a Tomato black ring virus isolate from Poland.

    Science.gov (United States)

    Jończyk, M; Le Gall, O; Pałucha, A; Borodynko, N; Pospieszny, H

    2004-04-01

    Full-length cDNA clones corresponding to the RNA1 and RNA2 of the Polish isolate MJ of Tomato black ring virus (TBRV, genus Nepovirus) were obtained using a direct recombination strategy in yeast, and their complete nucleotide sequences were established. RNA1 is 7358 nucleotides and RNA2 is 4633 nucleotides in length, excluding the poly(A) tails. Both RNAs contain a single open reading frame encoding polyproteins of 254 kDa and 149 kDa for RNA1 and RNA2 respectively. Putative cleavage sites were identified, and the relationships between TBRV and related nepoviruses were studied by sequence comparison.

  16. Measurement of Telomere Length in Colorectal Cancers for Improved Molecular Diagnosis

    Directory of Open Access Journals (Sweden)

    Eric Le Balc’h

    2017-08-01

    Full Text Available A common feature of all tumors is the reactivation of a telomere maintenance mechanism that allows unlimited proliferation. On the other hand, genetic instability found in some tumors can result from the loss of telomeres. Here, we measured telomere length in colorectal cancers (CRCs) using TRF (Telomere Restriction Fragment) analysis. Telomeric DNA content was also quantified as the ratio of total telomeric (TTAGGG) sequences over that of the invariable Alu sequences. In most of the 125 CRCs analyzed, there was a significant diminution in telomere length compared with that in control healthy tissue. Only 34 tumors exhibited no telomere erosion and, in some cases, a slight telomere lengthening. Telomere length did not correlate with age, gender, tumor stage, tumor localization or stage of tumor differentiation. In addition, while telomere length did not correlate with the presence of a mutation in BRAF (V-raf murine sarcoma viral oncogene homolog B), PIK3CA (phosphatidylinositol 3-kinase catalytic subunit), or MSI status, it was significantly associated with the occurrence of a mutation in KRAS. Interestingly, we found that the shorter the telomeres in healthy tissue of a patient, the larger the increase in telomere length in the tumor. Our study points to the existence of two types of CRCs based on telomere length and reveals that telomere length in healthy tissue might influence telomere maintenance mechanisms in the tumor.

  17. Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

    Science.gov (United States)

    Casals, Ferran; Anglada, Roger; Bonet, Núria; Rasal, Raquel; van der Gaag, Kristiaan J; Hoogenboom, Jerry; Solé-Morata, Neus; Comas, David; Calafell, Francesc

    2017-09-01

    We have genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs in Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based on massively parallel sequencing, we have used simple R scripts to uncover the sequence variation in the repeat region. Thus, we have found, across 58 STRs, 541 length-based alleles, which, after considering repeat-sequence variation, became 804 different alleles. All loci in both populations were in Hardy-Weinberg equilibrium. F_ST between both populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics were quite large; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect become 1-(2.3×10^-70) and 1-(5.9×10^-73) for Roma and Catalans, respectively, and the chances of excluding a false father in a trio are 1-(2.6×10^-20) and 1-(2.0×10^-21). Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  19. The length-weight and length-length relationships of bluefish, Pomatomus saltatrix (Linnaeus, 1766) from Samsun, middle Black Sea region

    Directory of Open Access Journals (Sweden)

    Melek Özpiçak

    2017-10-01

    Full Text Available In this study, length-weight relationship (LWR) and length-length relationship (LLR) of bluefish, Pomatomus saltatrix were determined. A total of 125 specimens were sampled from Samsun, the middle Black Sea in 2014 fishing season. Bluefish specimens were monthly collected from commercial fishing boats from October to December 2014. All captured individuals (N=125) were measured to the nearest 0.1 cm for total, fork and standard lengths. The weight of each fish (W) was recorded to the nearest 0.01 g. According to results of analyses, there were no statistically significant differences between sexes in terms of length and weight (P>0.05). The minimum and maximum total, fork and standard lengths of bluefish ranged between 13.5-23.6 cm, 12.50-21.80 cm and 10.60-20.10 cm, respectively. The equation of the length-weight relationship was calculated as W=0.008TL^3.12 (r^2>0.962). Positive allometric growth was observed for bluefish (b>3). Length-length relationship was also highly significant (P<0.001) with coefficient of determination (r^2) ranging from 0.916 to 0.988.
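
    A length-weight relationship of the form W = a·TL^b is commonly estimated by ordinary least squares on log-transformed data. The sketch below illustrates that general technique with the coefficients quoted in the abstract; the function names and the sample lengths are hypothetical, the weights are synthetic (generated from the equation, not measured), and nothing here should be read as the authors' actual analysis.

```python
import math

# Coefficients quoted in the abstract: W = 0.008 * TL**3.12 (W in g, TL in cm).
A_REPORTED = 0.008
B_REPORTED = 3.12


def predicted_weight(total_length_cm, a=A_REPORTED, b=B_REPORTED):
    """Weight in grams predicted by the power-law length-weight relationship."""
    return a * total_length_cm ** b


def fit_lwr(lengths_cm, weights_g):
    """Fit log10(W) = log10(a) + b*log10(TL) by ordinary least squares."""
    xs = [math.log10(length) for length in lengths_cm]
    ys = [math.log10(weight) for weight in weights_g]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = 10 ** (mean_y - b * mean_x)
    return a, b


# Hypothetical total lengths inside the reported 13.5-23.6 cm range.
lengths = [13.5, 16.0, 18.5, 21.0, 23.6]
# Synthetic, noise-free weights generated from the reported equation.
weights = [predicted_weight(length) for length in lengths]

a_hat, b_hat = fit_lwr(lengths, weights)
growth = "positive allometric" if b_hat > 3 else "isometric or negative allometric"
print(f"a = {a_hat:.4f}, b = {b_hat:.2f}, growth: {growth}")
```

    Because the synthetic weights carry no noise, the fit simply recovers the reported coefficients; with real measurements the same procedure would also yield the scatter summarized by r^2.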

  20. The length-weight and length-length relationships of bluefish, Pomatomus saltatrix (Linnaeus, 1766) from Samsun, middle Black Sea region

    OpenAIRE

    Özpiçak, Melek; Saygın, Semra; Polat, Nazmi

    2017-01-01

    In this study, length-weight relationship (LWR) and length-length relationship (LLR) of bluefish, Pomatomus saltatrix were determined. A total of 125 specimens were sampled from Samsun, the middle Black Sea in 2014 fishing season. Bluefish specimens were monthly collected from commercial fishing boats from October to December 2014. All captured individuals (N=125) were measured to the nearest 0.1 cm for total, fork and standard lengths. The weight of each fish (W) was recorded to the nearest 0.01 ...

  1. Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns

    Directory of Open Access Journals (Sweden)

    Hayashizaki Yoshihide

    2009-06-01

    Full Text Available Abstract Background Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable carrying out comparative genomics in cereals. Results As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues are under adaptive selection. Conclusion We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the

  2. Universal sequence map (USM) of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Full Text Available Abstract Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
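
    The Chaos Game Representation that this abstract names as the basis of USM can be sketched in a few lines. This is the classical four-corner CGR for DNA, not the authors' USM procedure; the corner assignment below is one common convention and, like the function name and the example sequences, is an assumption made for illustration.

```python
# One common corner assignment for the unit square (an assumed convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq):
    """Return the CGR trajectory of a DNA sequence.

    Starting from the centre of the unit square, each symbol moves the
    current point halfway towards the corner assigned to that symbol.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2, (y + corner_y) / 2
        points.append((x, y))
    return points


# Two hypothetical sequences that share their last five symbols ("TTGCA").
end_a = cgr_coordinates("ACGTTGCA")[-1]
end_b = cgr_coordinates("GGGTTGCA")[-1]

# Each shared symbol halves the gap between the two trajectories, so after
# k shared symbols the end points differ by less than 2**-k per coordinate.
gap = max(abs(end_a[0] - end_b[0]), abs(end_a[1] - end_b[1]))
print(gap, gap < 2 ** -5)
```

    The halving step is what makes map distance informative about sequence similarity: the closer two end points are, the longer the recent context the two sequences may share. The converse bound is not exact, since two points can be close by chance without a long shared suffix.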

  3. Reducing Length of Stay, Direct Cost, and Readmissions in Total Joint Arthroplasty Patients With an Outcomes Manager-Led Interprofessional Team.

    Science.gov (United States)

    Arana, Melissa; Harper, Licia; Qin, Huanying; Mabrey, Jay

    The purpose of this quality improvement project was to determine whether an outcomes manager-led interprofessional team could reduce length of stay and direct cost without increasing 30-day readmission rates in the total joint arthroplasty patient population. The goal was to promote interprofessional relationships combined with collaborative practice to promote coordinated care with improved outcomes. Results from this project showed that length of stay (total hip arthroplasty [THA] reduced by 0.4 days and total knee arthroplasty [TKA] reduced by 0.6 days) and direct cost (THA reduced by $1,020 per case and TKA reduced by $539 per case) were significantly decreased whereas 30-day readmission rates of both populations were not significantly increased.

  4. Reducing the length of hospital stay after total knee arthroplasty: influence of femoral and sciatic nerve block.

    Science.gov (United States)

    Carvalho Júnior, Lúcio Honório de; Temponi, Eduardo Frois; Paganini, Vinícius Oliveira; Costa, Lincoln Paiva; Soares, Luiz Fernando Machado; Gonçalves, Matheus Braga Jacques

    2015-01-01

    The aim of this study is to evaluate the change in postoperative length of hospital stay for total knee arthroplasty after using femoral and sciatic nerve block. The medical records of 287 patients were evaluated, taking into account the number of hours of admission, the percentage and the reason for re-hospitalization within 30 days, as well as associated complications. All patients were divided into two groups according to whether or not they were admitted to the ICU. During the years 2009 and 2010, isolated spinal anesthesia was the method used in the procedure. From 2011 on, femoral and sciatic nerve blocking was introduced. Between the years 2009 and 2012, the average length of stay ranged from 74 hours in 2009 to 75.2 hours in 2010. The average length of stay in 2011 was 56.52 hours and 53.72 hours in 2012, all in the group of patients who did not remain in the ICU postoperatively. In the same period, among those in the group that needed ICU admission, the average length of stay was 138.7 hours in 2009, 90.25 hours in 2010, 79.8 hours in 2011, and 52.91 hours in 2012. During 2009 and 2010, the rate of re-hospitalization was 0%, while in 2011 and 2012 it was 3.44% and 1%, respectively. According to this study, the use of femoral and sciatic nerve blocking after total knee arthroplasty allowed a significant reduction in hospital stay.

  5. Visual artificial grammar learning by rhesus macaques (Macaca mulatta): exploring the role of grammar complexity and sequence length.

    Science.gov (United States)

    Heimbauer, Lisa A; Conway, Christopher M; Christiansen, Morten H; Beran, Michael J; Owren, Michael J

    2018-03-01

    Humans and nonhuman primates can learn about the organization of stimuli in the environment using implicit sequential pattern learning capabilities. However, most previous artificial grammar learning studies with nonhuman primates have involved relatively simple grammars and short input sequences. The goal in the current experiments was to assess the learning capabilities of monkeys on an artificial grammar-learning task that was more complex than most others previously used with nonhumans. Three experiments were conducted using a joystick-based, symmetrical-response serial reaction time task in which two monkeys were exposed to grammar-generated sequences at sequence lengths of four in Experiment 1, six in Experiment 2, and eight in Experiment 3. Over time, the monkeys came to respond faster to the sequences generated from the artificial grammar compared to random versions. In a subsequent generalization phase, subjects generalized their knowledge to novel sequences, responding significantly faster to novel instances of sequences produced using the familiar grammar compared to those constructed using an unfamiliar grammar. These results reveal that rhesus monkeys can learn and generalize the statistical structure inherent in an artificial grammar that is as complex as some used with humans, for sequences up to eight items long. These findings are discussed in relation to whether or not rhesus macaques and other primate species possess implicit sequence learning abilities that are similar to those that humans draw upon to learn natural language grammar.

  6. Lactobacillus strain diversity based on partial hsp60 gene sequences and design of PCR-restriction fragment length polymorphism assays for species identification and differentiation.

    Science.gov (United States)

    Blaiotta, Giuseppe; Fusco, Vincenzina; Ercolini, Danilo; Aponte, Maria; Pepe, Olimpia; Villani, Francesco

    2008-01-01

    A phylogenetic tree showing diversities among 116 partial (499-bp) Lactobacillus hsp60 (groEL, encoding a 60-kDa heat shock protein) nucleotide sequences was obtained and compared to those previously described for 16S rRNA and tuf gene sequences. The topology of the tree produced in this study showed a Lactobacillus species distribution similar, but not identical, to those previously reported. However, according to the most recent systematic studies, a clear differentiation of 43 single-species clusters was detected/identified among the sequences analyzed. The slightly higher variability of the hsp60 nucleotide sequences than of the 16S rRNA sequences offers better opportunities to design or develop molecular assays allowing identification and differentiation of either distant or very closely related Lactobacillus species. Therefore, our results suggest that hsp60 can be considered an excellent molecular marker for inferring the taxonomy and phylogeny of members of the genus Lactobacillus and that the chosen primers can be used in a simple PCR procedure allowing the direct sequencing of the hsp60 fragments. Moreover, in this study we performed a computer-aided restriction endonuclease analysis of all 499-bp hsp60 partial sequences and we showed that the PCR-restriction fragment length polymorphism (RFLP) patterns obtainable by using both endonucleases AluI and TacI (in separate reactions) can allow identification and differentiation of all 43 Lactobacillus species considered, with the exception of the pair L. plantarum/L. pentosus. However, the latter species can be differentiated by further analysis with Sau3AI or MseI. The hsp60 PCR-RFLP approach was efficiently applied to identify and to differentiate a total of 110 wild Lactobacillus strains (including closely related species, such as L. casei and L. rhamnosus or L. plantarum and L. pentosus) isolated from cheese and dry-fermented sausages.
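
    A computer-aided restriction analysis of the kind described in this abstract amounts to locating recognition sites in a sequence and listing the resulting fragment lengths. The sketch below does this for AluI, which recognizes AGCT and cuts between G and C. The function name, the linear-fragment assumption, and both toy sequences are hypothetical; they are not hsp60 data and this is not the software the authors used.

```python
def digest(seq, site="AGCT", cut_offset=2):
    """Return fragment lengths from cutting a linear sequence at every site.

    `site` is the recognition sequence and `cut_offset` is the position of
    the cut inside it (AluI: AG^CT, so the cut falls 2 bases into the site).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries are the sequence ends plus every cut position.
    bounds = [0] + cuts + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Toy amplicons (hypothetical): the second has lost one AluI site
# through a single C-to-T substitution.
amplicon_a = "TTTT" + "AGCT" + "CCCCCC" + "AGCT" + "GG"
amplicon_b = "TTTT" + "AGCT" + "CCCCCC" + "AGTT" + "GG"

pattern_a = digest(amplicon_a)
pattern_b = digest(amplicon_b)
print(pattern_a, pattern_b)
print("distinguishable:", sorted(pattern_a) != sorted(pattern_b))
```

    Two sequences are distinguishable on a gel when their sorted fragment-length patterns differ by more than the gel can resolve; a pair that remains indistinguishable with one enzyme would need a further digest with another enzyme.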

  7. Decreased length of stay and earlier oral feeding associated with standardized postoperative clinical care for total gastrectomies at a cancer center.

    Science.gov (United States)

    Selby, Luke V; Rifkin, Marissa B; Yoon, Sam S; Ariyan, Charlotte E; Strong, Vivian E

    2016-09-01

    Standardization of postoperative care has been shown to decrease postoperative length of stay. In June 2009, we standardized postoperative care for all gastrectomies at our institution. Four years' worth of total gastrectomies (2 years prior to standardization and 2 years after standardization) were reviewed to determine the effect of standardization on postoperative care, length of stay, complications, and readmissions. Between June 2007 and July 2011, 99 patients underwent curative intent open total gastrectomy: 51 patients prior to standardization, and 48 patients poststandardization. Patients were predominantly male (70%); median age was 63; and median body mass index was 26. Standardization of postoperative care was associated with a decrease in median time to beginning both clear liquids and a postgastrectomy diet, earlier removal of epidural catheters, earlier use of oral pain medication, less time receiving intravenous fluids, and decreased length of stay (all P Care Center, or readmission. Institution of standardized postoperative orders for total gastrectomy was associated with a significantly decreased length of stay and earlier oral feeding without increasing postoperative complications, early postoperative outpatient visits, or readmissions. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases

    Directory of Open Access Journals (Sweden)

    Braun Werner

    2002-11-01

    Full Text Available Abstract Background Total sequence decomposition, using the web-based MASIA tool, identifies areas of conservation in aligned protein sequences. By structurally annotating these motifs, the sequence can be parsed into individual building blocks, molecular legos ("molegos"), that can eventually be related to function. Here, the approach is applied to the apurinic/apyrimidinic endonuclease (APE) DNA repair proteins, essential enzymes that have been highly conserved throughout evolution. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) form a superfamily that catalyze metal ion based phosphorolysis, but recognize different substrates. Results MASIA decomposition of APE yielded 12 sequence motifs, 10 of which are also structurally conserved within the family and are designated as molegos. The 12 motifs include all the residues known to be essential for DNA cleavage by APE. Five of these molegos are sequentially and structurally conserved in DNase-1 and the IPP family. Correcting the sequence alignment to match the residues at the ends of two of the molegos that are absolutely conserved in each of the three families greatly improved the local structural alignment of APEs, DNase-1 and synaptojanin. Comparing substrate/product binding of molegos common to DNase-1 showed that those distinctive for APEs are not directly involved in cleavage, but establish protein-DNA interactions 3' to the abasic site. These additional bonds enhance both specific binding to damaged DNA and the processivity of APE1. Conclusion A modular approach can improve structurally predictive alignments of homologous proteins with low sequence identity and reveal residues peripheral to the traditional "active site" that control the specificity of enzymatic activity.

  9. Reducing the length of hospital stay after total knee arthroplasty: influence of femoral and sciatic nerve block

    Directory of Open Access Journals (Sweden)

    Lúcio Honório de Carvalho Júnior

    2015-02-01

    Full Text Available Objective: the aim of this study is to evaluate the change in length of hospital stay postoperatively for Total Knee Arthroplasty after using femoral and sciatic nerve block. Materials and methods: the medical records of 287 patients were evaluated, taking into account the number of hours of admission, the percentage and the reason for re-hospitalization within 30 days, as well as associated complications. All patients were divided into two groups according to whether or not they were admitted to the ICU. During the years 2009 and 2010, isolated spinal anesthesia was the method used in the procedure. From 2011 on, femoral and sciatic nerve blocking was introduced. Results: between the years 2009 and 2012, the average length of stay ranged from 74 hours in 2009 to 75.2 hours in 2010. The average length of stay in 2011 was 56.52 hours and 53.72 hours in 2012, all in the group of patients who did not remain in the ICU postoperatively. In the same period, among those in the group that needed ICU admission, the average length of stay was 138.7 hours in 2009, 90.25 hours in 2010, 79.8 hours in 2011, and 52.91 hours in 2012. During 2009 and 2010, the rate of re-hospitalization was 0%, while in 2011 and 2012 it was 3.44% and 1%, respectively. Conclusion: according to this study, the use of femoral and sciatic nerve blocking after total knee arthroplasty allowed significant reduction in hospital stay.

  10. Association of nursing-documented ambulation with length of stay following total laparoscopic hysterectomy for benign gynecologic disease

    OpenAIRE

    Kim, Kidong; Yoo, Sooyoung; Yang, Eun Joo; No, Jae Hong; Hwang, Hee; Kim, Yong-Beom

    2013-01-01

    Objective The objective was to examine the association of postoperative physical activity with length of stay in patients who received total laparoscopic hysterectomy for benign gynecologic disease. Methods The case group was composed of 70 patients who entered a critical pathway for elective total laparoscopic hysterectomy from 2009 to 2012 and were discharged behind schedule. The control group was selected from patients who were discharged on schedule, and matched to cases using 1:3 ratio p...

  11. The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina sequencing.

    Science.gov (United States)

    Raveendar, Sebastin; Na, Young-Wang; Lee, Jung-Ro; Shim, Donghwan; Ma, Kyung-Ho; Lee, Sok-Young; Chung, Jong-Wook

    2015-07-20

    Chloroplast (cp) genome sequences provide a valuable source for DNA barcoding. Molecular phylogenetic studies have concentrated on DNA sequencing of conserved gene loci. However, this approach is time consuming and more difficult to implement when gene organization differs among species. Here we report the complete re-sequencing of the cp genome of Capsicum pepper (Capsicum annuum var. glabriusculum) using the Illumina platform. The total length of the cp genome is 156,817 bp with a 37.7% overall GC content. A pair of inverted repeats (IRs) of 50,284 bp were separated by a small single copy (SSC; 18,948 bp) and a large single copy (LSC; 87,446 bp). The number of cp genes in C. annuum var. glabriusculum is the same as that in other Capsicum species. Variations in the lengths of the LSC, SSC and IR regions were the main contributors to the size variation in the cp genome of this species. A total of 125 simple sequence repeats (SSRs) and 48 insertion or deletion variants were found by sequence alignment of Capsicum cp genomes. These findings provide a foundation for further investigation of cp genome evolution in Capsicum and other higher plants.

  12. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
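
    The scaffold N50 and the "95 % of the estimated genome length" figure quoted in this abstract are simple summary statistics. The sketch below shows how such values are commonly computed; the scaffold lengths are hypothetical toy numbers, not the olive assembly, and the function name `n50` is chosen here only for illustration.

```python
def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Hypothetical scaffold lengths (arbitrary units), total = 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print("N50:", n50(toy_scaffolds))

# Fraction of the estimated genome captured, using the figures in the abstract.
assembled_gb, estimated_gb = 1.31, 1.38
print(f"assembled fraction: {100 * assembled_gb / estimated_gb:.1f} %")
```

    With the abstract's own numbers, 1.31 Gb out of an estimated 1.38 Gb rounds to the reported 95 %.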

  13. US Medical Student Performance on the NBME Subject Examination in Internal Medicine: Do Clerkship Sequence and Clerkship Length Matter?

    Science.gov (United States)

    Ouyang, Wenli; Cuddy, Monica M; Swanson, David B

    2015-09-01

    Prior to graduation, US medical students are required to complete clinical clerkship rotations, most commonly in the specialty areas of family medicine, internal medicine, obstetrics and gynecology (ob/gyn), pediatrics, psychiatry, and surgery. Within a school, the sequence in which students complete these clerkships varies. In addition, the length of these rotations varies, both within a school for different clerkships and between schools for the same clerkship. The present study investigated the effects of clerkship sequence and length on performance on the National Board of Medical Examiner's subject examination in internal medicine. The study sample included 16,091 students from 67 US Liaison Committee on Medical Education (LCME)-accredited medical schools who graduated in 2012 or 2013. Student-level measures included first-attempt internal medicine subject examination scores, first-attempt USMLE Step 1 scores, and five dichotomous variables capturing whether or not students completed rotations in family medicine, ob/gyn, pediatrics, psychiatry, and surgery prior to taking the internal medicine rotation. School-level measures included clerkship length and average Step 1 score. Multilevel models with students nested in schools were estimated with internal medicine subject examination scores as the dependent measure. Step 1 scores and the five dichotomous variables were treated as student-level predictors. Internal medicine clerkship length and average Step 1 score were used to predict school-to-school variation in average internal medicine subject examination scores. Completion of rotations in surgery, pediatrics and family medicine prior to taking the internal medicine examination significantly improved scores, with the largest benefit observed for surgery (coefficient = 1.58 points; p value internal medicine subject examination performance. At the school level, longer internal medicine clerkships were associated with higher scores on the internal medicine

  14. Increased mRNA expression of a laminin-binding protein in human colon carcinoma: Complete sequence of a full-length cDNA encoding the protein

    International Nuclear Information System (INIS)

    Yow, Hsiukang; Wong, Jau Min; Chen, Hai Shiene; Lee, C.; Steele, G.D. Jr.; Chen, Lanbo

    1988-01-01

    Reliable markers to distinguish human colon carcinoma from normal colonic epithelium are needed particularly for poorly differentiated tumors where no useful marker is currently available. To search for markers the authors constructed cDNA libraries from human colon carcinoma cell lines and screened for clones that hybridize to a greater degree with mRNAs of colon carcinomas than with their normal counterparts. Here they report one such cDNA clone that hybridizes with a 1.2-kilobase (kb) mRNA, the level of which is ∼9-fold greater in colon carcinoma than in adjacent normal colonic epithelium. Blot hybridization of total RNA from a variety of human colon carcinoma cell lines shows that the level of this 1.2-kb mRNA in poorly differentiated colon carcinomas is as high as or higher than that in well-differentiated carcinomas. Molecular cloning and complete sequencing of cDNA corresponding to the full-length open reading frame of this 1.2-kb mRNA unexpectedly show it to contain all the partial cDNA sequence encoding 135 amino acid residues previously reported for a human laminin receptor. The deduced amino acid sequence suggests that this putative laminin-binding protein from human colon carcinomas consists of 295 amino acid residues with interesting features. There is an unusual C-terminal 70-amino acid segment, which is trypsin-resistant and highly negatively charged

  15. Memory for tonal pitches: a music-length effect hypothesis.

    Science.gov (United States)

    Akiva-Kabiri, Lilach; Vecchi, Tomaso; Granot, Roni; Basso, Demis; Schön, Daniele

    2009-07-01

    One of the most studied effects of verbal working memory (WM) is the influence of the length of the words that compose the list to be remembered. This work aims to investigate the nature of musical WM by replicating the word length effect in the musical domain. Length and rate of presentation were manipulated in a recognition task of tone sequences. Results showed significant effects for both factors (length and presentation rate) as well as their interaction, suggesting the existence of different strategies (e.g., chunking and rehearsal) for the immediate memory of musical information, depending upon the length of the sequences.

  16. Study of optical non-linear properties of a constant total effective length multiple quantum wells system

    International Nuclear Information System (INIS)

    Solaimani, M.; Morteza, Izadifard; Arabshahi, H.; Reza, Sarkardehi Mohammad

    2013-01-01

    In this work, we have studied the effect of the number of wells, in a multiple quantum wells structure with constant total effective length, on optical properties such as the absorption coefficient and the refractive index by means of the compact density matrix approach. A GaAs/AlxGa(1−x)As multiple quantum wells system was selected as an example. In addition, the effect of varying the number of wells on the subband energies, wave functions, number of bound states, and the Fermi energy has also been investigated. Our calculation revealed that the number of wells in a multiple quantum well is a parameter with which we can control the amount of nonlinearity. This study showed that for the third-order refractive index change there are two regimes of variation, and the critical well number was six. In our calculations, we have used the same well and barrier thicknesses to construct the multiple quantum wells system. - Highlights: ► Optical Non-Linear. ► Total Effective Length. ► Multiple Quantum Wells System - genetic algorithm ► Schrödinger equation solution. ► Nanostructure.

  17. Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias

    DEFF Research Database (Denmark)

    Karst, Søren Michael; Dueholm, Morten Simonsen; McIlroy, Simon Jon

    2018-01-01

    Small subunit ribosomal RNA (SSU rRNA) genes, 16S in bacteria and 18S in eukaryotes, have been the standard phylogenetic markers used to characterize microbial diversity and evolution for decades. However, the reference databases of full-length SSU rRNA gene sequences are skewed to well-studied e...

  18. Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes

    Science.gov (United States)

    Bruskiewich, Richard; Burris, Jason N.; Carrigan, Charlotte T.; Chase, Mark W.; Clarke, Neil D.; Covshoff, Sarah; dePamphilis, Claude W.; Edger, Patrick P.; Goh, Falicia; Graham, Sean; Greiner, Stephan; Hibberd, Julian M.; Jordon-Thaden, Ingrid; Kutchan, Toni M.; Leebens-Mack, James; Melkonian, Michael; Miles, Nicholas; Myburg, Henrietta; Patterson, Jordan; Pires, J. Chris; Ralph, Paula; Rolf, Megan; Sage, Rowan F.; Soltis, Douglas; Soltis, Pamela; Stevenson, Dennis; Stewart, C. Neal; Surek, Barbara; Thomsen, Christina J. M.; Villarreal, Juan Carlos; Wu, Xiaolei; Zhang, Yong; Deyholos, Michael K.; Wong, Gane Ka-Shu

    2012-01-01

    Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers. PMID:23185583

  19. Telomerase activity and telomere length in the colorectal polyp-carcinoma sequence

    OpenAIRE

    C. Valls Bautista; C. Piñol Felis; J. M. Reñe Espinet; J. Buenestado García; J. Viñas Salas

    2009-01-01

    Objective: the role of telomerase activity and telomere length in the adenoma-carcinoma sequence of colon carcinogenesis has not been well established. The objective of this study was to determine telomerase activity and telomere length patterns in patients with adenomatous polyps either associated or not with colorectal cancer, as well as the role of telomeric instability in the adenoma-carcinoma sequence. Patients and methods: we included in the study 14 patients who underwent surgery for c...

  20. Identification of the full-length β-actin sequence and expression profiles in the tree shrew (Tupaia belangeri).

    Science.gov (United States)

    Zheng, Yu; Yun, Chenxia; Wang, Qihui; Smith, Wanli W; Leng, Jing

    2015-02-01

    The tree shrew (Tupaia belangeri) diverges from the primate order (Primates) and is classified as a separate taxonomic group of mammals - Scandentia. It has been suggested that the tree shrew can be used as an animal model for studying human diseases; however, the genomic sequence of the tree shrew is largely unidentified. In the present study, we reported the full-length cDNA sequence of the housekeeping gene, β-actin, in the tree shrew. The amino acid sequence of β-actin in the tree shrew was compared to that of humans and other species; a simple phylogenetic relationship was discovered. Quantitative polymerase chain reaction (qPCR) and western blot analysis further demonstrated that the expression profiles of β-actin, as a general conservative housekeeping gene, in the tree shrew were similar to those in humans, although the expression levels varied among different types of tissue in the tree shrew. Our data provide evidence that the tree shrew has a close phylogenetic association with humans. These findings further enhance the potential that the tree shrew, as a species, may be used as an animal model for studying human disorders.

  1. Limb shortening osteotomy in a patient with achondroplasia and leg length difference after total hip arthroplasty

    Directory of Open Access Journals (Sweden)

    Christian L. Galata

    2013-07-01

    Full Text Available Introduction: Achondroplasia is the most common reason for disproportionate short stature. Normally, orthopedic limb lengthening procedures must be discussed in the course of this genetic disorder and have been successful in numerous achondroplastic patients in the past. In some cases, the disease may lead to leg length differences with need for surgical correction. Case Report: We report a case of achondroplastic dysplastic coxarthrosis with symptomatic leg length difference after bilateral total hip arthroplasty in a 52-year-old female patient, in which a distal femoral shortening osteotomy was successfully performed. Conclusion: Femoral shortening osteotomy is very uncommon in patients with achondroplasia. We conclude, however, that in rare cases it can be indicated and provide the advantage of shorter operation time, less perioperative complications and faster recovery compared to leg lengthening procedures. Keywords: Achondroplasia, dysplastic coxarthrosis, limb shortening, distal femur osteotomy.

  2. TrnH-psbA sequence analyses of Asparagus cochinchinensis from different geographical origin in China

    Directory of Open Access Journals (Sweden)

    Ma Yingzi

    2017-01-01

    Full Text Available The trnH-psbA sequences of 13 Asparagus cochinchinensis populations from 6 provinces of China were studied. The results showed that the variation in trnH-psbA length and GC content was small. The trnH-psbA sequences ranged in length from 619 bp to 632 bp, and the GC content was about 36%. The total variation rates of the 13 populations ranged from 2.21% to 3.47% when the missing sites were counted as variation sites. A. cochinchinensis from different sources had 10 informative sites in the trnH-psbA sequence, accounting for 1.58% of the total sequence. These sites were located at positions 8, 9, 120, 457, 458, 486, 487, 491, 492, and 593. Clustering analysis showed that the Qianxi and Hengshan populations clustered together; the Dushan, Yuqing, and Guangzhou populations were grouped; and the Nanning and Xinning populations formed another cluster. trnH-psbA sequences could identify different A. cochinchinensis populations. Clustering of different A. cochinchinensis populations related primarily to latitude and had little relationship with longitude.

  3. Full-length genome sequences of porcine epidemic diarrhoea virus strain CV777; Use of NGS to analyse genomic and sub-genomic RNAs

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice

    2018-01-01

    Porcine epidemic diarrhoea virus, strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine...... the variability between different stocks of the PEDV strain CV777, sequencing of the full-length genome (ca. 28kb) has been performed in 6 different laboratories, using different protocols. Not surprisingly, each of the different full genome sequences were distinct from each other and from the reference sequence...... the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies....

  4. Differential evolution-simulated annealing for multiple sequence alignment

    Science.gov (United States)

    Addawe, R. C.; Addawe, J. M.; Sueño, M. R. K.; Magadia, J. C.

    2017-10-01

    Multiple sequence alignments (MSA) are used in the analysis of molecular evolution and sequence structure relationships. In this paper, a hybrid algorithm, Differential Evolution - Simulated Annealing (DESA) is applied in optimizing multiple sequence alignments (MSAs) based on structural information, non-gaps percentage and totally conserved columns. DESA is a robust algorithm characterized by self-organization, mutation, crossover, and SA-like selection scheme of the strategy parameters. Here, the MSA problem is treated as a multi-objective optimization problem of the hybrid evolutionary algorithm, DESA. Thus, we name the algorithm as DESA-MSA. Simulated sequences and alignments were generated to evaluate the accuracy and efficiency of DESA-MSA using different indel sizes, sequence lengths, deletion rates and insertion rates. The proposed hybrid algorithm obtained acceptable solutions particularly for the MSA problem evaluated based on the three objectives.
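
    Two of the three objectives named in this abstract, the non-gaps percentage and the number of totally conserved columns, are easy to state concretely. The sketch below uses plausible definitions that are assumptions, since the abstract does not give the exact formulas; the structural-information objective is omitted, and the toy alignment and function names are hypothetical.

```python
def non_gap_percentage(alignment):
    """Percentage of alignment cells that hold a residue rather than a gap ('-')."""
    cells = sum(len(row) for row in alignment)
    non_gaps = sum(ch != "-" for row in alignment for ch in row)
    return 100.0 * non_gaps / cells

def totally_conserved_columns(alignment):
    """Number of columns with the same residue in every row and no gaps."""
    return sum(
        1
        for column in zip(*alignment)  # iterate over alignment columns
        if "-" not in column and len(set(column)) == 1
    )

# Hypothetical toy alignment: three sequences, five columns.
toy_alignment = [
    "ACG-T",
    "ACGAT",
    "A-GAT",
]

print(round(non_gap_percentage(toy_alignment), 2))
print(totally_conserved_columns(toy_alignment))
```

    A multi-objective optimizer such as the hybrid described here would evaluate candidate alignments on scores like these and trade them off against each other rather than collapsing them into a single number.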

  5. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides

  6. Length-weight and length-length relationships of common carp (Cyprinus carpio L.) in the middle and southern Iraq provinces

    Science.gov (United States)

    Al-jebory, Taymaa A.; Das, Simon K.; Usup, Gires; Bakar, Y.; Al-saadi, Ali H.

    2018-04-01

    In this study, length-weight and length-length relationships of common carp (Cyprinus carpio L.) in the middle and southern Iraq provinces were determined. Fish specimens were procured from seven provinces from July to December 2015. Both negative and positive allometric growth patterns were observed; total length (TL) ranged from 25.60 cm to 33.53 cm, and body weight (BW) ranged from 700 g to 1423 g. The lowest "b" value (1.03) was recorded in group F and the highest (3.54) in group C. The Fulton condition factor (K) ranged from 2.57 to 4.94, while the relative condition factor (Kn) ranged from 0.95 to 1.01. A linear relationship between total length (TL) and standard length (SL) was obtained for the fish groups across the provinces. The "b" value of this relationship varied from 0.10 to 0.93, with a correlation coefficient (r²) of 0.02 to 0.97. This research could be used as a guide to study the ecology and biology of common carp (Cyprinus carpio L.) in the middle and southern Iraq provinces.
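
    The quantities reported in this abstract follow standard fisheries formulas: the length-weight relationship W = a·L^b (fitted as a straight line on log-transformed data), Fulton's condition factor K = 100·W/L³ (W in g, L in cm), and the relative condition factor Kn = W/(a·L^b). The sketch below illustrates them on synthetic data; the measurements and function names are hypothetical and do not come from the study.

```python
import math

def fit_length_weight(lengths_cm, weights_g):
    """Least-squares fit of log10(W) = log10(a) + b * log10(L); returns (a, b)."""
    xs = [math.log10(length) for length in lengths_cm]
    ys = [math.log10(weight) for weight in weights_g]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope: b = 3 means isometric growth
    a = 10 ** (mean_y - b * mean_x)    # intercept, back-transformed
    return a, b

def fulton_k(length_cm, weight_g):
    """Fulton's condition factor, K = 100 * W / L^3."""
    return 100.0 * weight_g / length_cm ** 3

def relative_condition(length_cm, weight_g, a, b):
    """Relative condition factor, Kn = observed W / W predicted from a * L^b."""
    return weight_g / (a * length_cm ** b)

# Synthetic fish generated from a known relationship (a = 0.02, b = 3.0).
toy_lengths = [25.0, 28.0, 30.0, 33.0]
toy_weights = [0.02 * length ** 3 for length in toy_lengths]

a_hat, b_hat = fit_length_weight(toy_lengths, toy_weights)
print(f"a = {a_hat:.4f}, b = {b_hat:.2f}")
print(f"K for a 30 cm, 900 g fish: {fulton_k(30.0, 900.0):.2f}")
```

    A fitted b below 3 is read as negative allometric growth and a value above 3 as positive allometric growth, which is how the "b" range in the abstract maps onto the two growth patterns it mentions.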

  7. Employment of Near Full-Length Ribosome Gene TA-Cloning and Primer-Blast to Detect Multiple Species in a Natural Complex Microbial Community Using Species-Specific Primers Designed with Their Genome Sequences.

    Science.gov (United States)

    Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou

    2016-11-01

    It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.

  8. Characterization of European Yersinia enterocolitica 1A strains using restriction fragment length polymorphism and multilocus sequence analysis.

    Science.gov (United States)

    Murros, A; Säde, E; Johansson, P; Korkeala, H; Fredriksson-Ahomaa, M; Björkroth, J

    2016-10-01

    Yersinia enterocolitica is currently divided into two subspecies: subsp. enterocolitica including highly pathogenic strains of biotype 1B and subsp. palearctica including nonpathogenic strains of biotype 1A and moderately pathogenic strains of biotypes 2-5. In this work, we characterized 162 Y. enterocolitica strains of biotype 1A and 50 strains of biotypes 2-4 isolated from human, animal and food samples by restriction fragment length polymorphism using the HindIII restriction enzyme. Phylogenetic relatedness of 20 representative Y. enterocolitica strains including 15 biotype 1A strains was further studied by the multilocus sequence analysis of four housekeeping genes (glnA, gyrB, recA and HSP60). In all the analyses, biotype 1A strains formed a separate genomic group, which differed from Y. enterocolitica subsp. enterocolitica and from the strains of biotypes 2-4 of Y. enterocolitica subsp. palearctica. Based on these results, biotype 1A strains considered nonpathogenic should not be included in subspecies palearctica containing pathogenic strains of biotypes 2-5. Yersinia enterocolitica strains are currently divided into six biotypes and two subspecies. Strains of biotype 1A, which are phenotypically and genotypically very heterogeneous, are classified as subspecies palearctica. In this study, European Y. enterocolitica 1A strains isolated from both human and nonhuman sources were characterized using restriction fragment length polymorphism and multilocus sequence analysis. The European biotype 1A strains formed a separate group, which differed from strains belonging to subspecies enterocolitica and palearctica. This may indicate that the current division between the two subspecies is not sufficient considering the strain diversity within Y. enterocolitica. © 2016 The Society for Applied Microbiology.

  9. Full-length genome sequences of five hepatitis C virus isolates representing subtypes 3g, 3h, 3i and 3k, and a unique genotype 3 variant.

    Science.gov (United States)

    Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G

    2013-03-01

    We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned. Our findings should add insights to HCV evolutionary studies and clinical applications.

  10. Draft Genome Sequence of Lactobacillus sp. Strain TCF032-E4, Isolated from Fermented Radish.

    Science.gov (United States)

    Mao, Yuejian; Chen, Meng; Horvath, Philippe

    2015-07-30

    Here, we report the draft genome sequence of Lactobacillus sp. strain TCF032-E4 (= CCTCC AB2015090 = DSM 100358), isolated from a Chinese fermented radish. The total length of the 57 contigs is about 2.9 Mb, with a G+C content of 43.5 mol% and 2,797 predicted coding sequences (CDSs). Copyright © 2015 Mao et al.
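
    The assembly summary above (number of contigs, total length, G+C content) can be reproduced from any set of contig sequences. The following is a minimal sketch under stated assumptions: the function name `contig_stats` and the toy sequences are hypothetical and are not data from the reported assembly.

```python
def contig_stats(contigs):
    """Return (number of contigs, total length in bp, G+C percentage)."""
    total = sum(len(seq) for seq in contigs)
    # Count G and C bases case-insensitively across all contigs.
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    gc_percent = 100.0 * gc / total if total else 0.0
    return len(contigs), total, gc_percent


# Hypothetical toy contigs, used only to illustrate the calculation.
toy_contigs = ["ATGCGC", "TTAAGG", "GGCC"]
n_contigs, total_length, gc_percent = contig_stats(toy_contigs)
print(n_contigs, total_length, gc_percent)
```

    For a real draft genome the same function would be applied to the sequences read from the assembly FASTA file.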

  11. Novel full-length major histocompatibility complex class I allele discovery and haplotype definition in pig-tailed macaques.

    Science.gov (United States)

    Semler, Matthew R; Wiseman, Roger W; Karl, Julie A; Graham, Michael E; Gieger, Samantha M; O'Connor, David H

    2017-11-13

    Pig-tailed macaques (Macaca nemestrina, Mane) are important models for human immunodeficiency virus (HIV) studies. Their infectability with minimally modified HIV makes them a uniquely valuable animal model to mimic human infection with HIV and progression to acquired immunodeficiency syndrome (AIDS). However, variation in the pig-tailed macaque major histocompatibility complex (MHC) and the impact of individual transcripts on the pathogenesis of HIV and other infectious diseases is understudied compared to that of rhesus and cynomolgus macaques. In this study, we used Pacific Biosciences single-molecule real-time circular consensus sequencing to describe full-length MHC class I (MHC-I) transcripts for 194 pig-tailed macaques from three breeding centers. We then used the full-length sequences to infer Mane-A and Mane-B haplotypes containing groups of MHC-I transcripts that co-segregate due to physical linkage. In total, we characterized full-length open reading frames (ORFs) for 313 Mane-A, Mane-B, and Mane-I sequences that defined 86 Mane-A and 106 Mane-B MHC-I haplotypes. Pacific Biosciences technology allows us to resolve these Mane-A and Mane-B haplotypes to the level of synonymous allelic variants. The newly defined haplotypes and transcript sequences containing full-length ORFs provide an important resource for infectious disease researchers as certain MHC haplotypes have been shown to provide exceptional control of simian immunodeficiency virus (SIV) replication and prevention of AIDS-like disease in nonhuman primates. The increased allelic resolution provided by Pacific Biosciences sequencing also benefits transplant research by allowing researchers to more specifically match haplotypes between donors and recipients to the level of nonsynonymous allelic variation, thus reducing the risk of graft-versus-host disease.

  12. Signal sequence and keyword trap in silico for selection of full-length human cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries.

    Science.gov (United States)

    Otsuki, Tetsuji; Ota, Toshio; Nishikawa, Tetsuo; Hayashi, Koji; Suzuki, Yutaka; Yamamoto, Jun-ichi; Wakamatsu, Ai; Kimura, Kouichi; Sakamoto, Katsuhiko; Hatano, Naoto; Kawai, Yuri; Ishii, Shizuko; Saito, Kaoru; Kojima, Shin-ichi; Sugiyama, Tomoyasu; Ono, Tetsuyoshi; Okano, Kazunori; Yoshikawa, Yoko; Aotsuka, Satoshi; Sasaki, Naokazu; Hattori, Atsushi; Okumura, Koji; Nagai, Keiichi; Sugano, Sumio; Isogai, Takao

    2005-01-01

    We have developed an in silico method of selection of human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combination of the oligo-capping method and ATGpr, software for prediction of translation start point and the coding potential. Then, using 5'-end single-pass sequences, cDNAs having the signal sequence were selected by PSORT ('signal sequence trap'). We also applied 'secretion or membrane protein-related keyword trap' based on the result of BLAST search against the SWISS-PROT database for the cDNAs which could not be selected by PSORT. Using the above procedures, 789 cDNAs were primarily selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165 (80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by 'keyword trap' preserved the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including transporters, receptors, and ligands, involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.

  13. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Full Text Available Abstract Background Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  14. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  15. Generation and Characterization of HIV-1 Transmitted and Founder Virus Consensus Sequence from Intravenous Drug Users in Xinjiang, China.

    Science.gov (United States)

    Li, Fan; Ma, Liying; Feng, Yi; Hu, Jing; Ni, Na; Ruan, Yuhua; Shao, Yiming

    2017-06-01

    HIV-1 transmission in intravenous drug users (IDUs) has been characterized by high genetic multiplicity and suggests a greater challenge for HIV-1 infection blocking. We investigated a total of 749 sequences of full-length gp160 gene obtained by single genome sequencing (SGS) from 22 HIV-1 early infected IDUs in Xinjiang province, northwest China, and generated a transmitted and founder virus (T/F virus) consensus sequence (IDU.CON). The T/F virus was classified as subtype CRF07_BC and predicted to be CCR5-tropic virus. The variable region (V1, V2, and V4 loop) of IDU.CON showed length variation compared with the heterosexual T/F virus consensus sequence (HSX.CON) and homosexual T/F virus consensus sequence (MSM.CON). A total of 26 N-linked glycosylation sites were discovered in the IDU.CON sequence, which is fewer than in MSM.CON and HSX.CON. Characterization of T/F virus from IDUs highlights the genetic make-up and complexity of virus near the moment of transmission or in early infection preceding systemic dissemination and is important for the development of effective HIV-1 preventive methods, including vaccines.

  16. Direct chromosome-length haplotyping by single-cell sequencing

    NARCIS (Netherlands)

    Porubský, David; Sanders, Ashley D; van Wietmarschen, Niek; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Bevova, Marianna R; Guryev, Victor; Lansdorp, Peter Michael

    Haplotypes are fundamental to fully characterize the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid

  17. Locomotor sequence learning in visually guided walking

    DEFF Research Database (Denmark)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 years, N = 20) could learn a specific sequence...... of step lengths over 300 training steps. Younger children (age 6-10 years, N = 8) have lower baseline performance, but their magnitude and rate of sequence learning was the same compared to older children (11-16 years, N = 10) and healthy adults. In addition, learning capacity may be more limited...... to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence non-specific learning during...

  18. Analysis of B Cell Repertoire Dynamics Following Hepatitis B Vaccination in Humans, and Enrichment of Vaccine-specific Antibody Sequences.

    Science.gov (United States)

    Galson, Jacob D; Trück, Johannes; Fowler, Anna; Clutterbuck, Elizabeth A; Münz, Márton; Cerundolo, Vincenzo; Reinhard, Claudia; van der Most, Robbert; Pollard, Andrew J; Lunter, Gerton; Kelly, Dominic F

    2015-12-01

    Generating a diverse B cell immunoglobulin repertoire is essential for protection against infection. The repertoire in humans can now be comprehensively measured by high-throughput sequencing. Using hepatitis B vaccination as a model, we determined how the total immunoglobulin sequence repertoire changes following antigen exposure in humans, and compared this to sequences from vaccine-specific sorted cells. Clonal sequence expansions were seen 7 days after vaccination, which correlated with vaccine-specific plasma cell numbers. These expansions caused an increase in mutation, and a decrease in diversity and complementarity-determining region 3 sequence length in the repertoire. We also saw an increase in sequence convergence between participants 14 and 21 days after vaccination, coinciding with an increase of vaccine-specific memory cells. These features allowed development of a model for in silico enrichment of vaccine-specific sequences from the total repertoire. Identifying antigen-specific sequences from total repertoire data could aid our understanding of B cell driven immunity, and be used for disease diagnostics and vaccine evaluation.

  19. Characterisation of Toxoplasma gondii isolates using polymerase chain reaction (PCR) and restriction fragment length polymorphism (RFLP) of the non-coding Toxoplasma gondii (TGR)-gene sequences

    DEFF Research Database (Denmark)

    Høgdall, Estrid; Vuust, Jens; Lind, Peter

    2000-01-01

    of using TGR gene variants as markers to distinguish among T. gondii isolates from different animals and different geographical sources. Based on the band patterns obtained by restriction fragment length polymorphism (RFLP) analysis of the polymerase chain reaction (PCR) amplified TGR sequences, the T...

  20. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
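
    The forward scan and traceback mentioned above are the two phases of global sequence alignment. The sketch below shows them in software with a plain Needleman-Wunsch implementation that stores the full score matrix; it is not the space-efficient hardware algorithm of the paper, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Forward scan: fill the (n+1) x (m+1) score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback: walk from the bottom-right cell back to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))
```

    Storing the whole matrix costs memory proportional to the product of the two sequence lengths, which is the kind of limitation that motivates space-efficient designs for long sequences.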

  1. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  2. High-Throughput Next-Generation Sequencing of Polioviruses

    Science.gov (United States)

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929

  3. A constrained optimization algorithm for total energy minimization in electronic structure calculations

    International Nuclear Information System (INIS)

    Yang Chao; Meza, Juan C.; Wang Linwang

    2006-01-01

    A new direct constrained optimization algorithm for minimizing the Kohn-Sham (KS) total energy functional is presented in this paper. The key ingredients of this algorithm involve projecting the total energy functional into a sequence of subspaces of small dimensions and seeking the minimizer of total energy functional within each subspace. The minimizer of a subspace energy functional not only provides a search direction along which the KS total energy functional decreases but also gives an optimal 'step-length' to move along this search direction. Numerical examples are provided to demonstrate that this new direct constrained optimization algorithm can be more efficient than the self-consistent field (SCF) iteration

  4. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    Science.gov (United States)

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  5. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    Science.gov (United States)

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  6. Size matters: Associations between the androgen receptor CAG repeat length and the intrafollicular hormone milieu

    DEFF Research Database (Denmark)

    Borgbo, T; Macek, M; Chrudimska, J

    2015-01-01

    Granulosa cell (GC) expressed androgen receptors (AR) and intrafollicular androgens are central to fertility. The transactivating domain of the AR contains a polymorphic CAG repeat sequence, which is linked to the transcriptional activity of AR and may influence the GC function. This study aims...... to evaluate the effects of the AR CAG repeat length on the intrafollicular hormone profiles, and the gene expression profiles of GC from human small antral follicles. In total, 190 small antral follicles (3-11 mm in diameter) were collected from 58 women undergoing ovarian cryopreservation for fertility...... expression compared to medium CAG repeat lengths (P = 0.03). In conclusion, long CAG repeat lengths in the AR were associated to significant attenuated levels of androgens and an increased conversion of testosterone into oestradiol, in human small antral follicles....

  7. BAC end sequencing of Pacific white shrimp Litopenaeus vannamei: a glimpse into the genome of Penaeid shrimp

    Science.gov (United States)

    Zhao, Cui; Zhang, Xiaojun; Liu, Chengzhang; Huan, Pin; Li, Fuhua; Xiang, Jianhai; Huang, Chao

    2012-05-01

    Little is known about the genome of Pacific white shrimp (Litopenaeus vannamei). To address this, we conducted BAC (bacterial artificial chromosome) end sequencing of L. vannamei. We selected and sequenced 7 812 BAC clones from the BAC library LvHE from the two ends of the inserts by Sanger sequencing. After trimming and quality filtering, 11 279 BAC end sequences (BESs) including 4 609 paired-end BESs were obtained. The total length of the BESs was 4 340 753 bp, representing 0.18% of the L. vannamei haploid genome. The lengths of the BESs ranged from 100 bp to 660 bp with an average length of 385 bp. Analysis of the BESs indicated that the L. vannamei genome is AT-rich and that the primary repeat patterns were simple sequence repeats (SSRs) and low complexity sequences. Dinucleotide and hexanucleotide repeats were the most common SSR types in the BESs. The most abundant transposable element was gypsy, which may contribute to the generation of the large genome size of L. vannamei. We successfully annotated 4 519 BESs by BLAST searching, including genes involved in immunity and sex determination. Our results provide an important resource for functional gene studies, map construction and integration, and complete genome assembly for this species.
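
    The length figures reported above are mutually consistent, which a short calculation can confirm. The sketch below uses only the numbers quoted in the abstract; the implied genome size is a back-calculation from the rounded 0.18% figure and is therefore an approximation, not a value stated by the authors.

```python
# Values quoted in the abstract.
total_bp = 4_340_753        # total length of all BESs
n_bes = 11_279              # number of BESs after filtering
genome_fraction = 0.0018    # 0.18% of the haploid genome (rounded in the source)

# Mean BES length, which should match the reported average of 385 bp.
mean_length = total_bp / n_bes

# Haploid genome size implied by the reported fraction (approximate).
implied_genome_bp = total_bp / genome_fraction

print(f"mean BES length: {mean_length:.1f} bp")
print(f"implied haploid genome size: {implied_genome_bp / 1e9:.2f} Gb")
```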

  8. Endophytic bacterial diversity in grapevine (Vitis vinifera L.) leaves described by 16S rRNA gene sequence analysis and length heterogeneity-PCR.

    Science.gov (United States)

    Bulgari, Daniela; Casati, Paola; Brusetti, Lorenzo; Quaglino, Fabio; Brasca, Milena; Daffonchio, Daniele; Bianco, Piero Attilio

    2009-08-01

    Diversity of bacterial endophytes associated with grapevine leaf tissues was analyzed by cultivation and cultivation-independent methods. In order to identify bacterial endophytes directly from metagenome, a protocol for bacteria enrichment and DNA extraction was optimized. Sequence analysis of 16S rRNA gene libraries underscored five diverse Operational Taxonomic Units (OTUs), showing best sequence matches with gamma-Proteobacteria, family Enterobacteriaceae, with a dominance of the genus Pantoea. Bacteria isolation through cultivation revealed the presence of six OTUs, showing best sequence matches with Actinobacteria, genus Curtobacterium, and with Firmicutes genera Bacillus and Enterococcus. Length Heterogeneity-PCR (LH-PCR) electrophoretic peaks from single bacterial clones were used to set up a database representing the bacterial endophytes identified in association with grapevine tissues. Analysis of healthy and phytoplasma-infected grapevine plants showed that LH-PCR could be a useful complementary tool for examining the diversity of bacterial endophytes, especially for diversity surveys on a large number of samples.

  9. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    Directory of Open Access Journals (Sweden)

    Holt Robert A

    2010-04-01

    Full Text Available Abstract Background Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.

  10. Optimal solution for travelling salesman problem using heuristic shortest path algorithm with imprecise arc length

    Science.gov (United States)

    Bakar, Sumarni Abu; Ibrahim, Milbah

    2017-08-01

    The shortest path problem is a popular problem in graph theory. It is about finding a path with minimum length between a specified pair of vertices. In any network, the weight of each edge is usually represented as a crisp real number, and the weight is then used to compute the shortest path with deterministic algorithms. However, due to failure, uncertainty is always encountered in practice, whereby the weight of an edge of the network is uncertain and imprecise. In this paper, a modified algorithm which utilizes a heuristic shortest path method and a fuzzy approach is proposed for solving a network with imprecise arc lengths. Here, interval numbers and triangular fuzzy numbers are considered for representing the arc lengths of the network. The modified algorithm is then applied to a specific example of the Travelling Salesman Problem (TSP). The total shortest distance obtained from this algorithm is then compared with the total distance obtained from the traditional nearest neighbour heuristic algorithm. The result shows that the modified algorithm not only provides a sequence of visited cities similar to that of the traditional approach, but also yields a total shortest distance that is smaller than the one calculated using the traditional approach. Hence, this research could contribute to the enrichment of methods used in solving TSP.
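
    The baseline mentioned in this record, the nearest neighbour heuristic, can be sketched for interval-valued arc lengths as follows. This is a minimal illustration and not the authors' modified algorithm: ranking intervals by their midpoint is an assumption made here, the function names are hypothetical, and the four-city matrix is invented for the example.

```python
def midpoint(interval):
    """Crisp score used to rank an interval arc length (an assumed ranking rule)."""
    lo, hi = interval
    return (lo + hi) / 2.0


def nearest_neighbour_tour(dist, start=0):
    """Greedy tour over a matrix of (lo, hi) interval arc lengths.

    Returns the visiting order (ending back at `start`) and the tour
    length as an interval (sum of lower bounds, sum of upper bounds).
    """
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    total_lo = total_hi = 0.0
    current = start
    while unvisited:
        # Pick the closest unvisited city; ties are broken by city index.
        nxt = min(unvisited, key=lambda j: (midpoint(dist[current][j]), j))
        lo, hi = dist[current][nxt]
        total_lo += lo
        total_hi += hi
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    # Close the tour by returning to the start city.
    lo, hi = dist[current][start]
    total_lo += lo
    total_hi += hi
    tour.append(start)
    return tour, (total_lo, total_hi)


# Invented symmetric example with four cities.
D = [
    [(0, 0), (2, 4), (8, 10), (5, 7)],
    [(2, 4), (0, 0), (3, 5), (9, 11)],
    [(8, 10), (3, 5), (0, 0), (1, 3)],
    [(5, 7), (9, 11), (1, 3), (0, 0)],
]
tour, total = nearest_neighbour_tour(D)
print(tour, total)
```

    Keeping the tour length as an interval, rather than collapsing it to one number, makes the imprecision of the arc lengths visible in the result.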

  11. Analysis of B Cell Repertoire Dynamics Following Hepatitis B Vaccination in Humans, and Enrichment of Vaccine-specific Antibody Sequences

    Directory of Open Access Journals (Sweden)

    Jacob D. Galson

    2015-12-01

    Full Text Available Generating a diverse B cell immunoglobulin repertoire is essential for protection against infection. The repertoire in humans can now be comprehensively measured by high-throughput sequencing. Using hepatitis B vaccination as a model, we determined how the total immunoglobulin sequence repertoire changes following antigen exposure in humans, and compared this to sequences from vaccine-specific sorted cells. Clonal sequence expansions were seen 7 days after vaccination, which correlated with vaccine-specific plasma cell numbers. These expansions caused an increase in mutation, and a decrease in diversity and complementarity-determining region 3 sequence length in the repertoire. We also saw an increase in sequence convergence between participants 14 and 21 days after vaccination, coinciding with an increase of vaccine-specific memory cells. These features allowed development of a model for in silico enrichment of vaccine-specific sequences from the total repertoire. Identifying antigen-specific sequences from total repertoire data could aid our understanding of B cell driven immunity, and be used for disease diagnostics and vaccine evaluation.

  12. Spreading Sequence Design for Multiple Cell Synchronous DS-CDMA Systems under Total Weighted Squared Correlation Criterion

    Directory of Open Access Journals (Sweden)

    Cotae Paul

    2004-01-01

    Full Text Available An algorithm for designing spreading sequences for an overloaded multicellular synchronous DS-CDMA system on the uplink is introduced. The criterion used to measure the optimality of the design is the total weighted square correlation (TWSC), assuming the channel state information is known perfectly at both transmitter and receiver. By using this algorithm it is possible to obtain orthogonal generalized WBE sequence sets for any processing gain. The bandwidth of the initial generalized WBE signals of each cell is preserved in the extended signal space associated with the multicellular system. The mathematical formalism is illustrated by selected numerical examples.

  13. Biological phosphorus and nitrogen removal in sequencing batch reactors: effects of cycle length, dissolved oxygen concentration and influent particulate matter.

    Science.gov (United States)

    Ginige, Maneesha P; Kayaalp, Ahmet S; Cheng, Ka Yu; Wylie, Jason; Kaksonen, Anna H

    2013-01-01

    Removal of phosphorus (P) and nitrogen (N) from municipal wastewaters is required to mitigate eutrophication of receiving water bodies. While most treatment plants achieve good N removal using influent carbon (C), the use of influent C to facilitate enhanced biological phosphorus removal (EBPR) is poorly explored. A number of operational parameters can facilitate optimum use of influent C, and this study investigated the effects of cycle length, dissolved oxygen (DO) concentration during the aerobic period and influent solids on biological P and N removal in sequencing batch reactors (SBRs) using municipal wastewaters. Increasing cycle length from 3 to 6 h increased P removal efficiency, which was attributed to a larger portion of N being removed via the nitrite pathway and more biodegradable organic C becoming available for EBPR. Further increasing cycle length from 6 to 8 h decreased P removal efficiencies as the demand for biodegradable organic C for denitrification increased as a result of complete nitrification. Decreasing DO concentration in the aerobic period from 2 to 0.8 mg L(-1) increased P removal efficiency but decreased nitrification rates, possibly due to oxygen limitation. Further, sedimented wastewater proved to be a better influent stream than non-sedimented wastewater, possibly due to the detrimental effect of particulate matter on biological nutrient removal.

  14. In silico characterization of microsatellites in Eucalyptus spp.: abundance, length variation and transposon associations

    Directory of Open Access Journals (Sweden)

    Edenilson Rabello

    2005-01-01

    Full Text Available This study assessed the abundance of microsatellites, or simple sequence repeats (SSRs), in 19 Eucalyptus EST libraries from FORESTs, containing cDNA sequences from five species: E. grandis, E. globulus, E. saligna, E. urophylla and E. camaldulensis. Overall, a total of 11,534 SSRs and 8,447 SSR-containing sequences (25.5% of total ESTs) were identified, with an average of 1 SSR/2.5 kb when considering all motifs and 1 SSR/3.1 kb when mononucleotides were not included. Dimeric repeats were the most abundant (41.03%), followed by trimerics (36.11%) and monomerics (19.59%). The most frequent motifs were A/T (87.24%) for monomerics, AG/CT (94.44%) for dimerics, CCG/CGG (37.87%) for trimerics, AAGG/CCTT (18.75%) for tetramerics, AGAGG/CCTCT (14.04%) for pentamerics and ACGGCG/CGCCGT (6.30%) for hexamerics. According to sequence length, Class II or potentially variable markers were the most commonly found, followed by Class III. Two sequences presented high similarity to previously published Eucalyptus sequences from the NCBI database, EMBRA_72 and EMBRA_122. Local blastn search for transposons did not reveal the presence of any transposable elements with a cut-off value of 10^-50. The large number of microsatellites identified will contribute to the refinement of marker-assisted mapping and to the discovery of novel markers for virtually all genes of economic interest.
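
    A simple way to locate microsatellites of the kind counted in this record is a regular-expression scan for tandemly repeated units of 1 to 6 bp. The sketch below is an assumption-laden illustration, not the pipeline used in the study: the minimum repeat counts in `MIN_REPEATS`, the function name and the test sequence are all hypothetical.

```python
import re

# Assumed minimum number of tandem copies per unit length (not from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One copy of the unit, followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "AA" or "AGAG"), since the shorter unit already reports them.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Invented sequence: an (AG)7 dimer, an (A)12 monomer and a (CCG)5 trimer.
example = "GGC" + "AG" * 7 + "TTC" + "A" * 12 + "CCG" * 5 + "T"
ssrs = find_ssrs(example)
print(ssrs)
```

    Dividing the total sequence length scanned by the number of hits gives a density figure comparable to the "1 SSR per 2.5 kb" style of reporting used above.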

  15. Total direct cost, length of hospital stay, institutional discharges and their determinants from rehabilitation settings in stroke patients.

    Science.gov (United States)

    Saxena, S K; Ng, T P; Yong, D; Fong, N P; Gerald, K

    2006-11-01

    Length of hospital stay (LOHS) is the largest determinant of direct cost for stroke care. Institutional discharges (acute care and nursing homes) from rehabilitation settings add to the direct cost. It is important to identify potentially preventable medical and non-medical reasons determining LOHS and institutional discharges to reduce the direct cost of stroke care. The aim of the study was to ascertain the total direct cost, LOHS, frequency of institutional discharges and their determinants from rehabilitation settings. An observational study was conducted on 200 stroke patients in two rehabilitation settings. The patients were examined for various socio-demographic, neurological and clinical variables upon admission to the rehabilitation hospitals. Information on total direct cost and medical complications during hospitalization was also recorded. The outcome variables measured were total direct cost, LOHS and discharges to institutions (acute care and nursing home facility) and their determinants. The mean and median LOHS in our study were 34 days (SD = 18) and 32 days respectively. LOHS and the cost of hospital stay were significantly correlated. The significant variables associated with LOHS on multiple linear regression analysis were: (i) severe functional impairment/functional dependence Barthel Index institutional discharges (22 to acute care and 17 to nursing homes). On multivariate analysis the significant predictors of discharges to institutions from rehabilitation hospitals were medical complications (OR = 4.37; 95% CI 1.01-12.53) and severe functional impairment/functional dependence (OR = 5.90; 95% CI 2.32-14.98). Length of hospital stay and discharges to institutions from rehabilitation settings are significantly determined by medical complications. The importance of adhering to a clinical pathway/protocol for stroke care is further discussed.

  16. Full-Length Sequence of Mouse Acupuncture-Induced 1-L (Aig1l) Gene Including Its Transcriptional Start Site

    Directory of Open Access Journals (Sweden)

    Mika Ohta

    2011-01-01

    Full Text Available We have been investigating the molecular efficacy of electroacupuncture (EA), which is one type of acupuncture therapy. In our previous molecular biological study of acupuncture, we found an EA-induced gene, named acupuncture-induced 1-L (Aig1l), in mouse skeletal muscle. The aims of this study consisted of identification of the full-length cDNA sequence of Aig1l including the transcriptional start site, determination of the tissue distribution of Aig1l and analysis of the effect of EA on Aig1l gene expression. We determined the complete cDNA sequence including the transcriptional start site via cDNA cloning with the cap site hunting method. We then analyzed the tissue distribution of Aig1l by means of northern blot analysis and real-time quantitative polymerase chain reaction. We used the semiquantitative reverse transcriptase-polymerase chain reaction to examine the effect of EA on Aig1l gene expression. Our results showed that the complete cDNA sequence of Aig1l was 6073 bp long, and the putative protein consisted of 962 amino acids. All seven tissues that we analyzed expressed the Aig1l gene. In skeletal muscle, EA induced expression of the Aig1l gene, with high expression observed after 3 hours of EA. Our findings thus suggest that the Aig1l gene may play a key role in the molecular mechanisms of EA efficacy.

  17. Disentangling the effects of alternation rate and maximum run length on judgments of randomness

    Directory of Open Access Journals (Sweden)

    Sabine G. Scholl

    2011-08-01

    Full Text Available Binary sequences are characterized by various features. Two of these characteristics, alternation rate and run length, have repeatedly been shown to influence judgments of randomness. The two characteristics, however, have usually been investigated separately, without controlling for the other feature. Because the two features are correlated but not identical, it seems critical to analyze their unique impact, as well as their interaction, so as to understand more clearly what influences judgments of randomness. To this end, two experiments on the perception of binary sequences orthogonally manipulated alternation rate and maximum run length (i.e., length of the longest run within the sequence). Results show that alternation rate consistently exerts a unique effect on judgments of randomness, but that the effect of alternation rate is contingent on the length of the longest run within the sequence. The effect of maximum run length was found to be small and less consistent. Together, these findings extend prior randomness research by integrating literature from the realms of perception, categorization, and prediction, as well as by showing the unique and joint effects of alternation rate and maximum run length on judgments of randomness.
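
    The two features manipulated in this record are easy to compute for any binary sequence. The sketch below uses the usual definitions, which are assumed here rather than quoted from the paper: alternation rate is the share of adjacent pairs that differ, and maximum run length is the length of the longest stretch of identical symbols. The function names are hypothetical.

```python
from itertools import groupby


def alternation_rate(seq):
    """Fraction of adjacent positions at which the symbol changes."""
    if len(seq) < 2:
        raise ValueError("need at least two symbols")
    switches = sum(a != b for a, b in zip(seq, seq[1:]))
    return switches / (len(seq) - 1)


def max_run_length(seq):
    """Length of the longest run of identical consecutive symbols."""
    if not seq:
        raise ValueError("empty sequence")
    return max(len(list(group)) for _, group in groupby(seq))


for s in ("HHTHTTTH", "HTHTHTHT"):
    print(s, round(alternation_rate(s), 3), max_run_length(s))
```

    The two measures are related but not interchangeable: a sequence can keep a moderate alternation rate while still containing one long run, which is why the study varied them orthogonally.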

  18. Total curvature and total torsion of knotted random polygons in confinement

    Science.gov (United States)

    Diao, Yuanan; Ernst, Claus; Rawdon, Eric J.; Ziegler, Uta

    2018-04-01

    Knots in nature are typically confined spatially. The confinement affects the possible configurations, which in turn affects the spectrum of possible knot types as well as the geometry of the configurations within each knot type. The goal of this paper is to determine how confinement, length, and knotting affect the total curvature and total torsion of random polygons. Previously published papers have investigated these effects in the unconstrained case. In particular, we analyze how the total curvature and total torsion are affected by (1) varying the length of polygons within a fixed confinement radius and (2) varying the confinement radius of polygons with a fixed length. We also compare the total curvature and total torsion of groups of knots with similar complexity (measured as crossing number). While some of our results fall in line with what has been observed in the studies of the unconfined random polygons, a few surprising results emerge from our study, showing some properties that are unique due to the effect of knotting in confinement.
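
    For a polygon, total curvature is commonly taken as the sum of the exterior (turning) angles at its vertices. The sketch below computes that quantity for a closed polygon in three dimensions; it assumes this standard definition, uses a hypothetical function name, and leaves out total torsion and the generation of confined random polygons.

```python
import math


def total_curvature(vertices):
    """Sum of turning angles (radians) of a closed polygon in 3-space.

    `vertices` is a list of (x, y, z) points; the last vertex is joined
    back to the first. Consecutive vertices are assumed to be distinct.
    """
    n = len(vertices)
    total = 0.0
    for i in range(n):
        p, q, r = vertices[i - 1], vertices[i], vertices[(i + 1) % n]
        u = [q[k] - p[k] for k in range(3)]  # incoming edge at q
        v = [r[k] - q[k] for k in range(3)]  # outgoing edge at q
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(a * a for a in v))
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
        total += math.acos(cos_angle)
    return total


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(total_curvature(square))
```

    A planar convex polygon such as the square gives exactly 2π, the minimum for any closed curve; knotted polygons necessarily exceed it.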

  19. [Study on quality evaluation of sequence and SSR information in transcriptome of Astragalus membranaceus].

    Science.gov (United States)

    Chang, Yue; Yang, Song; Liu, Zhen-Peng; Ren, Wei-Chao; Liu, Jie; Ma, Wei

    2016-04-01

    In this study, 454/Roche GS FLX sequencing technology was used to obtain transcriptome data for Astragalus membranaceus. The 454 Sequencing System Software was applied to assemble the transcriptome de novo. Using MISA tools, 9 893 unigene sequences of A. membranaceus were selected, and the information on SSR loci was analyzed. According to the results, the average length of reads was 413 bp, about 86% of the reads were involved in the assembly, the N50 length was 1 205 bp, and the number of unigenes was measured by the whole transcript. In total, 1 729 SSR loci were found in the A. membranaceus transcriptome; the occurrence frequency of SSR was 9.24%, the frequency of SSR in the whole transcriptome was 13.42%, and the average length of SSR was 7.97 kb. One hundred and twenty-seven kinds of core repeat sequences were found; the dominant type was the TG/AC dinucleotide, which accounted for 4.25% of the total SSR loci. The sequencing of the A. membranaceus transcriptome revealed its overall expression and yielded a large number of unigene sequences; the frequency of SSR loci in A. membranaceus is high, their types are diverse, and the polymorphism of the genes is high. Copyright© by the Chinese Pharmaceutical Association.

  20. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
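
    The N50 values quoted in this record follow a standard definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly. The sketch below implements that definition and also re-derives the stated genome coverage from the two totals in the abstract; the function name and the toy length list are illustrative assumptions.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy example: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])

# Coverage implied by the figures in the abstract above.
coverage = 568_887_315 / 622_000_000

print(toy_n50, f"{coverage:.1%}")
```

    The coverage works out to about 91.5%, consistent with the 91% reported.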

  1. Optimization of metal artefact reduction (MAR) sequences for MRI of total hip prostheses

    Energy Technology Data Exchange (ETDEWEB)

    Toms, A.P., E-mail: andoni.toms@nnuh.nhs.u [Department of Radiology, Norfolk and Norwich University Hospital Trust, Norwich, Norfolk NR4 7UY (United Kingdom)]; Smith-Bateman, C.; Malcolm, P.N.; Cahir, J. [Department of Radiology, Norfolk and Norwich University Hospital Trust, Norwich, Norfolk NR4 7UY (United Kingdom)]; Graves, M. [University Department of Radiology, Addenbrooke's Hospital, Cambridge (United Kingdom)]

    2010-06-15

    Aim: To describe the relative contribution of matrix size and bandwidth to artefact reduction in order to define optimal sequence parameters for metal artefact reduction (MAR) sequences for MRI of total hip prostheses. Methods and materials: A phantom was created using a Charnley total hip replacement. Mid-coronal T1-weighted (echo time 12 ms, repetition time 400 ms) images through the prosthesis were acquired with increasing bandwidths (150, 300, 454, 592, and 781 Hz/pixel) and increasing matrixes of 128, 256, 384, 512, 640, and 768 pixels square. Signal loss from the prosthesis and susceptibility artefact was segmented using an automated tool. Results: Over 90% of the achievable reduction in artefacts was obtained with matrixes of 256 x 256 or greater and a receiver bandwidth of approximately 400 Hz/pixel or greater. Thereafter, increasing the receiver bandwidth or matrix had little impact on reducing susceptibility artefacts. Increasing the bandwidth produced a relative fall in the signal-to-noise ratio (SNR) of between 49 and 56% for a given matrix, but, in practice, the image quality was still satisfactory even with the highest bandwidth and largest matrix sizes. The acquisition time increased linearly with increasing matrix parameters. Conclusion: Over 90% of the achievable metal artefact reduction can be realized with mid-range matrices and receiver bandwidths on a clinical 1.5 T system. The loss of SNR from increasing receiver bandwidth is preferable to long acquisition times; increasing receiver bandwidth should therefore be the main tool for reducing metal artefact.
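
    The reported SNR penalty can be compared with the standard first-order MRI relation in which, with all other acquisition parameters fixed, SNR is inversely proportional to the square root of the receiver bandwidth. This relation is a textbook assumption brought in here for illustration; it is not stated in the abstract, and the study measured SNR on a phantom rather than deriving it.

```latex
\mathrm{SNR} \propto \frac{1}{\sqrt{\mathrm{BW}}}
\quad\Longrightarrow\quad
\frac{\mathrm{SNR}_{781}}{\mathrm{SNR}_{150}}
  = \sqrt{\frac{150}{781}} \approx 0.44
```

    Under that assumption, going from 150 to 781 Hz/pixel would cost roughly 56% of the SNR, which is in line with the upper end of the 49 to 56% range reported above.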

  2. The effects of rest interval length manipulation of the first upper-body resistance exercise in sequence on acute performance of subsequent exercises in men and women.

    Science.gov (United States)

    Ratamess, Nicholas A; Chiarello, Christina M; Sacco, Anthony J; Hoffman, Jay R; Faigenbaum, Avery D; Ross, Ryan E; Kang, Jie

    2012-11-01

    The purpose of the present study was to investigate the effects of manipulating rest interval (RI) length of the first upper-body exercise in sequence on subsequent resistance exercise performance. Twenty-two men and women with at least 1 year of resistance training experience performed resistance exercise protocols on 3 occasions in random order. Each protocol consisted of performing 4 barbell upper-body exercises in the same sequence (bench press, incline bench press, shoulder press, and bent-over row) for 3 sets of up to 10 repetitions with 75% of 1 repetition maximum. Bench press RIs were 1, 2, or 3 minutes, whereas other exercises were performed with a standard 2-minute rest interval. The number of repetitions completed, average power, and velocity for each set of each exercise were recorded. Gender differences were observed during the bench press and incline press as women performed significantly (p ≤ 0.05) more repetitions than men during all RIs. The magnitude of decline in velocity and power over 3 sets of the bench press and incline press was significantly higher in men than women. Manipulation of RI length during the bench press did not affect performance of the remaining exercises in men. However, significantly more repetitions were performed by women during the first set of the incline press using 3-minute rest interval than 1-minute rest interval. In men and women, performance of the incline press and shoulder press was compromised compared with baseline performances. Manipulation of RI length of the first exercise affected performance of only the first set of 1 subsequent exercise in women. All RIs led to comparable levels of fatigue in men, indicating that reductions in load are necessary for subsequent exercises performed in sequence that stress similar agonist muscle groups when 10 repetitions are desired.

  3. Comparison of next generation sequencing technologies for transcriptome characterization

    Directory of Open Access Journals (Sweden)

    Soltis Douglas E

    2009-08-01

    Full Text Available Abstract Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance

  4. Poly A tail length analysis of in vitro transcribed mRNA by LC-MS.

    Science.gov (United States)

    Beverly, Michael; Hagen, Caitlin; Slack, Olga

    2018-02-01

    The 3'-polyadenosine (poly A) tail of in vitro transcribed (IVT) mRNA was studied using liquid chromatography coupled to mass spectrometry (LC-MS). Poly A tails were cleaved from the mRNA using ribonuclease T1 followed by isolation with dT magnetic beads. Extracted tails were then analyzed by LC-MS, which provided tail length information at single-nucleotide resolution. A 2100-nt mRNA with plasmid-encoded poly A tail lengths of either 27, 64, 100, or 117 nucleotides was used for these studies, as enzymatically added poly A tails showed significant length heterogeneity. The number of As observed in the tails closely matched Sanger sequencing results of the DNA template, and even minor plasmid populations with sequence variations were detected. When the plasmid sequence contained a discrete number of poly As in the tail, analysis revealed a distribution that included tails longer than the encoded tail lengths. These observations were consistent with transcriptional slippage of T7 RNAP taking place within a poly A sequence. The type of RNAP did not alter the observed tail distribution, and comparison of T3, T7, and SP6 showed all three RNAPs produced equivalent tail length distributions. The addition of a sequence at the 3' end of the poly A tail did, however, produce narrower tail length distributions, which supports a previously described model of slippage where the 3' end can be locked in place by having a G or C after the poly nucleotide region. Graphical abstract Determination of mRNA poly A tail length using magnetic beads and LC-MS.

  5. Assessing the genetic diversity of Cu resistance in mine tailings through high-throughput recovery of full-length copA genes

    Science.gov (United States)

    Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin

    2015-01-01

    Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs, inferred by evolutionary trace analysis of CopA sequences from known Cu resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, out of which 70 were found with high potential to be functioning in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that the development of copA diversity in the tailings might be mainly through vertical descent with few lateral gene transfer events. The method established here can be used to explore copA (and potentially other metal resistance genes) diversity in any metagenome and has the potential to exhaust the full-length gene sequences for downstream analyses. PMID:26286020

  6. Subtype-independent near full-length HIV-1 genome sequencing and assembly to be used in large molecular epidemiological studies and clinical management.

    Science.gov (United States)

    Grossmann, Sebastian; Nowak, Piotr; Neogi, Ujjwal

    2015-01-01

    HIV-1 near full-length genome (HIV-NFLG) sequencing from plasma is an attractive multidimensional tool to apply in large-scale population-based molecular epidemiological studies. It also enables genotypic resistance testing (GRT) for all drug target sites allowing effective intervention strategies for control and prevention in high-risk population groups. Thus, the main objective of this study was to develop a simplified subtype-independent, cost- and labour-efficient HIV-NFLG protocol that can be used in clinical management as well as in molecular epidemiological studies. Plasma samples (n=30) were obtained from HIV-1B (n=10), HIV-1C (n=10), CRF01_AE (n=5) and CRF01_AG (n=5) infected individuals with minimum viral load >1120 copies/ml. The amplification was performed with two large amplicons of 5.5 kb and 3.7 kb, sequenced with 17 primers to obtain HIV-NFLG. GRT was validated against ViroSeq™ HIV-1 Genotyping System. After excluding four plasma samples with low-quality RNA, a total of 26 samples were attempted. Among them, NFLG was obtained from 24 (92%) samples with the lowest viral load being 3000 copies/ml. High (>99%) concordance was observed between HIV-NFLG and ViroSeq™ when determining the drug resistance mutations (DRMs). The N384I connection mutation was additionally detected by NFLG in two samples. Our high efficiency subtype-independent HIV-NFLG is a simple and promising approach to be used in large-scale molecular epidemiological studies. It will facilitate the understanding of the HIV-1 pandemic population dynamics and outline effective intervention strategies. Furthermore, it can potentially be applicable in clinical management of drug resistance by evaluating DRMs against all available antiretrovirals in a single assay.

  7. Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale

    Directory of Open Access Journals (Sweden)

    Liu He

    2017-10-01

    Full Text Available Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.

  8. Use of length heterogeneity polymerase chain reaction (LH-PCR as non-invasive approach for dietary analysis of Svalbard reindeer, Rangifer tarandus platyrhynchus.

    Directory of Open Access Journals (Sweden)

    Sungbae Joo

    Full Text Available To efficiently investigate the forage preference of Svalbard reindeer (Rangifer tarandus platyrhynchus), we applied length-heterogeneity polymerase chain reaction (LH-PCR) based on length differences of internal transcribed spacer (ITS) regions of ribosomal RNA (rRNA) to fecal samples from R. tarandus platyrhynchus. A length-heterogeneity (LH) database was constructed using both collected potential food sources of Svalbard reindeer and fecal samples, followed by PCR, cloning and sequencing. In total, eighteen fecal samples were collected between 2011 and 2012 from 2 geographic regions and 15 samples were successfully amplified by PCR. The LH-PCR analysis detected abundant peaks, 18.6 peaks on average per sample, ranging from 100 to 500 bp in size and showing distinct patterns associated with both regions and years of sample collection. Principal component analysis (PCA) resulted in clustering of 15 fecal samples into 3 groups by the year of collection and region with a statistically significant difference at the 99.9% level. The first 2 principal components (PCs) explained 71.1% of the total variation among the samples. Through comparison with the LH database and identification by cloning and sequencing, lichens (Stereocaulon sp. and Ochrolechia sp.) and plant species (Salix polaris and Saxifraga oppositifolia) were detected as the food sources that contributed most to the Svalbard reindeer diet. Our results suggest that the use of LH-PCR analysis would be a non-invasive and efficient monitoring tool for characterizing the foraging strategy of Svalbard reindeer. Additionally, combining sequence information would increase its resolving power in identification of foraged diet components.

  9. DNA barcode sequencing from old type specimens as a tool in taxonomy: a case study in the diverse genus Eois (Lepidoptera: Geometridae).

    Directory of Open Access Journals (Sweden)

    Patrick Strutzenberger

    Full Text Available In this study we report on the sequencing of the COI barcode region from 96 historical specimens (92 type specimens + 4 non-types) of Eois. Eois is a diverse clade of tropical geometrid moths and is the target of a number of ongoing studies on life-histories, phylogeny, co-evolution with host plants or parasitoids, and diversity patterns across temporal and spatial dimensions. The unequivocal application of valid names is crucial for all aspects of biodiversity research as well as monitoring and conservation efforts. The availability of barcodes from historical type specimens has the potential to facilitate the much-needed acceleration of species description. We performed non-destructive DNA extraction on the abdomens of Eois specimens between 79 and 157 years of age. We used six primer combinations (recovering between 109 and 130 bp each) to target the full-length barcode sequence of each specimen. We were able to obtain sequences for 91 of 96 specimens (success rate 94.8%). Sequence length ranged from 121 bp to full barcode sequences (658 bp); the average sequence length was ~500 bp. We detected a moderately strong and statistically significant negative correlation between specimen age and total sequence length, which is in agreement with expectations. The abdomen proved to be an exceedingly valuable source of DNA in old specimens of Lepidoptera. Barcode sequences obtained in this study are currently being used in an effort towards a step-wise taxonomic revision of Eois. We recommend that DNA barcodes obtained from type specimens be included in all species descriptions and revisions whenever feasible.

  10. Sequence determinants of human microsatellite variability

    Directory of Open Access Journals (Sweden)

    Jakobsson Mattias

    2009-12-01

    Full Text Available Abstract Background Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.

  11. Generation of pseudo-random sequences for spread spectrum systems

    Science.gov (United States)

    Moser, R.; Stover, J.

    1985-05-01

    The characteristics of pseudo-random radio signal sequences (PRS) are explored. The randomness of the PRS is a matter of artificially altering the sequence of binary digits broadcast. Autocorrelations of the two sequences shifted in time, if high, determine if the signals are the same and thus allow for position identification. Cross-correlation can also be calculated between sequences. Correlations closest to zero are obtained with a large volume of prime numbers in the sequences. Techniques for selecting optimal and maximal lengths for the sequences are reviewed. If the correlations are near zero in the sequences, then signal channels can accommodate multiple users. Finally, Gold codes are discussed as a technique for maximizing the code lengths.

  12. The Complete Chloroplast Genome Sequences of the Medicinal Plant Forsythia suspensa (Oleaceae)
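
    The correlation property mentioned in this record can be illustrated with a minimal sketch. It is not taken from the report itself: it assumes a 5-stage Fibonacci linear feedback shift register with feedback taps at stages 5 and 3, which yields a maximal-length sequence (m-sequence) of period 31. The function names `lfsr_m_sequence` and `periodic_autocorrelation` are hypothetical.

```python
def lfsr_m_sequence(taps, nbits):
    """Return one period of a binary sequence from a Fibonacci LFSR.

    taps  : 1-based register stages XORed together to form the feedback bit
            (assumed here to correspond to a primitive polynomial)
    nbits : register length; a maximal-length sequence has period 2**nbits - 1
    """
    state = [1] * nbits                  # any non-zero seed works
    seq = []
    for _ in range(2 ** nbits - 1):
        seq.append(state[-1])            # output the last register stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift and insert the feedback bit
    return seq


def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation after mapping bit 0 -> +1 and bit 1 -> -1."""
    n = len(bits)
    chips = [1 - 2 * b for b in bits]
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))


seq = lfsr_m_sequence(taps=(5, 3), nbits=5)
peak = periodic_autocorrelation(seq, 0)
off_peak = {periodic_autocorrelation(seq, s) for s in range(1, len(seq))}
```

    For an m-sequence the autocorrelation equals the period at zero shift and -1 at every other shift, which is why a receiver can identify the correct time alignment. Gold codes, also named in the record, are built by combining pairs of such sequences and are not shown here.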

    Directory of Open Access Journals (Sweden)

    Wenbin Wang

    2017-10-01

    Full Text Available Forsythia suspensa is an important medicinal plant and traditionally applied for the treatment of inflammation, pyrexia, gonorrhea, diabetes, and so on. However, there is limited sequence and genomic information available for F. suspensa. Here, we produced the complete chloroplast genomes of F. suspensa using Illumina sequencing technology. F. suspensa is the first sequenced member within the genus Forsythia (Oleaceae). The gene order and organization of the chloroplast genome of F. suspensa are similar to other Oleaceae chloroplast genomes. The F. suspensa chloroplast genome is 156,404 bp in length, exhibits a conserved quadripartite structure with a large single-copy (LSC; 87,159 bp) region, and a small single-copy (SSC; 17,811 bp) region interspersed between inverted repeat (IRa/b; 25,717 bp) regions. A total of 114 unique genes were annotated, including 80 protein-coding genes, 30 tRNA, and four rRNA. The low GC content (37.8%) and codon usage bias for A- or T-ending codons may largely affect gene codon usage. Sequence analysis identified a total of 26 forward repeats, 23 palindrome repeats with lengths >30 bp (identity > 90%), and 54 simple sequence repeats (SSRs) with an average rate of 0.35 SSRs/kb. We predicted 52 RNA editing sites in the chloroplast of F. suspensa, all for C-to-U transitions. IR expansion or contraction and the divergent regions were analyzed among several species including the reported F. suspensa in this study. Phylogenetic analysis based on whole-plastome revealed that F. suspensa, as a member of the Oleaceae family, diverged relatively early from Lamiales. This study will contribute to strengthening medicinal resource conservation, molecular phylogenetic, and genetic engineering research investigations of this species.

  13. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis

    Directory of Open Access Journals (Sweden)

    Zhao Patrick X

    2011-07-01

    Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM) for high throughput SNP discovery in tetraploid alfalfa (Medicago sativa L.), a species with high economic value but limited genomic resources. Results The alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally-propagated plants grown under either well-watered or water-stressed conditions, and then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained including 24,144 tentative consensus (TCs) sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs was evaluated and validated using high resolution melting (HRM) analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1) chromosomes. Gene Ontology assignments indicate that sequences obtained cover a broad range of GO categories. Conclusions We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.

  14. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.

    2005-01-01

    detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FILDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include...... the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy....... The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability...

  15. Magnetic nanoparticle imaging by random and maximum length sequences of inhomogeneous activation fields.

    Science.gov (United States)

    Baumgarten, Daniel; Eichardt, Roland; Crevecoeur, Guillaume; Supriyanto, Eko; Haueisen, Jens

    2013-01-01

    Biomedical applications of magnetic nanoparticles require a precise knowledge of their biodistribution. From multi-channel magnetorelaxometry measurements, this distribution can be determined by means of inverse methods. It was recently shown that the combination of sequential inhomogeneous excitation fields in these measurements is favorable regarding the reconstruction accuracy when compared to homogeneous activation. In this paper, approaches for the determination of activation sequences for these measurements are investigated. To this end, consecutive activation of single coils, random activation patterns and families of m-sequences are examined in computer simulations involving a sample measurement setup and compared with respect to the relative condition number of the system matrix. We find that the values of this condition number decrease with a larger number of measurement samples for all approaches. Random sequences and m-sequences reveal similar results with a significant reduction of the required number of samples. We conclude that the application of pseudo-random sequences for sequential activation in the magnetorelaxometry imaging of magnetic nanoparticles considerably reduces the number of required sequences while preserving the relevant measurement information.

  16. Totally optimal decision rules

    KAUST Repository

    Amin, Talha

    2017-11-22

    Optimality of decision rules (patterns) can be measured in many ways. One of these is referred to as length. Length signifies the number of terms in a decision rule and is optimally minimized. Another, coverage, represents the width of a rule’s applicability and generality. As such, it is desirable to maximize coverage. A totally optimal decision rule is a decision rule that has the minimum possible length and the maximum possible coverage. This paper presents a method for determining the presence of totally optimal decision rules for “complete” decision tables (representations of total functions in which different variables can have domains of differing values). Depending on the cardinalities of the domains, we can either guarantee for each tuple of values of the function that totally optimal rules exist for each row of the table (as in the case of total Boolean functions where the cardinalities are equal to 2) or, for each row, we can find a tuple of values of the function for which totally optimal rules do not exist for this row.
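
    The definitions of length and coverage in this record can be made concrete with a small brute-force sketch. It is an illustration only and not the method of the paper. It assumes that a rule for a row fixes a subset of that row's attribute values and predicts the row's decision, that a rule is valid when every table row matching its conditions carries the same decision, and that coverage is the number of matching rows. The names `rules_for_row` and `totally_optimal_rule` are hypothetical.

```python
from itertools import combinations


def rules_for_row(table, row_index):
    """List (length, coverage, attrs) for every valid rule derived from one row."""
    values, decision = table[row_index]
    n = len(values)
    found = []
    for k in range(n + 1):
        for attrs in combinations(range(n), k):
            # Decisions of all rows that satisfy the rule's conditions.
            matching = [d for v, d in table
                        if all(v[a] == values[a] for a in attrs)]
            if all(d == decision for d in matching):
                found.append((k, len(matching), attrs))
    return found


def totally_optimal_rule(table, row_index):
    """Return a rule with both minimum length and maximum coverage, or None."""
    rules = rules_for_row(table, row_index)
    min_len = min(r[0] for r in rules)
    max_cov = max(r[1] for r in rules)
    for r in rules:
        if r[0] == min_len and r[1] == max_cov:
            return r
    return None


# Complete decision table of the Boolean AND function: (inputs, output).
and_table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
results = [totally_optimal_rule(and_table, i) for i in range(len(and_table))]
```

    In this example every row has a totally optimal rule, which agrees with the statement in the record about total Boolean functions. Exhaustive enumeration grows exponentially with the number of attributes, so it is only practical for small tables.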

  17. Totally optimal decision rules

    KAUST Repository

    Amin, Talha M.; Moshkov, Mikhail

    2017-01-01

    Optimality of decision rules (patterns) can be measured in many ways. One of these is referred to as length. Length signifies the number of terms in a decision rule and is optimally minimized. Another, coverage, represents the width of a rule’s applicability and generality. As such, it is desirable to maximize coverage. A totally optimal decision rule is a decision rule that has the minimum possible length and the maximum possible coverage. This paper presents a method for determining the presence of totally optimal decision rules for “complete” decision tables (representations of total functions in which different variables can have domains of differing values). Depending on the cardinalities of the domains, we can either guarantee for each tuple of values of the function that totally optimal rules exist for each row of the table (as in the case of total Boolean functions where the cardinalities are equal to 2) or, for each row, we can find a tuple of values of the function for which totally optimal rules do not exist for this row.

  18. Full-length cDNA sequence cloning and analysis of Ghrelin in Cervus nippon

    Institute of Scientific and Technical Information of China (English)

    张曼; 金鑫; 田巧珍; 刘骄; 王云鹤; 杨银凤

    2017-01-01

    To obtain the full-length cDNA of Ghrelin in Cervus nippon, RT-PCR and RACE were performed using total RNA extracted from abomasal mucosal epithelium tissue of C. nippon as template. Sequence analysis revealed a 539 bp cDNA containing a 46 bp 5'-untranslated region (5'UTR), a 128 bp 3'-untranslated region (3'UTR) and a 351 bp open reading frame (ORF) encoding 116 amino acids. Alignment of the C. nippon Ghrelin cDNA with those of human and other animals showed sequence homology of 90.4%-99.1% with reindeer, goat, sheep and cattle, 66.9%-76.6% with rhesus monkey, human, pig and dog, and only 36.4% and 35.4% with chicken and Columba livia, respectively. These results indicate that the structure of Ghrelin shows clear species specificity, suggesting that Ghrelin may play an important physiological role in ruminants.

  19. Image correlation method for DNA sequence alignment.

    Science.gov (United States)

    Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

    2012-01-01

    The complexity of searches and the volume of genomic data make sequence alignment one of the most active research areas in bioinformatics. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation and sequence alignment is handled as an object-recognition-in-a-scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one-million-base-pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%), and outperformed BLAST when mutation numbers increased. However, digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator. By doing this, we expect to fully exploit the light properties of optical correlation. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
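
    The idea of coding bases as gray intensities and searching by correlation can be sketched in one dimension. This is a simplified illustration and not the authors' implementation, which uses 2-dimensional images and simulates an optical correlator. The intensity values in `GRAY` and the function names are assumptions made for the example.

```python
# Hypothetical gray levels; the record does not state the intensities used.
GRAY = {"A": 0.0, "C": 85.0, "G": 170.0, "T": 255.0}


def encode(seq):
    """Map each base to its assumed gray intensity."""
    return [GRAY[b] for b in seq]


def normalized_correlation(x, y):
    """Zero-mean normalized correlation of two equal-length signals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def best_match(query, reference):
    """Slide the query along the reference and return (offset, score) of the peak."""
    q = encode(query)
    r = encode(reference)
    scores = [normalized_correlation(q, r[i:i + len(q)])
              for i in range(len(r) - len(q) + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


offset, score = best_match("CGTAAG", "TTGACCGTAAGCTTAG")
```

    An exact occurrence of the query gives a normalized score of 1, and mismatches lower the peak rather than removing it. One caveat of any scalar coding is that it imposes an artificial ordering on the four bases, so some substitutions change the score more than others.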

  20. Young Children's Understandings of Length Measurement: Evaluating a Learning Trajectory

    Science.gov (United States)

    Szilagyi, Janka; Clements, Douglas H.; Sarama, Julie

    2013-01-01

    This study investigated the development of length measurement ideas in students from prekindergarten through 2nd grade. The main purpose was to evaluate and elaborate the developmental progression, or levels of thinking, of a hypothesized learning trajectory for length measurement to ensure that the sequence of levels of thinking is consistent…

  1. Allelic sequence variations in the hypervariable region of a T-cell receptor β chain: Correlation with restriction fragment length polymorphism in human families and populations

    International Nuclear Information System (INIS)

    Robinson, M.A.

    1989-01-01

    Direct sequence analysis of the human T-cell antigen receptor (TCR) V β1 variable gene identified a single base-pair allelic variation (C/G) located within the coding region. This change results in substitution of a histidine (CAC) for a glutamine (CAG) at position 48 of the TCR β chain, a position predicted to be in the TCR antigen binding site. The V β1 polymorphism was found by DNA sequence analysis of V β1 genes from seven unrelated individuals; V β1 genes were amplified by the polymerase chain reaction, the amplified fragments were cloned into M13 phage vectors, and sequences were determined. To determine the inheritance patterns of the V β1 substitution and to test correlation with V β1 restriction fragment length polymorphism detected with Pvu II and Taq I, allele-specific oligonucleotides were constructed and used to characterize amplified DNA samples. Seventy unrelated individuals and six families were tested for both restriction fragment length polymorphism and for the V β1 substitution. The correlation was also tested using amplified, size-selected, Pvu II- and Taq I-digested DNA samples from heterozygotes. Pvu II allele 1 (61/70) and Taq I allele 1 (66/70) were found to be correlated with the substitution giving rise to a histidine at position 48. Because there are exceptions to the correlation, the use of specific probes to characterize allelic forms of TCR variable genes will provide important tools for studies of basic TCR genetics and disease associations.

  2. Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.).

    Science.gov (United States)

    Cloutier, Sylvie; Miranda, Evelyn; Ward, Kerry; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Datla, Raju; Rowland, Gordon; Duguid, Scott; Ragupathy, Raja

    2012-08-01

    Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs) of which, 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 of all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.

  3. High-throughput sequencing of 16S rRNA gene amplicons: effects of extraction procedure, primer length and annealing temperature.

    Science.gov (United States)

    Sergeant, Martin J; Constantinidou, Chrystala; Cogan, Tristan; Penn, Charles W; Pallen, Mark J

    2012-01-01

    The analysis of 16S-rDNA sequences to assess the bacterial community composition of a sample is a widely used technique that has increased with the advent of high throughput sequencing. Although considerable effort has been devoted to identifying the most informative region of the 16S gene and the optimal informatics procedures to process the data, little attention has been paid to the PCR step, in particular annealing temperature and primer length. To address this, amplicons derived from 16S-rDNA were generated from chicken caecal content DNA using different annealing temperatures, primers and different DNA extraction procedures. The amplicons were pyrosequenced to determine the optimal protocols for capture of maximum bacterial diversity from a chicken caecal sample. Even at very low annealing temperatures there was little effect on the community structure, although the abundance of some OTUs such as Bifidobacterium increased. Using shorter primers did not reveal any novel OTUs but did change the community profile obtained. Mechanical disruption of the sample by bead beating had a significant effect on the results obtained, as did repeated freezing and thawing. In conclusion, existing primers and standard annealing temperatures captured as much diversity as lower annealing temperatures and shorter primers.

  4. Factors affecting the duration of nestling period and fledging order in Tengmalm's owl (Aegolius funereus): effect of wing length and hatching sequence.

    Directory of Open Access Journals (Sweden)

    Marek Kouba

    Full Text Available In altricial birds, the nestling period is an important part of the breeding phase because the juveniles may spend quite a long time in the nest, with associated high energy costs for the parents. The length of the nestling period can be variable and its duration may be influenced by both biotic and abiotic factors; however, studies of this have mostly been undertaken on passerine birds. We studied individual duration of nestling period of 98 Tengmalm's owl chicks (Aegolius funereus) at 27 nests during five breeding seasons using a camera and chip system and radio-telemetry. We found the nestlings stayed in the nest box for 27 - 38 days from hatching (mean ± SD, 32.4 ± 2.2 days). The individual duration of nestling period was negatively related to wing length, but no formally significant effect was found for body weight, sex, prey availability and/or weather conditions. The fledging sequence of individual nestlings was primarily related to hatching order; no relationship with wing length and/or other factors was found in this case. We suggest the length of wing is the most important measure of body condition and individual quality in Tengmalm's owl young determining the duration of the nestling period. Other differences from passerines (e.g., the lack of effect of weather or prey availability on nestling period) are considered likely to be due to different life-history traits, in particular different food habits and nesting sites and greater risk of nest predation among passerines.

  5. Computationally Efficient Chaotic Spreading Sequence Selection for Asynchronous DS-CDMA

    Directory of Open Access Journals (Sweden)

    Litviņenko Anna

    2017-12-01

    Full Text Available The choice of the spreading sequence for asynchronous direct-sequence code-division multiple-access (DS-CDMA) systems plays a crucial role in the mitigation of multiple-access interference. Considering the rich dynamics of chaotic sequences, their use for spreading allows overcoming the limitations of the classical spreading sequences. However, to ensure low cross-correlation between the sequences, careful selection must be performed. This paper presents a novel exhaustive search algorithm, which allows finding sets of chaotic spreading sequences of required length with a particularly low mutual cross-correlation. The efficiency of the search is verified by simulations, which show a significant advantage compared to non-selected chaotic sequences. Moreover, the impact of sequence length on the efficiency of the selection is studied.

  6. Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

    Science.gov (United States)

    Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

    2016-01-01

    Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.

  7. Draft Genome Sequence of a “Candidatus Liberibacter europaeus” Strain Assembled from Broom Psyllids (Arytainilla spartiophila) from New Zealand

    Science.gov (United States)

    Thompson, Sarah M.; Kalamorz, Falk; David, Charles; Addison, Shea M.; Smith, Grant R.

    2018-01-01

    ABSTRACT Here, we report the draft genome sequence of “Candidatus Liberibacter europaeus” ASNZ1, assembled from broom psyllids (Arytainilla spartiophila) from New Zealand. The assembly comprises 15 contigs, with a total length of 1.33 Mb and a G+C content of 33.5%. PMID:29773636

  8. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  9. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of

  10. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  11. The complete chloroplast genome sequence of Euonymus japonicus (Celastraceae).

    Science.gov (United States)

    Choi, Kyoung Su; Park, SeonJoo

    2016-09-01

    The complete chloroplast (cp) genome sequence of Euonymus japonicus, the first sequenced in the genus Euonymus, is reported in this study. The total length was 157 637 bp, containing a pair of 26 678 bp inverted repeat (IR) regions, which were separated by a small single copy (SSC) region and a large single copy (LSC) region of 18 340 bp and 85 941 bp, respectively. This genome contains 107 unique genes, including 74 coding genes, four rRNA genes, and 29 tRNA genes. Seventeen genes of E. japonicus contain introns, of which three genes (clpP, ycf3, and rps12) include two introns. The maximum likelihood (ML) phylogenetic analysis revealed that E. japonicus was closely related to Manihot and Populus.
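
    The region lengths in this record can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the function name is illustrative and not from the paper, and it assumes the usual quadripartite layout (LSC, SSC and two IR copies) implied by the abstract.

```python
# Region lengths (bp) exactly as quoted in the abstract.
LSC = 85_941  # large single copy region
SSC = 18_340  # small single copy region
IR = 26_678   # one inverted repeat; the genome carries a pair


def quadripartite_total(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length assuming LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_total(LSC, SSC, IR)
print(total)  # 157637, matching the reported total length
```

    The sum reproduces the reported 157 637 bp, so the four figures in the abstract are mutually consistent.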

  12. The total body length and body weight examination among gabus Sentani fish population, Oxyeleotris heterodon, Weber 1907 (Pisces: Eleotridae) of Sentani lake, Papua, Indonesia

    Science.gov (United States)

    Sriyani, E. D.; Abinawanto, Bowolaksono, A.

    2017-07-01

    The gabus Sentani fish has lived in Sentani Lake, Papua, for millions of years. Nowadays, the population of this species is declining toward extinction because of overexploitation, whereas culture of this species has not yet been developed. The purpose of the study was to examine the total body length and body weight of fish collected from villages surrounding Sentani Lake, namely Ifar village, Sosiri village, and Putali village. The average body weights of gabus fish from Ifar village, Sosiri village, and Putali village were 373.53 g, 426.86 g, and 118.34 g, respectively, while the average total body lengths were 279.30 mm, 223.30 mm and 222.06 mm, respectively. The growth model was $W = 0.021067\,L^{3.086}$, with an $R^2$ value of 35.8% and an r value of 0.598. Gabus Sentani fish, Oxyeleotris heterodon (Weber 1907), exhibited positive allometric growth (b > 3).
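
    A short sketch of the length-weight model W = a·L^b quoted above may help in reading the numbers. The coefficients are taken from the abstract; the function names are illustrative, and the length unit expected by the fitted model is not stated in the abstract, so no predicted weights are quoted here.

```python
A = 0.021067  # coefficient a, as reported
B = 3.086     # exponent b, as reported
R = 0.598     # correlation coefficient r, as reported


def predicted_weight(length: float, a: float = A, b: float = B) -> float:
    """Evaluate W = a * L**b. The unit of L is an assumption left to the caller."""
    return a * length ** b


def growth_type(b: float) -> str:
    """Classify growth by comparing the exponent b with the isometric value 3."""
    if b > 3:
        return "positive allometric"
    if b < 3:
        return "negative allometric"
    return "isometric"


print(growth_type(B))           # positive allometric
print(round(100 * R ** 2, 1))   # 35.8, consistent with the reported R2 of 35.8 %
```

    The squared correlation coefficient reproduces the reported 35.8 %, and b = 3.086 > 3 gives the positive allometric classification stated in the abstract.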

  13. Phylogeny of the Serrasalmidae (Characiformes) based on mitochondrial DNA sequences

    Directory of Open Access Journals (Sweden)

    Guillermo Ortí

    2008-01-01

    Full Text Available Previous studies based on DNA sequences of mitochondrial (mt) rRNA genes showed three main groups within the subfamily Serrasalminae: (1) a "pacu" clade of herbivores (Colossoma, Mylossoma, Piaractus); (2) the "Myleus" clade (Myleus, Mylesinus, Tometes, Ossubtus); and (3) the "piranha" clade (Serrasalmus, Pygocentrus, Pygopristis, Pristobrycon, Catoprion, Metynnis). The genus Acnodon was placed as the sister taxon of clade (2+3). However, poor resolution within each clade was obtained due to low levels of variation among rRNA gene sequences. Complete sequences of the hypervariable mtDNA control region for a total of 45 taxa, and additional sequences of 12S and 16S rRNA from a total of 74 taxa representing all genera in the family are now presented to address intragroup relationships. Control region sequences of several serrasalmid species exhibit tandem repeats of short motifs (12 to 33 bp) in the 3' end of this region, accounting for substantial length variation. Bayesian inference and maximum parsimony analyses of these sequences identify the same groupings as before and provide further evidence to support the following observations: (a) Serrasalmus gouldingi and species of Pristobrycon (non-striolatus) form a monophyletic group that is the sister group to other species of Serrasalmus and Pygocentrus; (b) Catoprion, Pygopristis, and Pristobrycon striolatus form a well supported clade, sister to the group described above; (c) some taxa assigned to the genus Myloplus (M. asterias, M. tiete, M. ternetzi, and M. rubripinnis) form a well supported group whereas other Myloplus species remain with uncertain affinities; (d) Mylesinus, Tometes and Myleus setiger form a monophyletic group.

  14. Genome Survey Sequencing of Luffa Cylindrica L. and Microsatellite High Resolution Melting (SSR-HRM) Analysis for Genetic Relationship of Luffa Genotypes.

    Science.gov (United States)

    An, Jianyu; Yin, Mengqi; Zhang, Qin; Gong, Dongting; Jia, Xiaowen; Guan, Yajing; Hu, Jin

    2017-09-11

    Luffa cylindrica (L.) Roem. is an economically important vegetable crop in China. However, the genomic information on this species is currently unknown. In this study, for the first time, a genome survey of L. cylindrica was carried out using next-generation sequencing (NGS) technology. In total, 43.40 Gb sequence data of L. cylindrica, about 54.94× coverage of the estimated genome size of 789.97 Mb, were obtained from HiSeq 2500 sequencing, in which the guanine plus cytosine (GC) content was calculated to be 37.90%. The heterozygosity of genome sequences was only 0.24%. In total, 1,913,731 contigs (>200 bp) with 525 bp N50 length and 1,410,117 scaffolds (>200 bp) with 885.01 Mb total length were obtained. From the initial assembled L. cylindrica genome, 431,234 microsatellites (SSRs) (≥5 repeats) were identified. The motif types of SSR repeats included 62.88% di-nucleotide, 31.03% tri-nucleotide, 4.59% tetra-nucleotide, 0.96% penta-nucleotide and 0.54% hexa-nucleotide. Eighty genomic SSR markers were developed, and 51/80 primers could be used in both "Zheda 23" and "Zheda 83". Nineteen SSRs were used to investigate the genetic diversity among 32 accessions through SSR-HRM analysis. The unweighted pair group method analysis (UPGMA) dendrogram tree was built by calculating the SSR-HRM raw data. SSR-HRM could be effectively used for genotype relationship analysis of Luffa species.
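
    The coverage figure in this record follows from dividing total sequenced bases by the estimated genome size. The sketch below reproduces that arithmetic and checks that the SSR motif shares add up; the function and variable names are illustrative only.

```python
def depth_of_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


# Figures quoted in the abstract.
coverage = depth_of_coverage(43.40e9, 789.97e6)
print(round(coverage, 2))  # 54.94

# SSR motif shares (percent) quoted in the abstract.
ssr_share = {"di": 62.88, "tri": 31.03, "tetra": 4.59, "penta": 0.96, "hexa": 0.54}
print(round(sum(ssr_share.values()), 2))  # 100.0

# Approximate motif counts implied by the shares and the 431,234 SSRs found.
ssr_counts = {motif: round(431_234 * pct / 100) for motif, pct in ssr_share.items()}
```

    Both checks agree with the abstract: 43.40 Gb over 789.97 Mb gives about 54.94×, and the five motif classes account for all identified SSRs.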

  15. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    Science.gov (United States)

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
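
    To see why a sequence capacity above the Avogadro constant can still be a vanishing fraction of sequence space, it helps to work in log10 units. The sketch below assumes a 20-letter amino acid alphabet and uses a purely hypothetical chain length of 35 residues; neither the length nor the function names come from the paper.

```python
import math

AVOGADRO = 6.02214076e23  # Avogadro constant (per mole)


def log10_total_sequences(length: int, alphabet: int = 20) -> float:
    """log10 of the number of possible sequences, alphabet**length."""
    return length * math.log10(alphabet)


def log10_relative_capacity(log10_capacity: float, length: int) -> float:
    """log10 of (foldable sequences / all possible sequences)."""
    return log10_capacity - log10_total_sequences(length)


HYPOTHETICAL_LENGTH = 35  # illustrative value only, not taken from the abstract

# A capacity equal to the Avogadro constant is used as a lower bound.
lower_bound = math.log10(AVOGADRO)
print(log10_total_sequences(HYPOTHETICAL_LENGTH))
print(log10_relative_capacity(lower_bound, HYPOTHETICAL_LENGTH))
```

    Because the denominator grows as 20^L, the relative capacity shrinks quickly with length unless the absolute capacity grows equally fast, which is the anti-correlation the abstract describes.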

  16. Fiscal 2000 report on result of the full-length cDNA structure analysis; 2000 nendo kanzen cho cDNA kozo kaiseki seika hokokusho

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-01

    This paper reports the results of research on full-length cDNA structure analysis for the period from April 2000 to March 2001. A draft of the human genome sequence was published in June 2000. In Japan, human gene analysis, as a basic technology of the bio-industry, was designated a millennium project in the fiscal 2000 budget. Full-length cDNA structure analysis is the core of the project. cDNA libraries were prepared by the oligo-capping method using full-length cDNAs and cDNAs longer than 4-5 kbp. Work began by determining partial sequence data at the cDNA ends; new clones were then selected from these, and their full-length human cDNA sequences were determined. By fiscal 2000, partial sequence data had been determined for 1,035,000 clones and full-length sequence data for 12,144 clones. The sequence data obtained were analyzed by homology search and translated into amino acid coding sequences, and protein functions were predicted. A clustering method that selects new clones from partial sequences was examined. A database was constructed of gene expression profiles and disease-related gene sequence data. (NEDO)

  17. Photoluminescence Enhancement of Poly(3-methylthiophene) Nanowires upon Length Variable DNA Hybridization

    Directory of Open Access Journals (Sweden)

    Jingyuan Huang

    2018-01-01

    Full Text Available The use of low-dimensional inorganic or organic nanomaterials has advantages for DNA and protein recognition due to their sensitivity, accuracy, and physical size matching. In this research, poly(3-methylthiophene) (P3MT) nanowires (NWs) are electrochemically prepared with dopant, followed by functionalization with a probe DNA (pDNA) sequence through electrostatic interaction. pDNA sequences of various lengths (10-, 20- and 30-mer) are conjugated to the P3MT NWs, followed by hybridization with their complementary target DNA (tDNA) sequences. The nanoscale photoluminescence (PL) properties of the P3MT NWs are studied throughout the whole process in the solid state. In addition, the correlation between the PL enhancement and double helix DNA of various lengths is demonstrated.

  18. An unbiased stereological method for efficiently quantifying the innervation of the heart and other organs based on total length estimations

    DEFF Research Database (Denmark)

    Mühlfeld, Christian; Papadakis, Tamara; Krasteva, Gabriela

    2010-01-01

    Quantitative information about the innervation is essential to analyze the structure-function relationships of organs. So far, there has been no unbiased stereological tool for this purpose. This study presents a new unbiased and efficient method to quantify the total length of axons in a given...... reference volume, illustrated on the left ventricle of the mouse heart. The method is based on the following steps: 1) estimation of the reference volume; 2) randomization of location and orientation using appropriate sampling techniques; 3) counting of nerve fiber profiles hit by a defined test area within...

  19. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.
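
    The block-division step described above can be sketched as follows. The choice of blocks of length 2m advanced in steps of m, so that every window of length m falls entirely inside at least one block, is an assumption made for illustration; the abstract only says the blocks overlap and are large enough to hold a minimum-length alignment. The function name is likewise hypothetical.

```python
def overlapping_blocks(seq: str, min_len: int) -> list[tuple[int, str]]:
    """Split seq into blocks of length 2*min_len that start every min_len
    positions, plus a final block aligned to the end of the sequence."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    block_len = 2 * min_len
    last_start = max(len(seq) - block_len, 0)
    starts = list(range(0, last_start + 1, min_len))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of seq is covered
    return [(s, seq[s:s + block_len]) for s in starts]


blocks = overlapping_blocks("ACGTACGTACG", min_len=3)
for start, block in blocks:
    print(start, block)
```

    With this layout a filter can examine each block independently without missing a fragment of length up to min_len that straddles a block boundary.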

  20. Hardness of deriving invertible sequences from finite state machines

    DEFF Research Database (Denmark)

    Hierons, Robert M.; Mousavi, Mohammad Reza; Thomsen, Michael Kirkedal

    2017-01-01

    invertible sequences; these allow one to construct additional UIOs once a UIO has been found. We consider three optimisation problems associated with invertible sequences: deciding whether there is a (proper) invertible sequence of length at least K; deciding whether there is a set of invertible sequences...

  1. Lower bounds on the total cross section and the slope parameter for some measurable sequences of s→infinity

    International Nuclear Information System (INIS)

    Uchiyama, T.

    1978-01-01

    The lower bounds on the total cross section and the slope parameter are obtained on the basis of the analyticity and polynomial upper boundedness of the scattering amplitude and the unitarity of the S matrix: $\sigma_{\mathrm{tot}} \ge C s^{-6} (\log s)^{-2}$ and $B \ge C s^{-5} (\log s)^{-4}$ for some measurable sequences of $s \to \infty$. These bounds hold for any t in 0 2 /sub π/. It is unnecessary in order to obtain our bounds that the scattering amplitude has the crossing even property. If we assume this property, we can suppress the logarithmic factors of our bounds. Also we obtain our lower bounds for any sequence of $s \to \infty$, if we take the average scattering amplitude

  2. Towards predicting the encoding capability of MR fingerprinting sequences.

    Science.gov (United States)

    Sommer, K; Amthor, T; Doneva, M; Koken, P; Meineke, J; Börnert, P

    2017-09-01

    Sequence optimization and appropriate sequence selection is still an unmet need in magnetic resonance fingerprinting (MRF). The main challenge in MRF sequence design is the lack of an appropriate measure of the sequence's encoding capability. To find such a measure, three different candidates for judging the encoding capability have been investigated: local and global dot-product-based measures judging dictionary entry similarity as well as a Monte Carlo method that evaluates the noise propagation properties of an MRF sequence. Consistency of these measures for different sequence lengths as well as the capability to predict actual sequence performance in both phantom and in vivo measurements was analyzed. While the dot-product-based measures yielded inconsistent results for different sequence lengths, the Monte Carlo method was in a good agreement with phantom experiments. In particular, the Monte Carlo method could accurately predict the performance of different flip angle patterns in actual measurements. The proposed Monte Carlo method provides an appropriate measure of MRF sequence encoding capability and may be used for sequence optimization. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

    Directory of Open Access Journals (Sweden)

    Giegerich Robert

    2005-09-01

    Full Text Available Abstract Background Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To provide full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. Description Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins and these have been matched to publicly available IMAGE clones when available. Each sequence has been compared to the KOG database and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. Conclusion The results of the analysis have been stored in a publicly available database XenDB http://bibiserv.techfak.uni-bielefeld.de/xendb/. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at http://bibiserv.techfak.uni-bielefeld.de/xendb/.

  4. Characterization of Erwinia amylovora strains from different host plants using repetitive-sequences PCR analysis, and restriction fragment length polymorphism and short-sequence DNA repeats of plasmid pEA29.

    Science.gov (United States)

    Barionovi, D; Giorgi, S; Stoeger, A R; Ruppitsch, W; Scortichini, M

    2006-05-01

    The three main aims of the study were the assessment of the genetic relationship between a deviating Erwinia amylovora strain isolated from Amelanchier sp. (Maloideae) grown in Canada and other strains from Maloideae and Rosoideae, the investigation of the variability of the PstI fragment of the pEA29 plasmid using restriction fragment length polymorphism (RFLP) analysis and the determination of the number of short-sequence DNA repeats (SSR) by DNA sequence analysis in representative strains. Ninety-three strains obtained from 12 plant genera and different geographical locations were examined by repetitive-sequences PCR using Enterobacterial Repetitive Intergenic Consensus, BOX and Repetitive Extragenic Palindromic primer sets. Upon the unweighted pair group method with arithmetic mean analysis, a deviating strain from Amelanchier sp. was analysed using amplified ribosomal DNA restriction analysis (ARDRA) and sequencing of the 16S rDNA gene. This strain showed 99% similarity to other E. amylovora strains in the 16S gene and the same banding pattern with ARDRA. The RFLP analysis of the pEA29 plasmid using MspI and Sau3A restriction enzymes showed a higher variability than that previously observed, and no clear-cut grouping of the strains was possible. The SSR units were reiterated two to 12 times. The strains obtained from pear orchards showing symptoms of fire blight for the first time had a low number of SSR units. The strains from Maloideae exhibit a wider genetic variability than previously thought. The RFLP analysis of a fragment of the pEA29 plasmid does not seem to be a reliable method for typing E. amylovora strains. A low number of SSR units was observed with first epidemics of fire blight. The current detection techniques are mainly based on the genetic similarities observed within the strains from the cultivated tree-fruit crops. For a more reliable detection of the fire blight pathogen also in wild and ornamental Rosaceous plants the genetic

  5. The Use of Trust Regions in Kohn-Sham Total Energy Minimization

    International Nuclear Information System (INIS)

    Yang, Chao; Meza, Juan C.; Wang, Lin-wang

    2006-01-01

    The Self Consistent Field (SCF) iteration, widely used for computing the ground state energy and the corresponding single particle wave functions associated with a many-electron atomistic system, is viewed in this paper as an optimization procedure that minimizes the Kohn-Sham (KS) total energy indirectly by minimizing a sequence of quadratic surrogate functions. We point out the similarities and differences between the total energy and the surrogate, and show how the SCF iteration can fail when the minimizer of the surrogate produces an increase in the KS total energy. A trust region technique is introduced as a way to restrict the update of the wave functions within a small neighborhood of an approximate solution at which the gradient of the total energy agrees with that of the surrogate. The use of trust regions in SCF is not new. However, it has been observed that directly applying a trust region based SCF (TRSCF) to the Kohn-Sham total energy often leads to slow convergence. We propose to use TRSCF within a direct constrained minimization (DCM) algorithm we developed previously. The key ingredients of the DCM algorithm involve projecting the total energy function into a sequence of subspaces of small dimensions and seeking the minimizer of the total energy function within each subspace. The minimizer of a subspace energy function, which is computed by TRSCF, not only provides a search direction along which the KS total energy function decreases but also gives an optimal 'step-length' that yields a sufficient decrease in total energy. A numerical example is provided to demonstrate that the combination of TRSCF and DCM is more efficient than SCF

  6. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist and their significance can be checked by means of the Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences with a user-defined length and percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it easy to set the parameters that characterize the sequences being produced and to save them in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
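
    The idea of drawing random sequences with a chosen length and nucleotide composition can be illustrated in a few lines of Python. This is not RANDNA's code: it uses Python's standard `random` module (Mersenne Twister) rather than the Delphi generator mentioned in the abstract, the function name is hypothetical, and bases are drawn independently, so the requested composition is matched only in expectation.

```python
import random


def random_dna(length: int, composition: dict[str, float], seed=None) -> str:
    """Draw `length` bases independently with probabilities proportional
    to the weights in `composition` (e.g. percentages per base)."""
    bases = list(composition)
    weights = [composition[b] for b in bases]
    if length < 0 or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("need a non-negative length and usable weights")
    rng = random.Random(seed)  # a seed makes the sequence reproducible
    return "".join(rng.choices(bases, weights=weights, k=length))


# Example: a 60-base sequence with an AT-rich composition (illustrative values).
seq = random_dna(60, {"A": 30, "C": 20, "G": 20, "T": 30}, seed=1)
print(seq)
```

    For a Monte Carlo test, many such sequences would be generated with the composition of the real data set and scored in the same way as the real sequences.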

  7. SeqEntropy: genome-wide assessment of repeats for short read sequencing.

    Directory of Open Access Journals (Sweden)

    Hsueh-Ting Chu

    Full Text Available BACKGROUND: Recent studies on genome assembly from short-read sequencing data reported the limitation of this technology to reconstruct the entire genome even at very high depth coverage. We investigated the limitation from the perspective of information theory to evaluate the effect of repeats on short-read genome assembly using idealized (error-free) reads at different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k) to be the entropy of sequencing reads at a read length k and use the relative loss of entropy ΔH(k) to measure the impact of repeats for the reconstruction of whole-genome from sequences of length k. In our experiments, we found that entropy loss correlates well with de-novo assembly coverage of a genome, and a score of ΔH(k)>1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths to achieve ΔH(k)<1% are different for various organisms and are independent of the genome size. For example, in order to meet the threshold of ΔH(k)<1%, a read length of 60 bp is needed for the sequencing of human genome (3.2×10^9 bp) and 320 bp for the sequencing of fruit fly (1.8×10^8 bp). We also calculated the ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and the entropy of sequencing reads provides a measurement for the repeat structures. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation on the limitation of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful to tune the sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
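
    One plausible way to read the entropy measure is as the Shannon entropy of the distribution of length-k reads drawn uniformly from every position of the genome, with the relative loss taken against the value it would have if all reads were distinct. The exact definition is not given in the abstract, so the formulas and function names below are assumptions made for illustration, not the SeqEntropy implementation.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy (bits) of the k-mer distribution over all start positions."""
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        raise ValueError("k must satisfy 1 <= k <= len(genome)")
    counts = Counter(genome[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """(H_max - H(k)) / H_max, where H_max = log2(number of read positions)."""
    h_max = math.log2(len(genome) - k + 1)
    if h_max == 0:
        return 0.0  # a single read position carries no entropy to lose
    return (h_max - read_entropy(genome, k)) / h_max


toy = "ACGTTTGACGTTTGCA"  # toy sequence containing a repeated segment
for k in (2, 4, 8):
    print(k, relative_entropy_loss(toy, k))
```

    Under this reading, repeats longer than k make some reads indistinguishable and raise the loss, while a repeat-free genome gives a loss of zero at every k.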

  8. Perspective on sequence evolution of microsatellite locus (CCG)n in Rv0050 gene from Mycobacterium tuberculosis

    Directory of Open Access Journals (Sweden)

    Jin Ruiliang

    2011-08-01

    Full Text Available Abstract Background The mycobacterial genome is inclined to polymerase slippage and a high mutation rate in microsatellite regions due to high GC content and absence of a mismatch repair system. However, the exact molecular mechanisms underlying microsatellite variation have not been fully elucidated. Here, we investigated mutation events in the hyper-variable trinucleotide microsatellite locus MML0050 located in the Rv0050 gene of W-Beijing and non-W-Beijing Mycobacterium tuberculosis strains in order to gain insight into the genomic structure and activity of repeated regions. Results Size analysis indicated the presence of five alleles that differed in length by three base pairs. Moreover, nucleotide gains occurred more frequently than losses in this trinucleotide microsatellite. Mutation frequency was not completely related to the total length, though the relative frequency in the longest allele was remarkably higher than that in the shortest. Sequence analysis was able to detect seven alleles and revealed that point mutations enhanced the level of locus variation. Introduction of an interruptive motif correlated with the total allele length and genetic lineage, rather than the length of the longest stretch of perfect repeats. Finally, the level of locus variation was drastically different between the two genetic lineages. Conclusion The Rv0050 locus encodes the bifunctional penicillin-binding protein ponA1 and is essential to mycobacterial survival. Our investigations of this particularly dynamic genomic region provide insights into the overall mode of microsatellite evolution. Specifically, replication slippage was implicated in the mutational process of this microsatellite and a sequence-based genetic analysis was necessary to determine that point mutation events acted to maintain microsatellite size integrity while providing genomic diversity.

  9. Intraflagellar transport particle size scales inversely with flagellar length: revisiting the balance-point length control model.

    Science.gov (United States)

    Engel, Benjamin D; Ludington, William B; Marshall, Wallace F

    2009-10-05

    The assembly and maintenance of eukaryotic flagella are regulated by intraflagellar transport (IFT), the bidirectional traffic of IFT particles (recently renamed IFT trains) within the flagellum. We previously proposed the balance-point length control model, which predicted that the frequency of train transport should decrease as a function of flagellar length, thus modulating the length-dependent flagellar assembly rate. However, this model was challenged by the differential interference contrast microscopy observation that IFT frequency is length independent. Using total internal reflection fluorescence microscopy to quantify protein traffic during the regeneration of Chlamydomonas reinhardtii flagella, we determined that anterograde IFT trains in short flagella are composed of more kinesin-associated protein and IFT27 proteins than trains in long flagella. This length-dependent remodeling of train size is consistent with the kinetics of flagellar regeneration and supports a revised balance-point model of flagellar length control in which the size of anterograde IFT trains tunes the rate of flagellar assembly.

  10. Detection of M-Sequences from Spike Sequence in Neuronal Networks

    Directory of Open Access Journals (Sweden)

    Yoshi Nishitani

    2012-01-01

    Full Text Available In circuit theory, it is well known that a linear feedback shift register (LFSR) circuit generates pseudorandom bit sequences (PRBS), including an M-sequence with the maximum period length. In this study, we tried to detect M-sequences, known as pseudorandom sequences generated by an LFSR circuit, in time series patterns of stimulated action potentials. Stimulated action potentials were recorded from dissociated cultures of hippocampal neurons grown on a multielectrode array. We found several M-sequences from a 3-stage LFSR circuit (M3). These results show the possibility that LFSR circuits or their equivalents are assembled in a neuronal network. However, since the M3 pattern was composed of only four spike intervals, the possibility of an accidental detection was not zero. We therefore detected M-sequences in random spike sequences that were not generated by an LFSR circuit and compared the result with the number of M-sequences in the originally observed raster data. As a result, a significant difference was confirmed: a greater number of “0–1” reversed 3-stage M-sequences occurred than would have been detected by accident. This result suggests that some LFSR equivalent circuits are assembled in neuronal networks.
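
    To make the M-sequence idea concrete, the following is a minimal sketch of a 3-stage Fibonacci LFSR. The tap positions (stages 2 and 3), the seed, and the function names are illustrative assumptions and are not taken from the study; they are chosen only because this configuration reaches the maximum period of 2^3 - 1 = 7.

```python
def lfsr_step(taps, state):
    """Advance a Fibonacci LFSR by one clock.

    taps  -- 1-indexed stage numbers whose bits are XORed into the feedback
    state -- tuple of bits, state[0] is stage 1, state[-1] is the output stage
    """
    feedback = 0
    for t in taps:
        feedback ^= state[t - 1]
    # Shift right by one stage and insert the feedback bit at stage 1.
    return (feedback,) + tuple(state[:-1])


def lfsr_sequence(taps, state, length):
    """Return `length` output bits, reading the last stage before each shift."""
    state = tuple(state)
    bits = []
    for _ in range(length):
        bits.append(state[-1])
        state = lfsr_step(taps, state)
    return bits


def lfsr_period(taps, state):
    """Number of clocks until the register returns to its starting state.

    Returns None if the start state is not revisited within 2**n clocks
    (possible only when the last stage is not tapped).
    """
    start = tuple(state)
    current = start
    for clocks in range(1, 2 ** len(start) + 1):
        current = lfsr_step(taps, current)
        if current == start:
            return clocks
    return None
```

    With taps (2, 3) and seed (1, 0, 0), lfsr_sequence produces the repeating 7-bit pattern 0 0 1 0 1 1 1, which contains four ones and three zeros, as expected for a maximum-length sequence of a 3-stage register.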

  11. A Note on Sequence Prediction over Large Alphabets

    Directory of Open Access Journals (Sweden)

    Travis Gagie

    2012-02-01

    Full Text Available Building on results from data compression, we prove nearly tight bounds on how well sequences of length n can be predicted in terms of the size σ of the alphabet and the length k of the context considered when making predictions. We compare the performance achievable by an adaptive predictor with no advance knowledge of the sequence, to the performance achievable by the optimal static predictor using a table listing the frequency of each (k + 1)-tuple in the sequence. We show that, if the elements of the sequence are chosen uniformly at random, then an adaptive predictor can compete in the expected case if k ≤ log_σ n – 3 – ε, for a constant ε > 0, but not if k ≥ log_σ n.
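
    As a rough numerical illustration of the two thresholds stated above, the sketch below evaluates them for given n, σ and ε. It only performs the arithmetic on the bounds; the function names are hypothetical and nothing here reproduces the proof or the predictors themselves.

```python
import math


def competitive_context_limit(n, sigma, eps):
    """Largest integer k satisfying k <= log_sigma(n) - 3 - eps.

    May be negative when n is small, meaning no context length qualifies.
    """
    if n < 1 or sigma < 2 or eps <= 0:
        raise ValueError("need n >= 1, sigma >= 2 and eps > 0")
    return math.floor(math.log(n, sigma) - 3 - eps)


def failing_context_start(n, sigma):
    """Smallest integer k satisfying k >= log_sigma(n)."""
    if n < 1 or sigma < 2:
        raise ValueError("need n >= 1 and sigma >= 2")
    return math.ceil(math.log(n, sigma))
```

    For a sequence of n = 10^9 symbols over a four-letter alphabet, log_4 n is about 14.95, so with ε = 0.1 the first bound covers context lengths up to k = 11 and the second bound applies from k = 15 onward; the abstract makes no claim about the values in between.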

  12. Hibiscus latent Fort Pierce virus in Brazil and synthesis of its biologically active full-length cDNA clone.

    Science.gov (United States)

    Gao, Ruimin; Niu, Shengniao; Dai, Weifang; Kitajima, Elliot; Wong, Sek-Man

    2016-10-01

    A Brazilian isolate of Hibiscus latent Fort Pierce virus (HLFPV-BR) was first found in a hibiscus plant in Limeira, SP, Brazil. RACE PCR was carried out to obtain the full-length sequence of HLFPV-BR, which is 6453 nucleotides long and has more than 99.15 % complete genomic RNA nucleotide sequence identity with that of the HLFPV Japanese isolate. The genomic structure of HLFPV-BR is similar to that of other tobamoviruses. It includes a 5' untranslated region (UTR), followed by open reading frames encoding a 128-kDa protein and a 188-kDa readthrough protein, a 38-kDa movement protein, an 18-kDa coat protein, and a 3' UTR. Interestingly, the unique feature of a poly(A) tract is also found within its 3'-UTR. Furthermore, from the total RNA extracted from the local lesions of HLFPV-BR-infected Chenopodium quinoa leaves, a biologically active, full-length cDNA clone encompassing the genome of HLFPV-BR was amplified and placed adjacent to a T7 RNA polymerase promoter. The capped in vitro transcripts from the cloned cDNA were infectious when mechanically inoculated into C. quinoa and Nicotiana benthamiana plants. This is the first report of the presence of an isolate of HLFPV in Brazil and the successful synthesis of a biologically active HLFPV-BR full-length cDNA clone.

  13. In vivo myograph measurement of muscle contraction at optimal length

    Directory of Open Access Journals (Sweden)

    Ahmed Aminul

    2007-01-01

    Full Text Available Abstract Background Current devices for measuring muscle contraction in vivo have limited accuracy in establishing and re-establishing the optimum muscle length. Their reproducibility in determining the muscle contraction at this length is variable, and they often do not maintain precise conditions during the examination. Consequently, for clinical testing only semi-quantitative methods have been used. Methods We present a newly developed myograph, an accurate measuring device for muscle contraction, consisting of three elements. Firstly, an element for adjusting the axle of the device and the physiological axis of muscle contraction; secondly, an element to accurately position and reposition the extremity of the muscle; and thirdly, an element for the progressive pre-stretching and isometric locking of the target muscle. Thus it is possible to examine individual in vivo muscles in every pre-stretched, specified position, to maintain constant muscle-length conditions, and to accurately re-establish the conditions of the measurement process at later sessions. Results In a sequence of experiments the force of contraction of the muscle at differing stretching lengths was recorded and the forces determined. The optimum muscle length for maximal force of contraction was established. In a following sequence of experiments with smaller graduations around this optimal stretching length an increasingly accurate optimum muscle length for maximal force of contraction was determined. This optimum length was also accurately re-established at later sessions. Conclusion We have introduced a new technical solution for valid, reproducible in vivo force measurements on every possible point of the stretching curve. Thus it should be possible to study the muscle contraction in vivo to the same level of accuracy as is achieved in tests with in vitro organ preparations.

  14. Complete nucleotide sequences of avian metapneumovirus subtype B genome.

    Science.gov (United States)

    Sugiyama, Miki; Ito, Hiroshi; Hata, Yusuke; Ono, Eriko; Ito, Toshihiro

    2010-12-01

    Complete nucleotide sequences were determined for subtype B avian metapneumovirus (aMPV), the attenuated vaccine strain VCO3/50 and its parental pathogenic strain VCO3/60616. The genomes of both strains comprised 13,508 nucleotides (nt), with a 42-nt leader at the 3'-end and a 46-nt trailer at the 5'-end. The genome contains eight genes in the order 3'-N-P-M-F-M2-SH-G-L-5', which is the same order shown in the other metapneumoviruses. The genes are flanked on either side by conserved transcriptional start and stop signals and have intergenic sequences varying in length from 1 to 88 nt. Comparison of nt and predicted amino acid (aa) sequences of VCO3/60616 with those of other metapneumoviruses revealed higher homology with aMPV subtype A virus than with other metapneumoviruses. A total of 18 nt and 10 deduced aa differences were seen between the strains, and one or a combination of several differences could be associated with attenuation of VCO3/50.

  15. Relationships of body lengths with mouth opening and prey length of nemipterid fishes (Regan, 1913) in the Gulf of Thailand

    Directory of Open Access Journals (Sweden)

    Mithun Paul

    2017-12-01

    Full Text Available This study aims to investigate the relationship of total length (TL) of fish with mouth opening, namely horizontal opening (MH), vertical opening (VH), and mouth area (MA), and with fork length (FL) in seven sympatric nemipterid fish species, and to determine the relationship between total length and consumed prey length in five sympatric species sampled from the Gulf of Thailand in 2015. A total of 883 fish, collected from both cruise surveys and a fishing port survey, were investigated. TL was linearly and log-linearly related with both MV and MH for three and four species, respectively. MA was always log-linearly related to TL, and mouth shapes were nearly oval for all species. FL in all TL-FL relationships was proportional to TL in all species (r2 = 0.94, P  .5 and in invertebrate prey items for N. tambuloides (P > .5). So, this study clearly confirms that nemipterid fishes of different sizes feed on different specific prey items according to their own body size and feed according to size class on prey items available nearby.

  16. Tandemly repeated sequence in 5'end of mtDNA control region of ...

    African Journals Online (AJOL)

    Extensive length variability was observed in 5' end sequence of the mitochondrial DNA control region of the Japanese Spanish mackerel (Scomberomorus niphonius). This length variability was due to the presence of varying numbers of a 56-bp tandemly repeated sequence and a 46-bp insertion/deletion (indel).

  17. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the high-throughput sequencing data with SOAPdenovo software and obtained many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and combining these with clone construction for sequencing, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with a GC content of 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.
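
    The GC content figure quoted above is a simple base-composition percentage. The following sketch shows one common way to compute it from a sequence string; the function name and the choice to ignore ambiguous bases such as N are assumptions for illustration, not a description of the authors' pipeline.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted
```

    For example, gc_content("GGCCAT") gives about 66.7, since four of the six bases are G or C.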

  18. Universal and idiosyncratic characteristic lengths in bacterial genomes

    Science.gov (United States)

    Junier, Ivan; Frémont, Paul; Rivoire, Olivier

    2018-05-01

    In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a certain characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a physics standpoint, in vitro DNA has thus a characteristic length of 300 base pairs (bp), the Kuhn length of the molecule beyond which correlations in its orientations are typically lost. From a biology standpoint, in vivo DNA has a characteristic length of 1000 bp, the typical length of genes. Since bacteria live in very different physico-chemical conditions and since their genomes lack translational invariance, whether larger, universal characteristic lengths exist is a non-trivial question. Here, we examine this problem by leveraging the large number of fully sequenced genomes available in public databases. By analyzing GC content correlations and the evolutionary conservation of gene contexts (synteny) in hundreds of bacterial chromosomes, we conclude that a fundamental characteristic length around 10–20 kb can be defined. This characteristic length reflects elementary structures involved in the coordination of gene expression, which are present all along the genome of nearly all bacteria. Technically, reaching this conclusion required us to implement methods that are insensitive to the presence of large idiosyncratic genomic features, which may co-exist along these fundamental universal structures.

  19. Mapping of a Novel Race Specific Resistance Gene to Phytophthora Root Rot of Pepper (Capsicum annuum) Using Bulked Segregant Analysis Combined with Specific Length Amplified Fragment Sequencing Strategy.

    Science.gov (United States)

    Xu, Xiaomei; Chao, Juan; Cheng, Xueli; Wang, Rui; Sun, Baojuan; Wang, Hengming; Luo, Shaobo; Xu, Xiaowan; Wu, Tingquan; Li, Ying

    2016-01-01

    Phytophthora root rot caused by Phytophthora capsici (P. capsici) is a serious limitation to pepper production in Southern China, with high temperature and humidity. Mapping PRR resistance genes can provide linked DNA markers for breeding PRR resistant varieties by molecular marker-assisted selection (MAS). Two BC1 populations and an F2 population derived from a cross between P. capsici-resistant accession, Criollo de Morelos 334 (CM334) and P. capsici-susceptible accession, New Mexico Capsicum Accession 10399 (NMCA10399) were used to investigate the genetic characteristics of PRR resistance. PRR resistance to isolate Byl4 (race 3) was controlled by a single dominant gene, PhR10, that was mapped to an interval of 16.39Mb at the end of the long arm of chromosome 10. Integration of bulked segregant analysis (BSA) and Specific Length Amplified Fragment sequencing (SLAF-seq) provided an efficient genetic mapping strategy. Ten polymorphic Simple Sequence Repeat (SSR) markers were found within this region and used to screen the genotypes of 636 BC1 plants, delimiting PhR10 to a 2.57 Mb interval between markers P52-11-21 (1.5 cM away) and P52-11-41 (1.1 cM). A total of 163 genes were annotated within this region and 31 were predicted to be associated with disease resistance. PhR10 is a novel race specific gene for PRR, and this paper describes linked SSR markers suitable for marker-assisted selection of PRR resistant varieties, also laying a foundation for cloning the resistance gene.

  20. Mapping of a Novel Race Specific Resistance Gene to Phytophthora Root Rot of Pepper (Capsicum annuum) Using Bulked Segregant Analysis Combined with Specific Length Amplified Fragment Sequencing Strategy.

    Directory of Open Access Journals (Sweden)

    Xiaomei Xu

    Full Text Available Phytophthora root rot caused by Phytophthora capsici (P. capsici) is a serious limitation to pepper production in Southern China, with high temperature and humidity. Mapping PRR resistance genes can provide linked DNA markers for breeding PRR resistant varieties by molecular marker-assisted selection (MAS). Two BC1 populations and an F2 population derived from a cross between P. capsici-resistant accession, Criollo de Morelos 334 (CM334) and P. capsici-susceptible accession, New Mexico Capsicum Accession 10399 (NMCA10399) were used to investigate the genetic characteristics of PRR resistance. PRR resistance to isolate Byl4 (race 3) was controlled by a single dominant gene, PhR10, that was mapped to an interval of 16.39Mb at the end of the long arm of chromosome 10. Integration of bulked segregant analysis (BSA) and Specific Length Amplified Fragment sequencing (SLAF-seq) provided an efficient genetic mapping strategy. Ten polymorphic Simple Sequence Repeat (SSR) markers were found within this region and used to screen the genotypes of 636 BC1 plants, delimiting PhR10 to a 2.57 Mb interval between markers P52-11-21 (1.5 cM away) and P52-11-41 (1.1 cM). A total of 163 genes were annotated within this region and 31 were predicted to be associated with disease resistance. PhR10 is a novel race specific gene for PRR, and this paper describes linked SSR markers suitable for marker-assisted selection of PRR resistant varieties, also laying a foundation for cloning the resistance gene.

  1. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  2. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Directory of Open Access Journals (Sweden)

    Alamar Santiago

    2009-09-01

    Full Text Available Abstract Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new

  3. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Science.gov (United States)

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-01-01

    Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an

  4. On avoided words, absent words, and their application to biological sequence analysis.

    Science.gov (United States)

    Almirantis, Yannis; Charalampopoulos, Panagiotis; Gao, Jia; Iliopoulos, Costas S; Mohamed, Manal; Pissis, Solon P; Polychronopoulos, Dimitris

    2017-01-01

    The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length [Formula: see text] is a [Formula: see text]-avoided word in x if [Formula: see text], for a given threshold [Formula: see text]. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k. In this article, we propose an [Formula: see text]-time and [Formula: see text]-space algorithm to compute all [Formula: see text]-avoided words of length k in a given sequence of length n over a fixed-sized alphabet. We also present a time-optimal [Formula: see text]-time algorithm to compute all [Formula: see text]-avoided words (of any length) in a sequence of length n over an integer alphabet of size [Formula: see text]. In addition, we provide a tight asymptotic upper bound for the number of [Formula: see text]-avoided words over an integer alphabet and the expected length of the longest one. We make available an implementation of our algorithm. Experimental results, using both real and synthetic data, show the efficiency and applicability of our implementation in biological sequence analysis. The systematic search for avoided words is particularly useful for biological sequence analysis. We present a linear-time and linear-space algorithm for the computation of avoided words of length k in a given sequence x. We suggest a modification to this algorithm so that it computes all avoided words of x, irrespective of their length, within the same time complexity. We also present combinatorial results with regards to avoided words and absent words.
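
    The formulas in this record are elided, so the following is only a naive sketch under a stated assumption: it uses a commonly cited Markov-style definition in which the expected frequency of w is f(prefix) * f(suffix) / f(infix) and the deviation is (f(w) - expected) / max(sqrt(expected), 1). The threshold name rho and the function names are hypothetical. The sketch enumerates every possible word of length k, so it is exponential in k and is not the linear-time algorithm the abstract describes.

```python
from collections import Counter
from itertools import product
from math import sqrt


def count_words(x, k):
    """Observed frequency of every length-k substring of x."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))


def avoided_words(x, k, rho, alphabet="ACGT"):
    """Return {word: deviation} for length-k words with deviation <= rho.

    Assumed definition (not confirmed by the abstract):
      expected(w)  = f(w[:-1]) * f(w[1:]) / f(w[1:-1]), or 0 if f(w[1:-1]) == 0
      deviation(w) = (f(w) - expected(w)) / max(sqrt(expected(w)), 1)
    """
    if k <= 2 or rho >= 0:
        raise ValueError("this sketch assumes k > 2 and a negative threshold")
    f_k = count_words(x, k)
    f_k1 = count_words(x, k - 1)
    f_k2 = count_words(x, k - 2)
    result = {}
    # Enumerate all words, including those absent from x.
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        infix = f_k2[w[1:-1]]
        expected = f_k1[w[:-1]] * f_k1[w[1:]] / infix if infix else 0.0
        deviation = (f_k[w] - expected) / max(sqrt(expected), 1.0)
        if deviation <= rho:
            result[w] = deviation
    return result
```

    On the toy sequence "AABAABAABAAB" over the alphabet "AB" with k = 3 and rho = -1, the sketch reports "AAA" and "BAB". Both are absent from the sequence, which matches the abstract's remark that an avoided word may not occur at all.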

  5. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs. Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences that a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and making the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  6. RUNX2 tandem repeats and the evolution of facial length in placental mammals

    Directory of Open Access Journals (Sweden)

    Pointer Marie A

    2012-06-01

    Full Text Available Abstract Background When simple sequence repeats are integrated into functional genes, they can potentially act as evolutionary ‘tuning knobs’, supplying abundant genetic variation with minimal risk of pleiotropic deleterious effects. The genetic basis of variation in facial shape and length represents a possible example of this phenomenon. Runt-related transcription factor 2 (RUNX2), which is involved in osteoblast differentiation, contains a functionally-important tandem repeat of glutamine and alanine amino acids. The ratio of glutamines to alanines (the QA ratio) in this protein seemingly influences the regulation of bone development. Notably, in domestic breeds of dog, and in carnivorans in general, the ratio of glutamines to alanines is strongly correlated with facial length. Results In this study we examine whether this correlation holds true across placental mammals, particularly those mammals for which facial length is highly variable and related to adaptive behavior and lifestyle (e.g., primates, afrotherians, xenarthrans). We obtained relative facial length measurements and RUNX2 sequences for 41 mammalian species representing 12 orders. Using both a phylogenetic generalized least squares model and a recently-developed Bayesian comparative method, we tested for a correlation between genetic and morphometric data while controlling for phylogeny, evolutionary rates, and divergence times. Non-carnivoran taxa generally had substantially lower glutamine-alanine ratios than carnivorans (primates and xenarthrans with means of 1.34 and 1.25, respectively, compared to a mean of 3.1 for carnivorans), and we found no correlation between RUNX2 sequence and face length across placental mammals. Conclusions Results of our diverse comparative phylogenetic analyses indicate that QA ratio does not consistently correlate with face length across the 41 mammalian taxa considered. Thus, although RUNX2 might function as a ‘tuning knob’ modifying face

  7. Unified Deep Learning Architecture for Modeling Biology Sequence.

    Science.gov (United States)

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.

  8. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Science.gov (United States)

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  9. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered to be a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known alignment algorithms, FASTA (which is heuristic) and Needleman-Wunsch (which is optimal). The proposed algorithm achieves up to a 76% enhancement in alignment score when compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
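
    For reference, the optimal baseline named above can be sketched in a few lines. This is a score-only version of Needleman-Wunsch global alignment; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration, and the sketch says nothing about how ABS itself works.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap cost).

    Keeps only two rows of the dynamic-programming table, so memory use is
    proportional to len(b) while time is proportional to len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]
```

    With these scores, aligning "ACGT" against "AGT" yields 2 (three matches and one gap). The quadratic running time is what motivates heuristic alternatives for long sequences.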

  10. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered to be a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known alignment algorithms, FASTA (which is heuristic) and Needleman-Wunsch (which is optimal). The proposed algorithm achieves up to a 76% enhancement in alignment score compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  11. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing such large amounts of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known sequence alignment algorithms, GAP (which is heuristic) and Needleman-Wunsch (which is optimal). The proposed algorithm achieves up to a 51% enhancement in alignment score compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  12. The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome.

    Science.gov (United States)

    Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

    2014-04-01

    Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with a mean length greater than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly redundant, medium redundant, and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome.

  13. Complete mitochondrial genome sequence of the hedgehog seahorse Hippocampus spinosissimus Weber, 1933 (Gasterosteiformes:Syngnathidae).

    Science.gov (United States)

    Liu, Shuaishuai; Zhang, Yanhong; Wang, Changming; Lin, Qiang

    2016-07-01

    The complete mitochondrial genome sequence of the hedgehog seahorse Hippocampus spinosissimus was first determined in this article. The total length of H. spinosissimus mitogenome is 16 527 bp and consists of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 1 control region. The gene order and composition of H. spinosissimus were similar to those of most other vertebrates. The overall base composition of H. spinosissimus is 32.1% A, 30.3% T, 14.9% G and 22.7% C, with a slight A + T-rich feature (62.4%). Phylogenetic analyses based on complete mitochondrial genome sequence showed that H. spinosissimus has a close genetic relationship to H. ingens and H. kuda.
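
    As a rough illustration of how such base-composition figures are obtained, the sketch below computes per-base percentages for a sequence string and checks that the reported A and T shares add up to the stated A + T content. The function name and the toy sequence are hypothetical; only the four percentages come from the abstract.

```python
def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, T, G and C in seq (case-insensitive)."""
    seq = seq.upper()
    return {base: 100.0 * seq.count(base) / len(seq) for base in "ATGC"}


# Percentages reported in the abstract for H. spinosissimus.
reported = {"A": 32.1, "T": 30.3, "G": 14.9, "C": 22.7}
at_content = round(reported["A"] + reported["T"], 1)

if __name__ == "__main__":
    print(base_composition("AATG"))  # toy sequence, not real data
    print(at_content)
```

    The four reported values sum to 100%, and A + T gives the 62.4% quoted as the A + T-rich feature.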

  14. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  15. Cytomegalovirus sequence variability, amplicon length, and DNase-sensitive non-encapsidated genomes are obstacles to standardization and commutability of plasma viral load results.

    Science.gov (United States)

    Naegele, Klaudia; Lautenschlager, Irmeli; Gosert, Rainer; Loginov, Raisa; Bir, Katia; Helanterä, Ilkka; Schaub, Stefan; Khanna, Nina; Hirsch, Hans H

    2018-04-22

    Cytomegalovirus (CMV) management post-transplantation relies on quantification in blood, but inter-laboratory and inter-assay variability impairs commutability. An international multicenter study demonstrated that variability is mitigated by standardizing plasma volumes, automating DNA extraction and amplification, and calibration to the 1st-CMV-WHO-International-Standard as in the FDA-approved Roche-CAP/CTM-CMV. However, Roche-CAP/CTM-CMV showed under-quantification and false-negative results in a quality assurance program (UK-NEQAS-2014). To evaluate factors contributing to quantification variability of CMV viral load and to develop optimized CMV-UL54-QNAT. The UL54 target of the UK-NEQAS-2014 variant was sequenced and compared to 329 available CMV GenBank sequences. Four Basel-CMV-UL54-QNAT assays of 361 bp, 254 bp, 151 bp, and 95 bp amplicons were developed that only differed in reverse primer positions. The assays were validated using plasmid dilutions, UK-NEQAS-2014 sample, as well as 107 frozen and 69 prospectively collected plasma samples from transplant patients submitted for CMV QNAT, with and without DNase-digestion prior to nucleic acid extraction. Eight of 43 mutations were identified as relevant in the UK-NEQAS-2014 target. All Basel-CMV-UL54 QNATs quantified the UK-NEQAS-2014 but revealed 10-fold increasing CMV loads as amplicon size decreased. The inverse correlation of amplicon size and viral loads was confirmed using 1st-WHO-International-Standard and patient samples. DNase pre-treatment reduced plasma CMV loads by >90% indicating the presence of unprotected CMV genomic DNA. Sequence variability, amplicon length, and non-encapsidated genomes obstruct standardization and commutability of CMV loads needed to develop thresholds for clinical research and management. Besides regular sequence surveys, matrix and extraction standardization, we propose developing reference calibrators using 100 bp amplicons. Copyright © 2018 Elsevier B.V. All

  16. Non PCR-amplified Transcripts and AFLP fragments as reduced representations of the quail genome for 454 Titanium sequencing

    Directory of Open Access Journals (Sweden)

    Leterrier Christine

    2010-07-01

    Full Text Available Abstract Background SNP (Single Nucleotide Polymorphism) discovery is now routinely performed using high-throughput sequencing of reduced representation libraries. Our objective was to adapt 454 GS FLX based sequencing methodologies in order to obtain the largest possible dataset from two reduced representation libraries, produced by AFLP (Amplified Fragment Length Polymorphism) for genomic DNA, and EST (Expressed Sequence Tag) for the transcribed fraction of the genome. Findings The expressed fraction was obtained by preparing cDNA libraries without PCR amplification from quail embryo and brain. To optimize the information content for SNP analyses, libraries were prepared from individuals selected in three quail lines and each individual in the AFLP library was tagged. Sequencing runs produced 399,189 sequence reads from cDNA and 373,484 from genomic fragments, covering close to 250 Mb of sequence in total. Conclusions Both methods used to obtain reduced representations for high-throughput sequencing were successful after several improvements. The protocols may be used for several sequencing applications, such as de novo sequencing, tagged PCR fragments or long fragment sequencing of cDNA.

  17. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)
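
    To make the idea of searching for hidden periodicities concrete, here is a minimal sketch that computes a Fourier power spectrum for the positions of one nucleotide and reports the dominant period. The function names and the normalization by sequence length M are assumptions for illustration and are not taken from the review; a direct O(M²) transform is used to keep the example dependency-free.

```python
import cmath


def base_power_spectrum(seq: str, base: str) -> list[float]:
    """Power |sum_m rho(m) exp(-2*pi*i*k*m/M)|^2 / M for k = 0..M-1,
    where rho(m) is 1 if position m holds `base`, else 0."""
    m_total = len(seq)
    indicator = [1.0 if s == base else 0.0 for s in seq]
    spectrum = []
    for k in range(m_total):
        amp = sum(x * cmath.exp(-2j * cmath.pi * k * m / m_total)
                  for m, x in enumerate(indicator))
        spectrum.append(abs(amp) ** 2 / m_total)
    return spectrum


def dominant_period(seq: str, base: str) -> float:
    """Return M / k for the strongest non-zero harmonic k in 1..M//2."""
    spec = base_power_spectrum(seq, base)
    k_best = max(range(1, len(seq) // 2 + 1), key=lambda k: spec[k])
    return len(seq) / k_best


if __name__ == "__main__":
    print(dominant_period("ACG" * 10, "A"))
```

    In practice a peak would be judged against the spectrum expected for random sequences of the same nucleotide composition, which is the significance test the abstract refers to.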

  18. Sequencing of the Hepatitis C Virus: A Systematic Review.

    Directory of Open Access Journals (Sweden)

    Brendan Jacka

    Full Text Available Since the identification of hepatitis C virus (HCV), viral sequencing has been important in understanding HCV classification, epidemiology, evolution, transmission clustering, treatment response and natural history. The length and diversity of the HCV genome has resulted in analysis of certain regions of the virus, however there has been little standardisation of protocols. This systematic review was undertaken to map the location and frequency of sequencing on the HCV genome in peer reviewed publications, with the aim to produce a database of sequencing primers and amplicons to inform future research. Medline and Scopus databases were searched for English language publications based on keyword/MeSH terms related to sequence analysis (9 terms) or HCV (3 terms), plus "primer" as a general search term. Exclusion criteria included non-HCV research, review articles, duplicate records, and incomplete description of HCV sequencing methods. The PCR primer locations of accepted publications were noted, and purpose of sequencing was determined. A total of 450 studies were accepted from the 2099 identified, with 629 HCV sequencing amplicons identified and mapped on the HCV genome. The most commonly sequenced region was the HVR-1 region, often utilised for studies of natural history, clustering/transmission, evolution and treatment response. Studies related to genotyping/classification or epidemiology of HCV genotype generally targeted the 5'UTR, Core and NS5B regions, while treatment response/resistance was assessed mainly in the NS3-NS5B region with emphasis on the Interferon sensitivity determining region (ISDR) of NS5A. While the sequencing of HCV is generally constricted to certain regions of the HCV genome there is little consistency in the positioning of sequencing primers, with the exception of a few highly referenced manuscripts. This study demonstrates the heterogeneity of HCV sequencing, providing a comprehensive database of previously

  19. Polyadenylated Sequencing Primers Enable Complete Readability of PCR Amplicons Analyzed by Dideoxynucleotide Sequencing

    Directory of Open Access Journals (Sweden)

    Martin Beránek

    2012-01-01

    Full Text Available Dideoxynucleotide DNA sequencing is one of the principal procedures in molecular biology. Loss of an initial part of nucleotides behind the 3' end of the sequencing primer limits the readability of sequenced amplicons. We present a method which extends the readability by using sequencing primers modified by polyadenylated tails attached to their 5' ends. Performing a polymerase chain reaction, we amplified eight amplicons of six human genes (AMELX, APOE, HFE, MBL2, SERPINA1 and TGFB1) ranging from 106 bp to 680 bp. Polyadenylation of the sequencing primers minimized the loss of bases in all amplicons. Complete sequences of shorter products (AMELX 106 bp, SERPINA1 121 bp, HFE 208 bp, APOE 244 bp, MBL2 317 bp) were obtained. In addition, in the case of TGFB1 products (366 bp, 432 bp, and 680 bp, respectively), the lengths of sequencing readings were significantly longer if adenylated primers were used. Thus, single strand dideoxynucleotide sequencing with adenylated primers enables complete or near complete readability of short PCR amplicons.

  20. Enriching Genomic Resources and Marker Development from Transcript Sequences of Jatropha curcas for Microgravity Studies

    Science.gov (United States)

    Tian, Wenlan; Paudel, Dev

    2017-01-01

    Jatropha (Jatropha curcas L.) is an economically important species with a great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembling and clustering resulted in 115,611 uniquely assembled sequences (UASs) including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs were fully annotated, out of which 59,903 (51.81%) were assigned with gene ontology (GO) term, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in Kyoto Encyclopedia of Genes and Genome (KEGG) database, and it contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered with AG/CT as the most frequent (45.8%) SSR motif type. Further 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
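
    The annotation percentages quoted above can be checked directly from the counts. The short sketch below recomputes each share of the 115,611 uniquely assembled sequences; the dictionary keys are labels chosen here for illustration, while the counts are those reported in the abstract.

```python
TOTAL_UAS = 115_611  # uniquely assembled sequences reported in the abstract

annotated_counts = {
    "GO term": 59_903,
    "KOG ortholog": 12_584,
    "KEGG pathway": 8_822,
}

# Share of all UASs in each annotation category, rounded to two decimals.
shares = {name: round(100 * count / TOTAL_UAS, 2)
          for name, count in annotated_counts.items()}

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct}%")
```

    The recomputed shares agree with the 51.81%, 10.88% and 7.63% given in the abstract.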

  1. An improved method for quantitatively measuring the sequences of total organic carbon and black carbon in marine sediment cores

    Science.gov (United States)

    Xu, Xiaoming; Zhu, Qing; Zhou, Qianzhi; Liu, Jinzhong; Yuan, Jianping; Wang, Jianghai

    2018-01-01

    Understanding the global carbon cycle is critical to uncovering the mechanisms of global warming and remediating its adverse effects on human activities. Organic carbon in marine sediments is an indispensable part of the global carbon reservoir in global carbon cycling. Evaluating such a reservoir calls for quantitative studies of marine carbon burial, which closely depend on quantifying total organic carbon and black carbon in marine sediment cores and subsequently on obtaining their high-resolution temporal sequences. However, the conventional methods for detecting the contents of total organic carbon or black carbon cannot resolve the following specific difficulties: (1) a very limited amount of each subsample versus the diverse analytical items, (2) a low and fluctuating recovery rate of total organic carbon or black carbon versus the reproducibility of carbon data, and (3) a large number of subsamples versus the rapid batch measurements. In this work, by (i) adopting customized disposable ceramic crucibles with micropore-controlled ability, (ii) developing self-made or customized facilities for the procedures of acidification and chemothermal oxidation, and (iii) optimizing procedures and the carbon-sulfur analyzer, we have built a novel Wang-Xu-Yuan method (the WXY method) for measuring the contents of total organic carbon or black carbon in marine sediment cores, which includes the procedures of pretreatment, weighing, acidification, chemothermal oxidation and quantification, and can fully meet the requirements of establishing their high-resolution temporal sequences in terms of recovery, experimental efficiency, accuracy and reliability of the measurements, and homogeneity of samples. In particular, the use of disposable ceramic crucibles evidently simplifies the experimental scenario, which further results in very high recovery rates for total organic carbon and black carbon. This new technique may provide a significant support for

  2. Identification of the sequence variations of 15 autosomal STR loci in a Chinese population.

    Science.gov (United States)

    Chen, Wenjing; Cheng, Jianding; Ou, Xueling; Chen, Yong; Tong, Dayue; Sun, Hongyu

    2014-01-01

    DNA sequence variation including base(s) changes and insertion or deletion in the primer binding region may cause a null allele and, if this changes the length of the amplified fragment out of the allelic ladder, off-ladder (OL) alleles may be detected. In order to provide accurate and reliable DNA evidence for forensic DNA analysis, it is essential to clarify sequence variations in prevalently used STR loci. Suspected null alleles and OL alleles of the PowerPlex® 16 System from 21,934 unrelated Chinese individuals were verified by alternative systems and sequenced. A total of 17 cases with null alleles were identified, including 12 kinds of point mutations in 16 cases and a 19-base deletion in one case. The total frequency of null alleles was 7.751 × 10⁻⁴. Eight hundred and forty-four OL alleles classified as being of 97 different kinds were observed at 15 STR loci of the PowerPlex® 16 System except vWA. All the frequencies of OL alleles were under 0.01. Null alleles should be confirmed by alternative primers and OL alleles should be named appropriately. Particular attention should be paid to sequence variation, since incorrect designation could lead to false conclusions.
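
    The reported total null-allele frequency is consistent with dividing the number of null-allele cases by the number of individuals tested. The sketch below reproduces that arithmetic; the choice of denominator (individuals rather than chromosomes) is an inference from the numbers and is not stated in the abstract.

```python
NULL_ALLELE_CASES = 17    # cases with null alleles, from the abstract
INDIVIDUALS = 21_934      # unrelated individuals typed, from the abstract

# Assumed definition: cases per individual (inferred, not stated by the authors).
null_frequency = NULL_ALLELE_CASES / INDIVIDUALS

if __name__ == "__main__":
    print(f"{null_frequency:.3e}")
```

    Formatted to four significant figures this gives 7.751 × 10⁻⁴, the value quoted in the abstract.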

  3. Effect of day of the week of primary total hip arthroplasty on length of stay at a university-based teaching medical center.

    Science.gov (United States)

    Rathi, Pranav; Coleman, Sheldon; Durbin-Johnson, Blythe; Giordani, Mauro; Pereira, Gavin; Di Cesare, Paul E

    2014-12-01

    Length of hospital stay (LHS) after primary total hip arthroplasty (THA) constitutes a critical outcome measure, as prolonged LHS implies increased resource expenditure. Investigations have highlighted factors that affect LHS after THA. These factors include advanced age, medical comorbidities, obesity, intraoperative time, anesthesia technique, surgical site infection, and incision length. We retrospectively analyzed the effect of day of the week of primary THA on LHS. We reviewed the surgery and patient factors of 273 consecutive patients who underwent THA at our institution, a tertiary-care teaching hospital. There was a 15% increase in LHS for patients who underwent THA on Thursday versus Monday when controlling for other covariates that can affect LHS. Other statistically significant variables associated with increased LHS included American Society of Anesthesiologists grade, transfusion requirements, and postoperative complications. The day of the week of THA may be an independent variable affecting LHS. Institutions with reduced weekend resources may want to perform THA earlier in the week to try to reduce LHS.

  4. Collaborative Filtering Recommendation on Users' Interest Sequences.

    Directory of Open Access Journals (Sweden)

    Weijie Cheng

    Full Text Available As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS and the count of users' total common sub-IS (ACSIS. Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.

  5. Collaborative Filtering Recommendation on Users' Interest Sequences.

    Science.gov (United States)

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS) that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.
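
    A minimal sketch of the sequence-based similarity ingredient described above: each user's interest sequence is built by ordering rated items by timestamp, and two users are compared by the length of their longest common subsequence. Treating "sub-IS" as an ordinary (not necessarily contiguous) subsequence is an assumption, as are the function names; the paper's ACSIS count and its similarity formula are not reproduced here.

```python
def interest_sequence(events: list[tuple[str, int]]) -> list[str]:
    """Order (item, timestamp) events by timestamp and return the items."""
    return [item for item, _ in sorted(events, key=lambda e: e[1])]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    u = interest_sequence([("a", 1), ("b", 2), ("c", 3), ("d", 4)])
    v = interest_sequence([("b", 1), ("d", 2), ("a", 3), ("c", 4)])
    print(lcs_length(u, v))
```

    Two users who rated the same items in a different order share a shorter common subsequence than users who followed the same order, which is the sequential signal that plain rating-based similarity misses.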

  6. Collaborative Filtering Recommendation on Users’ Interest Sequences

    Science.gov (United States)

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users’ dynamic preferences in many papers. However, the sequence of users’ behaviour is rarely studied in recommender systems. Due to the users’ unique behavior evolution patterns and personalized interest transitions among items, users’ similarity in sequential dimension should be introduced to further distinguish users’ preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users’ interest sequences (IS) that rank users’ ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users’ longest common sub-IS (LCSIS) and the count of users’ total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users’ IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users’ preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction. PMID:27195787

  7. Genotyping of major histocompatibility complex Class II DRB gene in Rohilkhandi goats by polymerase chain reaction-restriction fragment length polymorphism and DNA sequencing

    Directory of Open Access Journals (Sweden)

    Kush Shrivastava

    2015-10-01

    Full Text Available Aim: To study the major histocompatibility complex (MHC) Class II DRB1 gene polymorphism in Rohilkhandi goat using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) and nucleotide sequencing techniques. Materials and Methods: DNA was isolated from 127 Rohilkhandi goats maintained at sheep and goat farm, Indian Veterinary Research Institute, Izatnagar, Bareilly. A 284 bp fragment of exon 2 of DRB1 gene was amplified and digested using BsaI and TaqI restriction enzymes. Population genetic parameters were calculated using Popgene v 1.32 and SAS 9.0. The genotypes were then sequenced using Sanger dideoxy chain termination method and were compared with related breeds/species using MEGA 6.0 and Megalign (DNASTAR) software. Results: TaqI locus showed three and BsaI locus showed two genotypes. Both the loci were found to be in Hardy–Weinberg equilibrium (HWE); however, population genetic parameters suggest that heterozygosity is still maintained in the population at both loci. Percent diversity and divergence matrix, as well as phylogenetic analysis revealed that the MHC Class II DRB1 gene of Rohilkhandi goats was found to be in close cluster with Garole and Scottish blackface sheep breeds as compared to other goat breeds included in the sequence comparison. Conclusion: The PCR-RFLP patterns showed population to be in HWE and absence of one genotype at one locus (BsaI); both the loci showed excess of one or the other homozygote genotype; however, effective number of alleles showed that allelic diversity is present in the population. Sequence comparison of DRB1 gene of Rohilkhandi goat with other sheep and goat breeds assigned Rohilkhandi goat in divergence with Jamunapari and Angora goats.

  8. Adaptive subdivision and the length and energy of Bézier curves

    DEFF Research Database (Denmark)

    Gravesen, Jens

    1997-01-01

    It is an often used fact that the control polygon of a Bézier curve approximates the curve and that the approximation gets better when the curve is subdivided. In particular, if a Bézier curve is subdivided into some number of pieces, then the arc-length of the original curve is greater than...... the sum of the chord-lengths of the pieces, and less than the sum of the polygon-lengths of the pieces. Under repeated subdivisions, the difference between this lower and upper bound gets arbitrarily small. If $L_c$ denotes the total chord-length of the pieces and $L_p$ denotes the total polygon...... combination, and it forms the basis for a fast adaptive algorithm, which determines the arc-length of a Bézier curve. The energy of a curve is half the square of the curvature integrated with respect to arc-length. Like in the case of the arc-length, it is possible to use the chord-length and polygon...
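
    The bounding property stated above can be sketched as a small adaptive routine: a piece is accepted when its polygon-length and chord-length agree to within a tolerance, and is otherwise split in half with de Casteljau's algorithm. The function names, the tolerance-halving rule and the depth limit are assumptions for illustration; the specific combination of $L_c$ and $L_p$ used in the paper is elided in the abstract and is not reproduced, so the routine returns only the two bounds.

```python
import math


def _dist(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])


def _split(ctrl):
    """Split a Bézier control polygon at t = 0.5 (de Casteljau)."""
    left, right = [ctrl[0]], [ctrl[-1]]
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
               for p, q in zip(pts, pts[1:])]
        left.append(pts[0])
        right.append(pts[-1])
    return left, right[::-1]


def arc_length_bounds(ctrl, tol=1e-6, depth=0, max_depth=30):
    """Return (L_c, L_p): total chord-length and total polygon-length of
    the accepted pieces, which bound the arc-length from below and above."""
    chord = _dist(ctrl[0], ctrl[-1])
    polygon = sum(_dist(p, q) for p, q in zip(ctrl, ctrl[1:]))
    if polygon - chord <= tol or depth >= max_depth:
        return chord, polygon
    left, right = _split(ctrl)
    # Halve the tolerance so the accumulated gap stays near the requested tol.
    lc, lp = arc_length_bounds(left, tol / 2, depth + 1, max_depth)
    rc, rp = arc_length_bounds(right, tol / 2, depth + 1, max_depth)
    return lc + rc, lp + rp


if __name__ == "__main__":
    print(arc_length_bounds([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]))
```

    Flat parts of the curve are accepted after few subdivisions while strongly curved parts are refined further, which is what makes the scheme adaptive.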

  9. The evolutionary rates of HCV estimated with subtype 1a and 1b sequences over the ORF length and in different genomic regions.

    Directory of Open Access Journals (Sweden)

    Manqiong Yuan

    Full Text Available Considerable progress has been made in the HCV evolutionary analysis, since the software BEAST was released. However, prior information, especially the prior evolutionary rate, which plays a critical role in BEAST analysis, is always difficult to ascertain due to various uncertainties. Providing a proper prior HCV evolutionary rate is thus of great importance. 176 full-length sequences of HCV subtype 1a and 144 of 1b were assembled by taking into consideration the balance of the sampling dates and the even dispersion in phylogenetic trees. According to the HCV genomic organization and biological functions, each dataset was partitioned into nine genomic regions and two routinely amplified regions. A uniform prior rate was applied to the BEAST analysis for each region and also the entire ORF. All the obtained posterior rates for 1a are of a magnitude of 10⁻³ substitutions/site/year and in a bell-shaped distribution. Significantly lower rates were estimated for 1b and some of the rate distribution curves resulted in a one-sided truncation, particularly under the exponential model. This indicates that some of the rates for subtype 1b are less accurate, so they were adjusted by including more sequences to improve the temporal structure. Among the various HCV subtypes and genomic regions, the evolutionary patterns are dissimilar. Therefore, an applied estimation of the HCV epidemic history requires the proper selection of the rate priors, which should match the actual dataset so that they can fit for the subtype, the genomic region and even the length. By referencing the findings here, future evolutionary analysis of the HCV subtype 1a and 1b datasets may become more accurate and hence prove useful for tracing their patterns.

  10. High-resolution T2-weighted MR imaging of the inner ear using a long echo-train-length 3D fast spin-echo sequence

    International Nuclear Information System (INIS)

    Naganawa, S.; Yamakawa, K.; Fukatsu, H.; Ishigaki, T.; Nakashima, T.; Sugimoto, H.; Aoki, I.; Miyazaki, M.; Takai, H.

    1996-01-01

    The purpose of this study was to assess the value of a long echo-train-length 3D fast spin-echo (3D-FSE) sequence in visualizing the inner ear structures. Ten normal ears and 50 patient ears were imaged on a 1.5T MR unit using a head coil. Axial high-resolution T2-weighted images of the inner ear and the internal auditory canal (IAC) were obtained in 15 min. In normal ears the reliability of the visualization for the inner ear structures was evaluated on original images and the targeted maximum intensity projection (MIP) images of the labyrinth. In ten normal ears, 3D surface display (3D) images were also created and compared with MIP images. On the original images the cochlear aqueduct, the vessels in the vicinity of the IAC, and more than three branches of the cranial nerves were visualized in the IAC in all the ears. The visibility of the endolymphatic duct was 80%. On the MIP images the visibility of the three semicircular canals, anterior and posterior ampulla, and of more than two turns of the cochlea was 100%. The MIP images and 3D images were almost comparable. The visibility of the endolymphatic duct was 80% in normal ears and 0% in the affected ears of the patients with Meniere's disease (p<0.001). In one patient ear a small intracanalicular tumor was depicted clearly. In conclusion, the long echo-train-length T2-weighted 3D-FSE sequence enables the detailed visualization of the tiny structures of the inner ear and the IAC within a clinically acceptable scan time. Furthermore, obtaining a high contrast between the soft/bony tissue and the cerebrospinal/endolymph/perilymph fluid would be of significant value in the diagnosis of the pathologic conditions around the labyrinth and the IAC. (orig.)

  11. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Full Text Available Abstract Background The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that generates sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm generates larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.

  12. Molecular characterizations of somatic hybrids developed between Pleurotus florida and Lentinus squarrosulus through inter-simple sequence repeat markers and sequencing of ribosomal RNA-ITS gene.

    Science.gov (United States)

    Mallick, Pijush; Chattaraj, Shruti; Sikdar, Samir Ranjan

    2017-10-01

    The 12 pfls somatic hybrids and 2 parents of Pleurotus florida and Lentinus squarrosulus were characterized by ISSR and sequencing of rRNA-ITS genes. Five ISSR primers were used and amplified a total of 54 reproducible fragments with 98.14% polymorphism among all the pfls hybrid populations and parental strains. UPGMA-based cluster exhibited a dendrogram with three major groups between the parents and pfls hybrids. Parent P. florida and L. squarrosulus showed different degrees of genetic distance with all the hybrid lines and they showed closeness to hybrid pfls 1m and pfls 1h, respectively. ITS1(F) and ITS4(R) amplified the rRNA-ITS gene with 611-867 bp sequence length. The nucleotide polymorphisms were found in the ITS1, ITS2 and 5.8S rRNA region with different number of bases. Based on rRNA-ITS sequence, UPGMA cluster exhibited three distinct groups between L. squarrosulus and pfls 1p, pfls 1m and pfls 1s, and pfls 1e and P. florida.

  13. Sequencing by ligation variation with endonuclease V digestion and deoxyinosine-containing query oligonucleotides

    Directory of Open Access Journals (Sweden)

    Ho Antoine

    2011-12-01

    Full Text Available Abstract Background Sequencing-by-ligation (SBL) is one of several next-generation sequencing methods that have been developed for massive sequencing of DNA immobilized on arrayed beads (or other clonal amplicons). SBL has the advantage of being easy to implement and accessible to all because it can be performed with off-the-shelf reagents. However, SBL has the limitation of very short read lengths. Results To overcome the read length limitation, research groups have developed complex library preparation processes, which can be time-consuming, difficult, and result in low complexity libraries. Herein we describe a variation on traditional SBL protocols that extends the number of sequential bases that can be sequenced by using Endonuclease V to nick a query primer, thus leaving a ligatable end extended into the unknown sequence for further SBL cycles. To demonstrate the protocol, we constructed a known DNA sequence and utilized our SBL variation, cyclic SBL (cSBL), to resequence this region. Using our method, we were able to read thirteen contiguous bases in the 3' - 5' direction. Conclusions Combining this read length with sequencing in the 5' - 3' direction would allow a read length of over twenty bases on a single tag. Implementing mate-paired tags and this SBL variation could enable > 95% coverage of the genome.

  14. The comparison between the length of vertical dimension of occlusion and the length of thumb on undergraduate Mongoloid students

    Directory of Open Access Journals (Sweden)

    Goh Li Teng

    2011-03-01

    Full Text Available The Thumb Rule of Leonardo da Vinci states that many proportions of the face show a relationship with the length of the thumb, which is measured from the proximal tip of the proximal phalanx to the distal tip of the distal phalanx. Previous studies have shown that the length of the vertical dimension of occlusion (VDO) is similar to the length of the thumb in the Caucasoid race. The purpose of this study is to determine whether the length of VDO correlates with the length of the thumb among those of the Mongoloid race. This study took a survey method with the analytical cross-sectional approach. A total of 80 students of the Faculty of Dentistry who fulfilled all population criteria were randomly chosen to measure the length of VDO and the length of the thumb. Results analyzed with Student's t-test statistic revealed that there was a significant difference between males and females in the length of VDO and the length of the thumb; however, there was no significant difference between the length of VDO and the length of the thumb. There were very strong correlations (P<0.05) between the length of VDO and the length of the thumb. In conclusion, the length of the thumb can be suggested as an objective method to determine the length of VDO in this population.

  15. Draft genome sequence of Sclerospora graminicola, the pearl millet downy mildew pathogen

    Directory of Open Access Journals (Sweden)

    Navajeet Chakravartty

    2017-12-01

    Full Text Available The pathogen Sclerospora graminicola is the most important biotic production constraint of pearl millet in India, Africa and other parts of the world. We report a de novo whole genome assembly and analysis of pathotype 1, one of the most virulent pathotypes of S. graminicola from India. The whole genome sequencing was performed by sequencing of 7.38 Gb with 73,889,924 paired end reads from the paired-end library, and 1.15 Gb with 3,851,788 reads from the mate pair library generated from Illumina HiSeq 2500 and Illumina MiSeq, respectively. A total of 597,293 filtered sub reads with an average read length of 6.39 Kb were generated on PACBIO RSII with P6-C4 chemistry. The assembled draft genome sequence of S. graminicola pathotype 1 was 299,901,251 bp in length, with an N50 of 17,909 bp and a minimum scaffold size of 1 Kb. The GC content was 47.2 %, and the assembly consisted of 26,786 scaffolds with a longest scaffold size of 238,843 bp. The overall coverage was 40X. The draft genome sequence was used for gene prediction using AUGUSTUS, which resulted in 65,404 genes using Saccharomyces cerevisiae as a model. A total of 52,285 predicted genes found homology using BLASTX against the nr database, and 38,120 genes were observed with a significant BLASTX match with an E-value cutoff of 1e-5 and 40% identity percentage. Of the 38,120 annotated genes, 11,873 had UniProt entries, 7,248 had GO terms and 9,686 had KEGG IDs. Of the 7,248 GO terms, 2,724 were associated with biological processes. The genome information of the downy mildew pathogen is available in the NCBI GenBank database. The Sclerospora graminicola whole genome shotgun (WGS) project has the project accession MIQA00000000. This version of the project (02) has the accession number MIQA02000000, and consists of sequences MIQA02000001-MIQA02026786, with BioProject ID PRJNA325098 and BioSample ID SAMN05219233.
This study may help understand the evolutionary pattern of pathogen and aid elucidation of effector evolution for
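
    The record above reports an N50 of 17,909 bp for the draft assembly. As a reminder of what that statistic measures, here is a minimal sketch of the usual N50 definition: the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (values are illustrative only).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

    Because N50 depends on the minimum scaffold size retained in the assembly, it is normally quoted together with that cutoff, as the abstract does with its 1 Kb minimum.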

  16. Development and utilization of novel intron length polymorphic markers in foxtail millet (Setaria italica (L.) P. Beauv.).

    Science.gov (United States)

    Gupta, Sarika; Kumari, Kajal; Das, Jyotirmoy; Lata, Charu; Puranik, Swati; Prasad, Manoj

    2011-07-01

    Introns are noncoding sequences in a gene that are transcribed to precursor mRNA but spliced out during mRNA maturation and are abundant in eukaryotic genomes. The availability of codominant molecular markers and saturated genetic linkage maps has been limited in foxtail millet (Setaria italica (L.) P. Beauv.). Here, we describe the development of 98 novel intron length polymorphic (ILP) markers in foxtail millet using sequence information of the model plant rice. A total of 575 nonredundant expressed sequence tag (EST) sequences were obtained, of which 327 and 248 unique sequences were from dehydration- and salinity-stressed suppression subtractive hybridization libraries, respectively. The BLAST analysis of 98 EST sequences suggests a nearly defined function for about 64% of them, and they were grouped into 11 different functional categories. All 98 ILP primer pairs showed a high level of cross-species amplification in two millet and two nonmillet species, ranging from 90% to 100%, with a mean of ∼97%. The mean observed heterozygosity and Nei's average gene diversity (0.016 and 0.171, respectively) established the efficiency of the ILP markers for distinguishing the foxtail millet accessions. Based on 26 ILP markers, a reasonable dendrogram of 45 foxtail millet accessions was constructed, demonstrating the utility of ILP markers in germplasm characterizations and genomic relationships in millet and nonmillet species.

  17. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  18. Application of the integrated analysis of safety (ISA) to sequences of Total loss of feed water in a PWR Reactor

    International Nuclear Information System (INIS)

    Moreno Chamorro, P.; Gallego Diaz, C.

    2011-01-01

    The main objective of this work is to show the current status of the application of the integrated safety analysis (ISA) methodology and its associated tool SCAIS (system of simulation codes for ISA) to the analysis of total loss of feedwater sequences in a three-loop Westinghouse PWR with a large, dry containment.

  19. The Complete Sequence of a Human Parainfluenzavirus 4 Genome

    Science.gov (United States)

    Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond

    2009-01-01

    Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536

  20. The Complete Sequence of a Human Parainfluenzavirus 4 Genome

    Directory of Open Access Journals (Sweden)

    Carmen Yea

    2009-06-01

    Full Text Available Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized.

  1. Allometric relations of total volumes of prolactin cells and corticotropic cells to body length in the annual cyprinodont Cynolebias whitei: effects of environmental salinity, stress and ageing

    NARCIS (Netherlands)

    Ruijter, J. M.; Wendelaar Bonga, S. E.

    1987-01-01

    An analysis of the allometric relations of the total volumes occupied by prolactin (PRL) and corticotropic (ACTH) cells (PRL volume and ACTH volume, respectively) to body length and a study of the immunocytochemical staining intensity of PRL and ACTH cells were used to determine the differences in

  2. Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis

    Directory of Open Access Journals (Sweden)

    Lev Guzmán-Vargas

    2015-11-01

    Full Text Available We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, $P(k) \sim k^{-\gamma}$, with two regimes, which are characterized by the exponents $\gamma_s \approx 1.7$ (at short degree scales) and $\gamma_l \approx 1.3$ (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
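
    The pipeline described in this record (word lengths, integration, natural visibility graph, node degrees) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: words are split on whitespace, the series is integrated as a cumulative sum of deviations from the mean (the convention used in DFA, assumed here because the abstract only says "integrated"), and the graph is built with a naive check of the standard visibility criterion, in which points a and b are linked when every intermediate point c satisfies y_c < y_b + (y_a - y_b)(t_b - t_c)/(t_b - t_a). All function names are hypothetical.

```python
from itertools import accumulate


def word_length_profile(text):
    """Integrated word-length series: cumulative sum of (length - mean).

    Assumption: whitespace tokenisation and mean subtraction; the
    abstract does not specify either detail.
    """
    lengths = [len(word) for word in text.split()]
    mean = sum(lengths) / len(lengths)
    return list(accumulate(x - mean for x in lengths))


def natural_visibility_edges(series):
    """Edges (a, b), a < b, of the natural visibility graph of `series`.

    Naive check of the visibility criterion for every pair of points;
    fine for short series, too slow for whole books.
    """
    n = len(series)
    edges = set()
    for a in range(n - 1):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


def degrees(n, edges):
    """Degree of each of the n nodes, given an undirected edge set."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


profile = word_length_profile("the quick brown fox jumps over the lazy dog")
print(degrees(len(profile), natural_visibility_edges(profile)))
```

    The degree distribution P(k) studied in the record is the histogram of the values returned by `degrees`; estimating the exponents of its two regimes needs far longer series than this toy sentence.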

  3. The effects of session length on demand functions generated using FR schedules.

    Science.gov (United States)

    Foster, T Mary; Kinloch, Jennifer; Poling, Alan

    2011-05-01

    In comparing open and closed economies, researchers often arrange shorter sessions under the former condition than under the latter. Several studies indicate that session length per se can affect performance and there are some data that indicate that this variable can influence demand functions. To provide further data, the present study exposed domestic hens to series of increasing fixed-ratio schedules with the length of the open-economy sessions varied over 10, 40, 60, and 120 min. Session time affected the total-session response rates and pause lengths. The shortest session gave the greatest response rates and shortest pauses and the longest gave the lowest response rates and longest pauses. The total-session demand functions also changed with session length: The shortest session gave steeper initial slopes (i.e., the functions were more elastic at small ratios) and smaller rates of change of elasticity than the longest session. Response rates, pauses, and demand functions were, however, similar for equivalent periods of responding taken from within sessions of different overall lengths (e.g., total-session data for 10-min sessions and the data for the first 10 min of 120-min sessions). These findings suggest that differences in session length can confound the results of studies comparing open and closed economies when those economies are arranged in sessions that differ substantially in length, hence data for equivalent-length periods of responding, rather than total-session data, should be of primary interest under these conditions.

  4. Otolith Length-Fish Length Relationships of Eleven US Arctic Fish Species and Their Application to Ice Seal Diet Studies

    Science.gov (United States)

    Walker, K. L.; Norcross, B.

    2016-02-01

    The Arctic ecosystem has moved into the spotlight of scientific research in recent years due to increased climate change and oil and gas exploration. Arctic fishes and Arctic marine mammals represent key parts of this ecosystem, with fish being a common part of ice seal diets in the Arctic. Determining sizes of fish consumed by ice seals is difficult because otoliths are often the only part left of the fish after digestion. Otolith length is known to be positively related to fish length. By developing species-specific otolith-body morphometric relationships for Arctic marine fishes, fish length can be determined for fish prey found in seal stomachs. Fish were collected during ice free months in the Beaufort and Chukchi seas 2009 - 2014, and the most prevalent species captured were chosen for analysis. Otoliths from eleven fish species from seven families were measured. All species had strong linear relationships between otolith length and fish total length. Nine species had coefficient of determination values over 0.75, indicating that most of the variability in the otolith to fish length relationship was explained by the linear regression. These relationships will be applied to otoliths found in stomachs of three species of ice seals (spotted Phoca largha, ringed Pusa hispida, and bearded Erignathus barbatus) and used to estimate fish total length at time of consumption. Fish lengths can in turn be used to calculate fish weight, enabling further investigation into ice seal energetic demands. This application will aid in understanding how ice seals interact with fish communities in the US Arctic and directly contribute to diet comparisons among and within ice seal species. A better understanding of predator-prey interactions in the US Arctic will aid in predicting how ice seal and fish species will adapt to a changing Arctic.

  5. homo sapiens are bilaterally symmetrical but not with toe length and ...

    African Journals Online (AJOL)

    Kevin Ongeti

    2017-11-12

    Nov 12, 2017 ... in toe length and toe-length ratios among the three major ethnic groups in Nigeria. A total ... The toe-length ratios also displayed symmetrical differences for ... For the female population, all ratios were not significantly different.

  6. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Directory of Open Access Journals (Sweden)

    Chengwei Luo

    Full Text Available Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but whether or not different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  7. Utilization of deletion bins to anchor and order sequences along the wheat 7B chromosome.

    Science.gov (United States)

    Belova, Tatiana; Grønvold, Lars; Kumar, Ajay; Kianian, Shahryar; He, Xinyao; Lillemo, Morten; Springer, Nathan M; Lien, Sigbjørn; Olsen, Odd-Arne; Sandve, Simen R

    2014-09-01

    A total of 3,671 sequence contigs and scaffolds were mapped to deletion bins on wheat chromosome 7B providing a foundation for developing high-resolution integrated physical map for this chromosome. Bread wheat (Triticum aestivum L.) has a large, complex and highly repetitive genome which is challenging to assemble into high quality pseudo-chromosomes. As part of the international effort to sequence the hexaploid bread wheat genome by the international wheat genome sequencing consortium (IWGSC) we are focused on assembling a reference sequence for chromosome 7B. The successful completion of the reference chromosome sequence is highly dependent on the integration of genetic and physical maps. To aid the integration of these two types of maps, we have constructed a high-density deletion bin map of chromosome 7B. Using the 270 K Nimblegen comparative genomic hybridization (CGH) array on a set of cv. Chinese spring deletion lines, a total of 3,671 sequence contigs and scaffolds (~7.8 % of chromosome 7B physical length) were mapped into nine deletion bins. Our method of genotyping deletions on chromosome 7B relied on a model-based clustering algorithm (Mclust) to accurately predict the presence or absence of a given genomic sequence in a deletion line. The bin mapping results were validated using three different approaches, viz. (a) PCR-based amplification of randomly selected bin mapped sequences (b) comparison with previously mapped ESTs and (c) comparison with a 7B genetic map developed in the present study. Validation of the bin mapping results suggested a high accuracy of the assignment of 7B sequence contigs and scaffolds to the 7B deletion bins.

  8. Copolymers of N-cyclohexylacrylamide and n-butyl acrylate: synthesis, characterization, monomer reactivity ratios and mean sequence length

    Directory of Open Access Journals (Sweden)

    2007-06-01

    Full Text Available Copolymerization of N-cyclohexylacrylamide (NCHA) and n-butyl acrylate (BA) was carried out in dimethylformamide at 55±1°C using azobisisobutyronitrile as a free radical initiator. The copolymers were characterized by 1H-NMR spectroscopy and the copolymer compositions were determined by 1H-NMR analysis. The reactivity ratios of the monomers were determined by both linear and non-linear methods. The linear methods were Fineman-Ross (r1 = 0.37 and r2 = 1.77), Kelen-Tudos (r1 = 0.38 and r2 = 1.77), extended Kelen-Tudos (r1 = 0.37 and r2 = 1.75) and Yezrielev-Brokhina-Roskin (r1 = 0.37 and r2 = 1.77); the non-linear methods were Tidwell-Mortimer (r1 = 0.37 and r2 = 1.76) and ProCop (r1 = 0.36 and r2 = 1.82). The Q and e values for NCHA are 0.67 and 0.68, respectively. Mean sequence lengths of the copolymers were estimated from the r1 and r2 values. These show that the length of BA sequences in the polymer chain increases linearly as the concentration of BA in the monomer feed increases.
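
    The record states that mean sequence lengths were estimated from r1 and r2 but does not give the formula. A common choice, assumed here rather than confirmed by the source, is the terminal-model expression l1 = 1 + r1[M1]/[M2] and l2 = 1 + r2[M2]/[M1], where [M1]/[M2] is the molar feed ratio. The sketch below also assumes that monomer 1 is NCHA and monomer 2 is BA, and uses the Fineman-Ross values quoted above; the function name and the feed fractions are hypothetical.

```python
def mean_sequence_lengths(r1, r2, f1):
    """Terminal-model mean sequence lengths (l1, l2).

    r1, r2 : reactivity ratios of monomers 1 and 2
    f1     : mole fraction of monomer 1 in the feed (0 < f1 < 1)
    """
    if not 0.0 < f1 < 1.0:
        raise ValueError("f1 must lie strictly between 0 and 1")
    feed_ratio = f1 / (1.0 - f1)          # [M1]/[M2]
    l1 = 1.0 + r1 * feed_ratio            # mean run length of monomer 1
    l2 = 1.0 + r2 / feed_ratio            # mean run length of monomer 2
    return l1, l2


# Assumption: monomer 1 = NCHA, monomer 2 = BA, Fineman-Ross ratios.
R1_NCHA, R2_BA = 0.37, 1.77
for f_ncha in (0.2, 0.5, 0.8):            # illustrative feed compositions
    l_ncha, l_ba = mean_sequence_lengths(R1_NCHA, R2_BA, f_ncha)
    print(f"f(NCHA)={f_ncha:.1f}  l(NCHA)={l_ncha:.2f}  l(BA)={l_ba:.2f}")
```

    Under this model l2 is linear in the feed ratio [M2]/[M1], which is consistent with the linear growth of BA sequences reported in the abstract.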

  9. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations.

    Science.gov (United States)

    Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis

    2016-08-24

    To assess the performance of the Oxford Nanopore Technologies MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts with known concentration. cDNA libraries were generated using a template switching protocol to facilitate the direct comparison between different sequencing platforms. The MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full length. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from a human HEK-293 cell line was sequenced on an Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. We observed that there was a good agreement in the measured cDNA abundance between PacBio RS II and ONT MinION (rpearson = 0.82, isoforms with length more than 700bp) and between Illumina HiSeq 2500 and ONT MinION (rpearson = 0.75). This indicates that the ONT MinION can sequence quantitatively both long and short full length cDNA molecules.

  10. Molecular markers. Amplified fragment length polymorphism

    Directory of Open Access Journals (Sweden)

    Pržulj Novo

    2005-01-01

    Full Text Available Amplified fragment length polymorphism (AFLP) molecular markers have been developed by combining procedures of RFLP and RAPD molecular markers, i.e. the first step is restriction digestion of the genomic DNA, which is followed by selective amplification of the restricted fragments. The advantage of the AFLP technique is that it allows rapid generation of a large number of reproducible markers. The reproducibility of AFLP markers is assured by the use of restriction site-specific adapters and adapter-specific primers for the PCR reaction. Only fragments containing the restriction site sequence plus the additional nucleotides will be amplified, and the more selective nucleotides added to the primer sequence, the fewer fragments are amplified by PCR. The amplified products are normally separated on a sequencing gel and visualized after exposure to X-ray film or by using fluorescently labeled primers. AFLPs have proven to be extremely proficient in revealing diversity below the species level. A disadvantage of the AFLP technique is that AFLPs are essentially a dominant marker system and are not able to identify heterozygotes.
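
    The statement that more selective nucleotides yield fewer amplified fragments can be made concrete with a small sketch. It rests on an idealised assumption that is not in the source: the four bases are equally frequent and independent at the positions next to the restriction sites, so each selective nucleotide keeps about one quarter of the fragments. A fragment is modelled only by the insert sequence between its two adapters, and all names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_amplified(insert, selective_fwd, selective_rev):
    """True if both primers' selective nucleotides match the insert.

    The forward primer reads into the insert from its 5' end, the
    reverse primer reads into the opposite strand from the other end.
    """
    return (insert.startswith(selective_fwd)
            and reverse_complement(insert).startswith(selective_rev))


def expected_amplified_fragments(total_fragments, selective_nucleotides):
    """Expected fragment count after selection (idealised 1/4 per base)."""
    return total_fragments / 4 ** selective_nucleotides


print(is_amplified("AGGTTC", "A", "GA"))      # both ends match
print(is_amplified("AGGTTC", "A", "GT"))      # reverse end mismatches
for n in (0, 2, 4, 6):                        # total selective nucleotides
    print(n, expected_amplified_fragments(1_000_000, n))
```

    Real genomes deviate from the equal-frequency assumption, so the one-quarter rule is only a rough guide when choosing the number of selective nucleotides.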

  11. Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

    Science.gov (United States)

    Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

    2015-10-16

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps for it offers much higher and more uniform marker resolution across the length of the chromosome compared to genetic mapping and does not require marker polymorphism per se, as it is based on presence (retention) vs. absence (deletion) marker assay. In this study, a 178 line RH panel was genotyped with SSRs and DArT markers to develop the first high resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with: 1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) a RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb (Mb/cR) or 0.07 cM (cM/cR). The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the position of markers on AL8/78 maps and the four published maps, ranged from 0.75 to 0.92, suggesting a good agreement in marker order. With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average

  12. The complete chloroplast genome sequence of Dodonaea viscosa: comparative and phylogenetic analyses.

    Science.gov (United States)

    Saina, Josphat K; Gichira, Andrew W; Li, Zhi-Zhong; Hu, Guang-Wan; Wang, Qing-Feng; Liao, Kuo

    2018-02-01

    The plant chloroplast (cp) genome is a highly conserved structure which is beneficial for evolutionary and systematic research. Currently, numerous complete cp genome sequences have been reported due to high throughput sequencing technology. However, no complete chloroplast genome has previously been reported for the genus Dodonaea. To better understand the molecular basis of Dodonaea viscosa chloroplast, we used Illumina sequencing technology to sequence its complete genome. The whole length of the cp genome is 159,375 base pairs (bp), with a pair of inverted repeats (IRs) of 27,099 bp separated by a large single copy (LSC) region of 87,204 bp and a small single copy (SSC) region of 17,972 bp. The annotation analysis revealed a total of 115 unique genes, of which 81 were protein coding, 30 tRNA, and four ribosomal RNA genes. Comparative genome analysis with other closely related Sapindaceae members showed conserved gene order in the inverted and single copy regions. Phylogenetic analysis clustered D. viscosa with other species of Sapindaceae with strong bootstrap support. Finally, a total of 249 SSRs were detected. Moreover, a comparison of the synonymous (Ks) and nonsynonymous (Ka) substitution rates in D. viscosa showed very low values. The availability of the cp genome reported here provides a valuable genetic resource for comprehensive further studies in genetic variation, taxonomy and phylogenetic evolution of the Sapindaceae family. In addition, the SSR markers detected will be used in further phylogeographic and population structure studies of the species in this genus.

  13. Mitochondrial genome sequence of Egyptian swift Rock Pigeon (Columba livia breed Egyptian swift).

    Science.gov (United States)

    Li, Chun-Hong; Shi, Wei; Shi, Wan-Yu

    2015-06-01

    The Egyptian swift Rock Pigeon is a breed of fancy pigeon developed over many years of selective breeding. In this work, we report the complete mitochondrial genome sequence of Egyptian swift Rock Pigeon. The total length of the mitogenome was 17,239 bp and its overall base composition was estimated to be 30.2% for A, 24.0% for T, 31.9% for C and 13.9% for G, indicating an A-T (54.2%)-rich feature in the mitogenome. It contained the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a non-coding control region (D-loop region). The complete mitochondrial genome sequence of Egyptian swift Rock Pigeon would serve as an important data set of the germplasm resources for further study.
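
    The base composition and A-T content quoted above are simple per-base percentages. A minimal sketch of that calculation follows; the function names and the toy sequence are illustrative assumptions, not taken from the record, and ambiguous bases such as N are simply ignored.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a nucleotide sequence.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Return the combined A + T percentage of a sequence."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Toy sequence for illustration only (hypothetical, not the pigeon mitogenome).
toy = "AATTCCGGAA"
print(base_composition(toy))
print(at_content(toy))
```

    For the figures reported in the record, 30.2% A plus 24.0% T gives the stated 54.2% A-T content.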

  14. Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification

    OpenAIRE

    Hwang, Kyuyeon; Sung, Wonyong

    2015-01-01

    Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinders a small footprint implementation of online learning or adaptation. Furthermore, the length of tr...

  15. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

    Science.gov (United States)

    Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

    2013-09-01

    Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.

  16. New PN Even Balanced Sequences for Spread-Spectrum Systems

    Directory of Open Access Journals (Sweden)

    Inácio JAL

    2005-01-01

    Full Text Available A new class of pseudonoise even balanced (PN-EB) binary spreading sequences is derived from existing classical odd-length families of maximum-length sequences, such as those proposed by Gold, by appending or inserting one extra-zero element (chip) to the original sequences. The incentive to generate large families of PN-EB spreading sequences is motivated by analyzing the spreading effect of these sequences from a natural sampling point of view. From this analysis a new definition for PG is established, from which it becomes clear that very high processing gains (PGs) can be achieved in band-limited direct-sequence spread-spectrum (DSSS) applications by using spreading sequences with zero mean, given that certain conditions regarding spectral aliasing are met. To obtain large families of even balanced (i.e., equal number of ones and zeros) sequences, two design criteria are proposed, namely the ranging criterion (RC) and the generating ranging criterion (GRC). PN-EB sequences in the polynomial range are derived using these criteria, and it is shown that they exhibit secondary autocorrelation and cross-correlation peaks comparable to the sequences they are derived from. The methods proposed not only facilitate the generation of large numbers of new PN-EB spreading sequences required for CDMA applications, but simultaneously offer high processing gains and good despreading characteristics in multiuser SS scenarios with band-limited noise and interference spectra. Simulation results are presented to confirm the respective claims made.
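
    The basic idea of balancing an odd-length maximum-length sequence with one extra zero chip can be sketched as follows. This is only an illustration under stated assumptions: the feedback recurrence a[k] = a[k-5] XOR a[k-3] (characteristic polynomial x^5 + x^2 + 1) and the choice to append the chip at the end are mine; the record's RC and GRC criteria for where to place the chip are not reproduced here.

```python
def m_sequence(degree, taps, seed):
    """Generate one period (2**degree - 1 chips) of an LFSR sequence.

    taps lists the delays d in the recurrence a[k] = XOR of a[k-d].
    The result is a maximum-length sequence only if the taps correspond
    to a primitive polynomial (assumed here, not checked).
    """
    if len(seed) != degree or not any(seed):
        raise ValueError("seed must have `degree` bits and not be all zero")
    chips = list(seed)
    while len(chips) < 2 ** degree - 1:
        bit = 0
        for delay in taps:
            bit ^= chips[-delay]
        chips.append(bit)
    return chips


def append_zero_chip(chips):
    """Return an even-length copy with one extra zero chip appended."""
    return chips + [0]


mls = m_sequence(5, (5, 3), [1, 0, 0, 0, 0])
balanced = append_zero_chip(mls)

# One period of an m-sequence has one more 1 than 0; the extra chip evens it out.
print(len(mls), sum(mls))            # length and number of ones before
print(len(balanced), sum(balanced))  # length and number of ones after

# Mapped to +1/-1, the balanced sequence has zero mean.
bipolar = [1 if chip else -1 for chip in balanced]
print(sum(bipolar))
```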

  17. Comparative analysis of human cytomegalovirus a-sequence in multiple clinical isolates by using polymerase chain reaction and restriction fragment length polymorphism assays.

    Science.gov (United States)

    Zaia, J A; Gallez-Hawkins, G; Churchill, M A; Morton-Blackshere, A; Pande, H; Adler, S P; Schmidt, G M; Forman, S J

    1990-01-01

    The human cytomegalovirus (HCMV) a-sequence (a-seq) is located in the joining region between the long (L) and short (S) unique sequences of the virus (L-S junction), and this hypervariable junction has been used to differentiate HCMV strains. The purpose of this study was to investigate whether there are differences among strains of human cytomegalovirus which could be characterized by polymerase chain reaction (PCR) amplification of the a-seq of HCMV DNA and to compare a PCR method of strain differentiation with conventional restriction fragment length polymorphism (RFLP) methodology by using HCMV junction probes. Laboratory strains of HCMV and viral isolates from individuals with HCMV infection were characterized by using both RFLPs and PCR. The PCR assay amplified regions in the major immediate-early gene (IE-1), the 64/65-kDa matrix phosphoprotein (pp65), and the a-seq of the L-S junction region. HCMV laboratory strains Towne, AD169, and Davis were distinguishable, in terms of size of the amplified product, when analyzed by PCR with primers specific for the a-seq but were indistinguishable by using PCR targeted to IE-1 and pp65 sequences. When this technique was applied to a characterization of isolates from individuals with HCMV infection, selected isolates could be readily distinguished. In addition, when the a-seq PCR product was analyzed with restriction enzyme digestion for the presence of specific sequences, these DNA differences were confirmed. PCR analysis across the variable a-seq of HCMV demonstrated differences among strains which were confirmed by RFLP in 38 of 40 isolates analyzed. The most informative restriction enzyme sites in the a-seq for distinguishing HCMV isolates were those of MnlI and BssHII. This indicates that the a-seq of HCMV is heterogeneous among wild strains, and PCR of the a-seq of HCMV is a practical way to characterize differences in strains of HCMV. PMID:1980680

  18. Symbolic complexity for nucleotide sequences: a sign of the genome structure

    International Nuclear Information System (INIS)

    Salgado-García, R; Ugalde, E

    2016-01-01

    We introduce a method for estimating the complexity function (which counts the number of observable words of a given length) of a finite symbolic sequence, which we use to estimate the complexity function of coding DNA sequences for several species of the Hominidae family. In all cases, the obtained symbolic complexities show the same characteristic behavior: exponential growth for small word lengths, followed by linear growth for larger word lengths. The symbolic complexities of the species we consider exhibit a systematic trend in correspondence with the phylogenetic tree. Using our method, we estimate the complexity function of sequences obtained by some known evolution models, and in some cases we observe the characteristic exponential-linear growth of the Hominidae coding DNA complexity. Analysis of the symbolic complexity of sequences obtained from a specific evolution model points to the following conclusion: linear growth arises from the random duplication of large segments during the evolution of the genome, while the decrease in the overall complexity from one species to another is due to a difference in the speed of accumulation of point mutations.
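
    The complexity function described above counts the distinct words of a given length that occur in a sequence. A minimal sketch of the plain empirical count is given below; it is not the estimator introduced in the record, which corrects for the finite length of the sample, and the function name and toy sequence are assumptions for illustration.

```python
def word_complexity(seq, n):
    """Count the distinct contiguous words of length n observed in seq."""
    if n <= 0:
        raise ValueError("word length must be positive")
    return len({seq[i:i + n] for i in range(len(seq) - n + 1)})


# Toy periodic sequence (hypothetical): the count saturates immediately.
toy = "ACGTACGT"
for n in range(1, 5):
    print(n, word_complexity(toy, n))
```

    For a sequence over four letters the count can never exceed either 4**n or the number of windows, len(seq) - n + 1, which is why a finite sample alone forces growth to level off at large word lengths.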

  19. Genome sequencing of the sweetpotato whitefly Bemisia tabaci MED/Q.

    Science.gov (United States)

    Xie, Wen; Chen, Chunhai; Yang, Zezhong; Guo, Litao; Yang, Xin; Wang, Dan; Chen, Ming; Huang, Jinqun; Wen, Yanan; Zeng, Yang; Liu, Yating; Xia, Jixing; Tian, Lixia; Cui, Hongying; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Li, Xianchun; Tan, Xinqiu; Ghanim, Murad; Qiu, Baoli; Pan, Huipeng; Chu, Dong; Delatte, Helene; Maruthi, M N; Ge, Feng; Zhou, Xueping; Wang, Xiaowei; Wan, Fanghao; Du, Yuzhou; Luo, Chen; Yan, Fengming; Preisser, Evan L; Jiao, Xiaoguo; Coates, Brad S; Zhao, Jinyang; Gao, Qiang; Xia, Jinquan; Yin, Ye; Liu, Yong; Brown, Judith K; Zhou, Xuguo Joe; Zhang, Youjun

    2017-05-01

    The sweetpotato whitefly Bemisia tabaci is a highly destructive agricultural and ornamental crop pest. It damages host plants through both phloem feeding and vectoring plant pathogens. Introductions of B. tabaci are difficult to quarantine and eradicate because of its high reproductive rates, broad host plant range, and insecticide resistance. A total of 791 Gb of raw DNA sequence from whole genome shotgun sequencing, and 13 BAC pooling libraries were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 437 kb, and a total length of 658 Mb. Annotation of repetitive elements and coding regions resulted in 265.0 Mb TEs (40.3%) and 20 786 protein-coding genes with putative gene family expansions, respectively. Phylogenetic analysis based on orthologs across 14 arthropod taxa suggested that MED/Q is clustered into a hemipteran clade containing A. pisum and is a sister lineage to a clade containing both R. prolixus and N. lugens. Genome completeness, as estimated using the CEGMA and Benchmarking Universal Single-Copy Orthologs pipelines, reached 96% and 79%. These MED/Q genomic resources lay a foundation for future 'pan-genomic' comparisons of invasive vs. noninvasive, invasive vs. invasive, and native vs. exotic Bemisia, which, in return, will open up new avenues of investigation into whitefly biology, evolution, and management. © The Author 2017. Published by Oxford University Press.
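
    The scaffold N50 quoted above is the length L such that scaffolds of length L or longer together hold at least half of the total assembly. A minimal sketch of the usual calculation follows; the scaffold lengths are made up for illustration and are not from the whitefly assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that takes the running sum to half the total.
        if 2 * running >= total:
            return length
    raise ValueError("no lengths given")


# Hypothetical scaffold lengths in kb.
toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_scaffolds))
```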

  20. A High Density Genetic Map Derived from RAD Sequencing and Its Application in QTL Analysis of Yield-Related Traits in Vigna unguiculata

    Directory of Open Access Journals (Sweden)

    Lei Pan

    2017-09-01

    Full Text Available Cowpea [Vigna unguiculata (L.) Walp.] is an annual legume of economic importance and widely grown in the semi-arid tropics. However, high-density genetic maps of cowpea are still lacking. Here, we identified 34,868 SNPs (single nucleotide polymorphisms) that were distributed in the cowpea genome based on the RAD sequencing (restriction-site associated DNA sequencing) technique using a population of 170 individuals (two cowpea parents and 168 F2:3 progenies). Of these, 17,996 reliable SNPs were allotted to 11 consensus linkage groups (LGs). The length of the genetic map was 1,194.25 cM in total with a mean distance of 0.066 cM/SNP marker locus. Using this map and the F2:3 population, combined with the CIM (composite interval mapping) method, eleven quantitative trait loci (QTL) of yield-related trait were detected on seven LGs (LG4, 5, 6, 7, 9, 10, and 11) in cowpea. These QTL explained 0.05–17.32% of the total phenotypic variation. Among these, four QTL were for pod length, four QTL for thousand-grain weight (TGW), two QTL for grain number per pod, and one QTL for carpopodium length. Our results will provide a foundation for understanding genes related to grain yield in the cowpea and genus Vigna.

  1. A New Maximum Length Record of the Bluefish (Pomatomus saltatrix Linnaeus, 1766) for Turkey Seas

    OpenAIRE

    Cengiz, Özgür

    2014-01-01

    One specimen of the bluefish (Pomatomus saltatrix Linnaeus, 1766), 76.5 cm in total length and 4800.0 g in total weight, was photographed in the Çanakkale Fish Market. The given length is the second maximum length record of the bluefish for Turkish waters.

  2. Murine protein H is comprised of 20 repeating units, 61 amino acids in length

    DEFF Research Database (Denmark)

    Kristensen, Torsten; Tack, B F

    1986-01-01

    A cDNA library constructed from size-selected (greater than 28 S) poly(A)+ RNA isolated from the livers of C57B10.WR mice was screened by using a 249-base-pair (bp) cDNA fragment encoding 83 amino acid residues of human protein H as a probe. Of 120,000 transformants screened, 30 hybridized......, 448 bp of 3'-untranslated sequence, and a polyadenylylated tail of undetermined length. Murine pre-protein H was deduced to consist of an 18-amino acid signal peptide and 1216 residues of H-protein sequence. Murine H was composed of 20 repetitive units, each about 61 amino acid residues in length...

  3. TypeLoader: A fast and efficient automated workflow for the annotation and submission of novel full-length HLA alleles.

    Science.gov (United States)

    Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V

    2017-07-01

    Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which deliver sequence data often covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of specific formats required by the sequence repositories are error-prone and time-consuming. We have developed TypeLoader to address both these facets. With only the full-length sequence as a starting point, TypeLoader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  4. Genome sequence of Aspergillus luchuensis NBRC 4314

    Science.gov (United States)

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-01-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, the local kuro (black) koji mold Aspergillus luchuensis and the awamori yeast Saccharomyces cerevisiae, are involved. Whereas yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and the N- and O-glycosylation system, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and acid-stable α-amylase regarding citric acid production and fermentation at a low pH as well as a unique glutamic peptidase were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with the A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds in improvements of awamori fermentation. PMID:27651094

  5. Measuring long impulse responses with pseudorandom sequences and sweep signals

    DEFF Research Database (Denmark)

    Torras Rosell, Antoni; Jacobsen, Finn

    2010-01-01

    In architectural acoustics, background noise, loudspeaker nonlinearities, and time variances are the most common disturbances that can compromise a measurement. The effects of such disturbances on measurement of long impulse responses with pseudorandom sequences (maximum-length sequences (MLS) an...

  6. SNP identification from RNA sequencing and linkage map construction of rubber tree for anchoring the draft genome.

    Science.gov (United States)

    Shearman, Jeremy R; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-Areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2015-01-01

    Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly.

  7. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which...... the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously...... preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation...

  8. Length bias correction in gene ontology enrichment analysis using logistic regression.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

    2012-01-01

    When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
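
    The adjustment described above can be illustrated with a small simulation. The sketch below is a set of assumptions rather than the authors' implementation: the response is Gene Ontology membership, the predictors are a binary differential-expression call and a standardised log transcript length, the data are simulated so that length drives both variables, and the model is fitted with a hand-written Newton-Raphson routine instead of a statistics library.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logistic(rows, y, iterations=25):
    """Fit logistic regression by Newton-Raphson.

    Each row must already start with 1.0 for the intercept.
    """
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * x[j] * x[k]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate(n=3000, seed=1):
    """Simulate genes where length drives both DE calls and GO membership."""
    rng = random.Random(seed)
    length, de, member = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # standardised log transcript length
        prob = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * z)))
        length.append(z)
        de.append(1 if rng.random() < prob else 0)      # depends on length only
        member.append(1 if rng.random() < prob else 0)  # depends on length only
    return length, de, member


length, de, member = simulate()
unadjusted = fit_logistic([[1.0, d] for d in de], member)
adjusted = fit_logistic([[1.0, d, z] for d, z in zip(de, length)], member)
print("DE coefficient without length:", unadjusted[1])
print("DE coefficient with length as covariate:", adjusted[1])
```

    In this simulation membership and the differential-expression call are independent once length is known, so the unadjusted coefficient reflects length bias alone and shrinks towards zero when length enters the model.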

  9. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    Science.gov (United States)

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal

  10. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
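
    Read depth relates to the amount of sequencing through depth = (number of reads x read length) / genome size. A minimal sketch of planning for the 50X depth mentioned above follows; the genome sizes are those quoted in the record, while the 100 bp read length is purely an assumption for illustration, since the record does not state one.

```python
import math


def reads_for_depth(genome_size_bp, depth, read_length_bp):
    """Number of reads needed to reach a target average depth."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


def depth_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Average depth obtained from a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Genome sizes from the record; the read length is a hypothetical value.
genomes_bp = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}
ASSUMED_READ_LENGTH_BP = 100

for name, size in genomes_bp.items():
    print(name, reads_for_depth(size, 50, ASSUMED_READ_LENGTH_BP))
```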

  11. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Full Text Available Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
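
    To make the target of the mining concrete: a contiguous pattern is frequent if it occurs in at least a minimum number of sequences, and maximal if no longer frequent pattern contains it. The brute-force sketch below only illustrates these definitions; it is not the efficient approach proposed in the record, it scales poorly with sequence length, and the function name and toy sequences are assumptions.

```python
from collections import defaultdict


def maximal_contiguous_frequent_patterns(sequences, min_support):
    """Return maximal contiguous patterns found in >= min_support sequences."""
    # Record which sequences contain each substring.
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                support[seq[start:end]].add(idx)

    frequent = {p for p, ids in support.items() if len(ids) >= min_support}

    # Every substring of a frequent pattern is itself frequent, so a pattern
    # is non-maximal exactly when some frequent pattern one base longer
    # contains it as a prefix or a suffix.
    non_maximal = set()
    for pattern in frequent:
        if len(pattern) > 1:
            non_maximal.add(pattern[:-1])
            non_maximal.add(pattern[1:])
    return sorted(frequent - non_maximal)


# Toy DNA sequences for illustration only.
toy = ["ACGTAC", "TACGTT", "GGACGT"]
print(maximal_contiguous_frequent_patterns(toy, 3))
print(maximal_contiguous_frequent_patterns(toy, 2))
```

    No pattern length is supplied in advance; the longest patterns emerge from the support threshold alone, which is the property the record highlights.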

  12. Complete mitochondrial genome sequence of the lined seahorse Hippocampus erectus Perry, 1810 (Gasterosteiformes: Syngnathidae).

    Science.gov (United States)

    Zhang, Yanhong; Zhang, Huixian; Lin, Qiang; Huang, Liangmin

    2015-01-01

    The complete mitochondrial genome sequence of the lined seahorse Hippocampus erectus is determined for the first time in this article. The total length of the H. erectus mitogenome is 16,529 bp, and it consists of 13 protein-coding genes, 22 tRNA and 2 rRNA genes and 1 control region. The features of the H. erectus mitochondrial genome are similar to those of typical vertebrates. The overall base composition of H. erectus is 31.8% A, 28.6% T, 24.3% C and 15.3% G, with a slight A + T rich feature (60.4%).

  13. Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum).

    Directory of Open Access Journals (Sweden)

    Kwang-Soo Cho

    Full Text Available We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compare it with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. The copy number of the 21-bp tandem repeat differed between F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Coding sequences are highly conserved at the nucleotide and amino acid levels, with about 98% homology, and four genes--rpoC2, ycf3, accD, and clpP--have high synonymous (Ks) values. PCR based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum.

  14. Morphological observation and length-weight relationship of critically endangered riverine catfish Rita rita (Hamilton).

    Science.gov (United States)

    Amin, M R; Mollah, M F A; Taslima, K; Muhammadullah

    2014-01-15

    The experiment was conducted to investigate the morphological status of the critically endangered riverine catfish Rita rita using morphometric and meristic traits. About 158 specimens of Rita were collected from the old Brahmaputra river in Mymensingh district and were studied in the laboratory of the Fisheries Biology and Genetics Department, Bangladesh Agricultural University. Length and weight were recorded using a measuring scale and an electric balance, respectively. A significant curvilinear relationship existed between total length and other morphometric characters and between head length and other characters of the head. Relationships between total length and various body measurements of the fish were highly significant (p < 0.01) except the relationship between total length and pelvic fin length of male fish (p < 0.05). The meristic characters (dorsal fin rays, pelvic fin rays, pectoral fin rays, anal fin rays, caudal fin rays, number of vertebrae and branchiostegal rays) were found to be more or less similar, with only slight differences. The values of condition factors (k) in the total length body-weight relationships for female and male were found to be 0.41 and 0.38, respectively. The mean values of relative condition factors (kn) were 1.0 and 1.005 for female and male, respectively.

  15. Genetic variation at hair length candidate genes in elephants and the extinct woolly mammoth

    Directory of Open Access Journals (Sweden)

    Tisdale Michele

    2009-09-01

    Full Text Available Abstract Background Like humans, the living elephants are unusual among mammals in being sparsely covered with hair. Relative to extant elephants, the extinct woolly mammoth, Mammuthus primigenius, had a dense hair cover and extremely long hair, which likely were adaptations to its subarctic habitat. The fibroblast growth factor 5 (FGF5) gene affects hair length in a diverse set of mammalian species. Mutations in FGF5 lead to recessive long hair phenotypes in mice, dogs, and cats; and the gene has been implicated in hair length variation in rabbits. Thus, FGF5 represents a leading candidate gene for the phenotypic differences in hair length notable between extant elephants and the woolly mammoth. We therefore sequenced the three exons (except for the 3' UTR) and a portion of the promoter of FGF5 from the living elephantid species (Asian, African savanna and African forest elephants) and, using protocols for ancient DNA, from a woolly mammoth. Results Between the extant elephants and the mammoth, two single base substitutions were observed in FGF5, neither of which alters the amino acid sequence. Modeling of the protein structure suggests that the elephantid proteins fold similarly to the human FGF5 protein. Bioinformatics analyses and DNA sequencing of another locus that has been implicated in hair cover in humans, type I hair keratin pseudogene (KRTHAP1), also yielded negative results. Interestingly, KRTHAP1 is a pseudogene in elephantids as in humans (although fully functional in non-human primates). Conclusion The data suggest that the coding sequence of the FGF5 gene is not the critical determinant of hair length differences among elephantids. The results are discussed in the context of hairlessness among mammals and in terms of the potential impact of large body size, subarctic conditions, and an aquatic ancestor on hair cover in the Proboscidea.

  16. PCR-based isolation and identification of full-length low-molecular-weight glutenin subunit genes in bread wheat (Triticum aestivum L.).

    Science.gov (United States)

    Zhang, Xiaofei; Liu, Dongcheng; Jiang, Wei; Guo, Xiaoli; Yang, Wenlong; Sun, Jiazhu; Ling, Hongqing; Zhang, Aimin

    2011-12-01

    Low-molecular-weight glutenin subunits (LMW-GSs) are encoded by a multi-gene family and are essential for determining the quality of wheat flour products, such as bread and noodles. However, the exact role or contribution of individual LMW-GS genes to wheat quality remains unclear. This is, at least in part, due to the difficulty in characterizing complete sequences of all LMW-GS gene family members in bread wheat. To identify full-length LMW-GS genes, a polymerase chain reaction (PCR)-based method was established, consisting of newly designed conserved primers and the previously developed LMW-GS gene molecular marker system. Using the PCR-based method, 17 LMW-GS genes were identified and characterized in Xiaoyan 54, of which 12 contained full-length sequences. Sequence alignments showed that 13 LMW-GS genes were identical to those found in Xiaoyan 54 using the genomic DNA library screening, and the other four full-length LMW-GS genes were first isolated from Xiaoyan 54. In Chinese Spring, 16 unique LMW-GS genes were isolated, and 13 of them contained full-length coding sequences. Additionally, 16 and 17 LMW-GS genes in Dongnong 101 and Lvhan 328 (chosen from the micro-core collections of Chinese germplasm), respectively, were also identified. Sequence alignments revealed that at least 15 LMW-GS genes were common in the four wheat varieties, and allelic variants of each gene shared high sequence identities (>95%) but exhibited length polymorphism in repetitive regions. This study provides a PCR-based method for efficiently identifying LMW-GS genes in bread wheat, which will improve the characterization of complex members of the LMW-GS gene family and facilitate the understanding of their contributions to wheat quality.

  17. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues

    International Nuclear Information System (INIS)

    Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.; Goldberg, O.; Soreq, H.

    1987-01-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments with a total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.

  18. A measurement of disorder in binary sequences

    Science.gov (United States)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It has extremely low computational cost and can be easily realized experimentally. For all these reasons, we believe that the proposed measure of disorder is a valuable complement to existing ones for symbolic sequences.

  19. Use of designed sequences in protein structure recognition.

    Science.gov (United States)

    Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

    2018-05-09

    Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

  20. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-05-01

    The purpose of this dissertation is to present a methodology to model the global sequence alignment problem as a directed acyclic graph, which helps to extract all possible optimal alignments. Moreover, a mechanism to sequentially optimize the sequence alignment problem relative to different cost functions is suggested. Sequence alignment is important in computational biology, where it is used to find evolutionary relationships between biological sequences. Many algorithms have been developed to solve this problem. The most famous are Needleman-Wunsch and Smith-Waterman, which are based on dynamic programming. In dynamic programming, the problem is divided into a set of overlapping subproblems and the solution of each subproblem is found. Finally, the solutions to these subproblems are combined into a final solution. In this thesis it is proved that, for two sequences of length m and n over a fixed alphabet, the suggested optimization procedure requires O(mn) arithmetic operations per cost function on a single-processor machine. The algorithm has been simulated using the C#.Net programming language and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, which also depends on the cost function that is used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
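
    The dynamic-programming idea described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's C#.Net implementation: the function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration. The sketch fills an (m+1) x (n+1) table in O(mn) operations and, alongside each score, counts how many optimal alignments reach that cell, which corresponds to counting paths in the directed acyclic graph of optimal moves.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (optimal score, number of optimal alignments) for strings a and b.

    Scoring values are illustrative assumptions, not taken from the source.
    """
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    count = [[0] * (n + 1) for _ in range(m + 1)]
    count[0][0] = 1
    # First column and first row: only gaps are possible, so one alignment each.
    for i in range(1, m + 1):
        score[i][0] = i * gap
        count[i][0] = 1
    for j in range(1, n + 1):
        score[0][j] = j * gap
        count[0][j] = 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, count[i - 1][j - 1]),  # align a[i-1] with b[j-1]
                (score[i - 1][j] + gap, count[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, count[i][j - 1]),          # gap in a
            ]
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Every predecessor that attains the optimum contributes its alignments.
            count[i][j] = sum(c for s, c in candidates if s == best)
    return score[m][n], count[m][n]
```

    Under these assumed scores, aligning "A" with "AA" has two optimal alignments (the gap can sit on either side), which shows why the count can grow quickly with sequence length and why it depends on the cost function.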

  1. Virtually full-length subtype F and F/D recombinant HIV-1 from Africa and South America

    NARCIS (Netherlands)

    Laukkanen, T.; Carr, J. K.; Janssens, W.; Liitsola, K.; Gotte, D.; McCutchan, F. E.; Op de Coul, E.; Cornelissen, M.; Heyndrickx, L.; van der Groen, G.; Salminen, M. O.

    2000-01-01

    For reliable classification of HIV-1 strains appropriate reference sequences are needed. The HIV-1 genetic subtype F has a wide geographic spread, causing significant epidemics in South America, Africa, and some regions of Europe. Previously only two full-length sequences of each of the HIV-1

  2. Full-length genomic sequence of hepatitis B virus genotype C2 isolated from a native Brazilian patient

    Directory of Open Access Journals (Sweden)

    Mónica Viviana Alvarado-Mora

    2011-06-01

    Full Text Available The hepatitis B virus (HBV) is among the leading causes of chronic hepatitis, cirrhosis and hepatocellular carcinoma. In Brazil, genotype A is the most frequent, followed by genotypes D and F. Genotypes B and C are found in Brazil exclusively among Asian patients and their descendants. The aim of this study was to sequence the entire HBV genome of a Caucasian patient infected with HBV/C2 and to infer the origin of the virus based on sequencing analysis. The sequence of this Brazilian isolate was grouped with four other sequences described in China. The sequence of this patient is the first complete genome of HBV/C2 reported in Brazil.

  3. Simultaneous identification of long similar substrings in large sets of sequences

    Directory of Open Access Journals (Sweden)

    Wittig Burghardt

    2007-05-01

    Full Text Available Abstract Background Sequence comparison faces new challenges today, with many complete genomes and large libraries of transcripts known. Gene annotation pipelines match these sequences in order to identify genes and their alternative splice forms. However, the software currently available cannot simultaneously compare sets of sequences as large as necessary especially if errors must be considered. Results We therefore present a new algorithm for the identification of almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length as seeds for a novel method of match extension with errors. It generates alignments of maximum length with a considered maximum number of errors within each overlapping window of a given size. Such alignments are not optimal in the usual sense but faster to calculate and often more appropriate than traditional alignments for genomic sequence comparisons, EST and full-length cDNA matching, and genomic sequence assembly. The method is used to check the overlaps and to reveal possible assembly errors for 1377 Medicago truncatula BAC-size sequences published at http://www.medicago.org/genome/assembly_table.php?chr=1. Conclusion The program ClustDB proves that window alignment is an efficient way to find long sequence sections of homogenous alignment quality, as expected in case of random errors, and to detect systematic errors resulting from sequence contaminations. Such inserts are systematically overlooked in long alignments controlled by only tuning penalties for mismatches and gaps. ClustDB is freely available for academic use.

  4. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Full Text Available Abstract Background Highly repetitive nucleotide sequences are commonly found in nature, e.g. in telomeres, microsatellite DNA, polyadenine (poly(A)) tails of eukaryotic messenger RNA as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites, resulting in unspecific products. Results For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example for their applicability in filter retardation assays. Conclusion Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  5. Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

    KAUST Repository

    Odat, Enas M.

    2011-01-01

    The algorithm has been simulated using the C#.Net programming language and a number of experiments have been done to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, it has been verified that as the sequence length increases linearly, the number of optimal alignments increases exponentially, which also depends on the cost function that is used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.

  6. Rapid development of microsatellite markers for Callosobruchus chinensis using Illumina paired-end sequencing.

    Directory of Open Access Journals (Sweden)

    Can-Xing Duan

    Full Text Available BACKGROUND: The adzuki bean weevil, Callosobruchus chinensis L., is one of the most destructive pests of stored legume seeds such as mungbean, cowpea, and adzuki bean, and usually causes considerable loss in the quantity and quality of seeds during transportation and storage. However, a lack of genetic information on this pest has left a series of genetic questions largely unanswered, including population genetic structure, kinship, and biotype abundance. Co-dominant microsatellite markers offer great resolving power for addressing these questions. Here, we report rapid microsatellite isolation from C. chinensis via high-throughput sequencing. PRINCIPAL FINDINGS: In this study, 94,560,852 quality-filtered and trimmed reads were obtained for genome assembly using Illumina paired-end sequencing technology. In total, a genome with a total length of 497,124,785 bp, comprising 403,113 high quality contigs, was generated by de novo assembly. More than 6800 SSR loci were detected, a suite of 6303 primer pair sequences was designed, and 500 of them were randomly selected for validation. Of these, 196 primer pairs (39.2%) produced reproducible amplicons that were polymorphic among 8 C. chinensis genotypes collected from different geographical regions. Twenty of the 196 polymorphic SSR markers were used to analyze the genetic diversity of 18 C. chinensis populations. The results showed that the twenty SSR loci were highly polymorphic among these populations. CONCLUSIONS: This study presents a first report of genome sequencing and de novo assembly for C. chinensis and demonstrates the feasibility of generating large-scale sequence information and isolating SSR loci by Illumina paired-end sequencing. Our results provide a valuable resource for C. chinensis research. These novel markers are valuable for future genetic mapping, trait association, and analyses of genetic structure and kinship in C. chinensis.

  7. Length-weight relationship and condition factor of clarias gariepinus ...

    African Journals Online (AJOL)

    Length-weight relationship and condition factor of Clarias gariepinus and Tilapia zillii were studied in lake Alau and Monguno hatchery, both in Borno State of Nigeria, for a period of two weeks. A total of 98 C. gariepinus and 140 T. zillii were measured. The length-weight regression coefficient (b) for both fishes in lake Alau ...

  8. Complete mitochondrial genome sequence of the Barbour's seahorse Hippocampus barbouri Jordan & Richardson, 1908 (Gasterosteiformes: Syngnathidae).

    Science.gov (United States)

    Wang, Bo; Zhang, Yanhong; Zhang, Huixian; Lin, Qiang

    2015-01-01

    The complete mitochondrial genome sequence of the Barbour's seahorse Hippocampus barbouri was determined for the first time in this paper. The total length of the H. barbouri mitogenome is 16,526 bp; it consists of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and 1 control region. The features of the H. barbouri mitochondrial genome are similar to those of typical vertebrates. The overall base composition of H. barbouri is 32.68% A, 29.75% T, 22.91% C and 14.66% G, with an AT content of 62.43%.

  9. Neutron scattering lengths of 3He

    International Nuclear Information System (INIS)

    Alfimenkov, V.P.; Akopian, G.G.; Wierzbicki, J.; Govorov, A.M.; Pikelner, L.B.; Sharapov, E.I.

    1976-01-01

    The total neutron scattering cross-section of 3He has been measured in the neutron energy range from 20 meV to 2 eV. Together with the known value of the coherent scattering amplitude, it leads to the two sets of n-3He scattering lengths.

  10. Correcting length-frequency distributions for imperfect detection

    Science.gov (United States)

    Breton, André R.; Hawkins, John A.; Winkelman, Dana L.

    2013-01-01

    Sampling gear selects for specific sizes of fish, which may bias length-frequency distributions that are commonly used to assess population size structure, recruitment patterns, growth, and survival. To properly correct for sampling biases caused by gear and other sources, length-frequency distributions need to be corrected for imperfect detection. We describe a method for adjusting length-frequency distributions when capture and recapture probabilities are a function of fish length, temporal variation, and capture history. The method is applied to a study involving the removal of Smallmouth Bass Micropterus dolomieu by boat electrofishing from a 38.6-km reach on the Yampa River, Colorado. Smallmouth Bass longer than 100 mm were marked and released alive from 2005 to 2010 on one or more electrofishing passes and removed on all other passes from the population. Using the Huggins mark–recapture model, we detected a significant effect of fish total length, previous capture history (behavior), year, pass, year×behavior, and year×pass on capture and recapture probabilities. We demonstrate how to partition the Huggins estimate of abundance into length frequencies to correct for these effects. Uncorrected length frequencies of fish removed from Little Yampa Canyon were negatively biased in every year by as much as 88% relative to mark–recapture estimates for the smallest length-class in our analysis (100–110 mm). Bias declined but remained high even for adult length-classes (≥200 mm). The pattern of bias across length-classes was variable across years. The percentage of unadjusted counts that were below the lower 95% confidence interval from our adjusted length-frequency estimates were 95, 89, 84, 78, 81, and 92% from 2005 to 2010, respectively. Length-frequency distributions are widely used in fisheries science and management. Our simple method for correcting length-frequency estimates for imperfect detection could be widely applied when mark–recapture data

  11. The Length-Weight, Length-Length Relationship and Condition Factor of Angora Loach, Oxynoemacheilus angorae (Steindachner, 1897) Inhabiting Kılıçözü Stream in Kızılırmak River Basin (Central Anatolia-Turkey)

    Directory of Open Access Journals (Sweden)

    Okan Yazıcıoğlu

    2016-12-01

    Full Text Available In this study, the length-weight relationship (LWR), length-length relationship (LLR) and condition factor (K) of Angora loach, Oxynoemacheilus angorae, were determined. A total of 103 specimens were sampled from Kılıçözü Stream in 2014. The length and weight of specimens ranged over 3.5-9.8 cm and 0.38-6.58 g, respectively. Length-weight relationships for female, male and all samples were found as W = 0.01056 TL^2.896 (r² = 0.923), W = 0.00963 TL^2.940 (r² = 0.978) and W = 0.00987 TL^2.929 (r² = 0.963), respectively. LWRs indicated isometric growth in female, male and all samples. The values of Fulton's condition factor (K) ranged from 0.699 to 1.246 for females and from 0.654 to 1.072 for males. All length-length relationships were statistically significant.
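
    A length-weight relationship of the form W = a TL^b is commonly estimated by linear regression on log-transformed data. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the data are synthetic and noise-free (generated from the pooled equation reported in the abstract), and Fulton's condition factor is taken in its usual form K = 100 W / TL^3 with W in grams and TL in centimetres, which the abstract does not spell out.

```python
import math

def fit_length_weight(lengths_cm, weights_g):
    """Least-squares fit of log10(W) = log10(a) + b * log10(TL); returns (a, b)."""
    xs = [math.log10(length) for length in lengths_cm]
    ys = [math.log10(weight) for weight in weights_g]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = 10 ** (mean_y - b * mean_x)
    return a, b

def fulton_k(weight_g, length_cm):
    """Fulton's condition factor in its usual form (assumed here)."""
    return 100.0 * weight_g / length_cm ** 3

# Synthetic, noise-free example data built from W = 0.00987 * TL^2.929.
lengths = [3.5, 5.0, 6.5, 8.0, 9.8]
weights = [0.00987 * length ** 2.929 for length in lengths]
a, b = fit_length_weight(lengths, weights)
```

    Because the synthetic data lie exactly on the curve, the fit recovers a and b up to floating-point error; an exponent b close to 3 is what the abstract refers to as isometric growth.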

  12. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a major concern because the amount of data generated at multiple locations grows exponentially. Therefore, there is a need to develop techniques that store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large amounts of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA within a distributed bioinformatics computing system for the analysis of DNA sequences. The OPTSDNA algorithm is used to store DNA sequences of various sizes, from very small to very large, in a database. The algorithm calculates the storage size, and the response time is also calculated in this work. The efficiency and performance of the algorithm are high (in size calculation with percentage) when compared with other known sequential approaches.

  13. Complete plastid genome sequence of goosegrass (Eleusine indica) and comparison with other Poaceae.

    Science.gov (United States)

    Zhang, Hui; Hall, Nathan; McElroy, J Scott; Lowe, Elijah K; Goertzen, Leslie R

    2017-02-05

    Eleusine indica, also known as goosegrass, is a serious weed in at least 42 countries. In this paper we report the complete plastid genome sequence of goosegrass obtained by de novo assembly of paired-end and mate-paired reads generated by Illumina sequencing of total genomic DNA. The goosegrass plastome is a circular molecule of 135,151 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 20,919 bases. The large (LSC) and the small (SSC) single-copy regions span 80,667 bases and 12,646 bases, respectively. The plastome of goosegrass has 38.19% GC content and includes 108 unique genes, of which 76 are protein-coding, 28 are transfer RNA, and 4 are ribosomal RNA. The goosegrass plastome sequence was compared to eight other species of Poaceae. Although generally conserved with respect to Poaceae, this genomic resource will be useful for evolutionary studies within this weed species and the genus Eleusine. Copyright © 2016. Published by Elsevier B.V.
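
    The reported region sizes can be cross-checked with simple arithmetic, since a quadripartite plastome consists of one LSC, one SSC and two IR copies. The short sketch below uses only the figures quoted in the abstract; the function name is a hypothetical helper introduced for illustration.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a plastome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir

# Region sizes (in bases) as quoted in the abstract.
total_bp = quadripartite_length(lsc=80_667, ssc=12_646, ir=20_919)

# Gene categories as quoted in the abstract: protein-coding, tRNA, rRNA.
unique_genes = 76 + 28 + 4

print(total_bp)      # 135151
print(unique_genes)  # 108
```

    Both sums agree with the totals stated in the abstract (135,151 bp and 108 unique genes).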

  14. UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

    Science.gov (United States)

    Du, Pu-Feng; Zhao, Wei; Miao, Yang-Yang; Wei, Le-Yi; Wang, Likun

    2017-11-14

    With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most of the existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences with various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this problem, these existing programs can only provide a fixed number of representation modes. Every time a new sequence representation mode is developed, a new program will be needed. In this paper, we propose the UltraPse as a universal software platform for this problem. The function of the UltraPse is not only to generate various existing sequence representation modes, but also to simplify all future programming works in developing novel representation modes. The extensibility of UltraPse is particularly enhanced. It allows the users to define their own representation mode, their own physicochemical properties, or even their own types of biological sequences. Moreover, UltraPse is also the fastest software of its kind. The source code package, as well as the executables for both Linux and Windows platforms, can be downloaded from the GitHub repository.
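
    The core idea of mapping sequences of varying length to fixed-length numerical vectors can be illustrated with a k-mer composition vector. This is a generic sketch, not UltraPse's API or any of its representation modes; the function name, the default alphabet and the choice to skip windows containing unknown symbols are all assumptions made for illustration.

```python
from itertools import product

def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Map a sequence of any length to a vector of len(alphabet)**k k-mer frequencies."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:  # skip windows containing symbols outside the alphabet
            counts[pos] += 1
            total += 1
    # Normalise to frequencies; an all-zero vector is returned if no window was counted.
    return [c / total for c in counts] if total else [0.0] * len(kmers)
```

    With the default settings every DNA sequence, whether 3 or 3 million bases long, yields a 16-dimensional vector, which is the property that fixed-input prediction algorithms need.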

  15. Fundamental length and relativistic length

    International Nuclear Information System (INIS)

    Strel'tsov, V.N.

    1988-01-01

    It is noted that the introduction of fundamental length contradicts the conventional representations concerning the contraction of the longitudinal size of fast-moving objects. The use of the concept of relativistic length and the following ''elongation formula'' permits one to solve this problem.

  16. Toward a Better Compression for DNA Sequences Using Huffman Encoding.

    Science.gov (United States)

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-04-01

    Due to the significant amount of DNA data generated by next-generation sequencing machines for genomes with lengths ranging from megabases to gigabases, there is an increasing need to compress such data so that they take less space and can be transmitted faster. Different implementations of Huffman encoding that incorporate the characteristics of DNA sequences prove to compress DNA data better. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements in the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016).
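
    For orientation, the conventional Huffman baseline that the abstract compares against can be sketched as follows. This is the standard single-tree, per-symbol algorithm, not the authors' repeat-selecting or multi-tree variants; the function names and the toy input are assumptions for illustration.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(text, code):
    """Concatenate the code words of all symbols in text."""
    return "".join(code[s] for s in text)

# Toy sequence with a skewed base composition: A x8, C x4, G x2, T x1.
sequence = "AAAAAAAACCCCGGT"
code = huffman_code(sequence)
bits = encode(sequence, code)
```

    For this toy input the skewed composition gives code lengths of 1, 2, 3 and 3 bits for A, C, G and T, so the encoding takes 25 bits instead of the 30 bits of a fixed 2-bit code. With a uniform base composition, per-symbol Huffman coding gains nothing over 2 bits per base, which is one motivation for encoding frequent repeats rather than single bases.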

  17. Quantitative relationship between nanotube length and anodizing current during constant current anodization

    International Nuclear Information System (INIS)

    Zhang, Yulian; Cheng, Weijie; Du, Fei; Zhang, Shaoyu; Ma, Weihua; Li, Dongdong; Song, Ye; Zhu, Xufei

    2015-01-01

    Highlights: • Ti anodization was performed by constant current rather than constant voltage. • The nanotube length was controlled by ionic current rather than dissolution current. • Electronic current can be estimated by the nanotube length and the anodizing current. • Dissolution reaction hardly contributes electric current across the barrier layer. Abstract: The growth kinetics of anodic TiO2 nanotubes (ATNTs) still remains unclear. ATNTs are generally fabricated under potentiostatic conditions rather than galvanostatic ones. The quantitative relationship between nanotube length and anodizing current (J_total) is difficult to determine, because the variable J_total includes ionic current (J_ion) (also called oxide growth current, J_grow = J_ion) and electronic current (J_e), which cannot be separated from each other. One successful approach to achieve this objective is to use constant current anodization rather than constant voltage anodization; that is, through quantitative comparison between the nanotube length and the known J_total during constant current anodization, we can estimate the relative magnitudes of J_grow and J_e. Nanotubes with lengths of 1.24, 2.23, 3.51 and 4.70 μm were formed under constant currents (J_total) of 15, 20, 25 and 30 mA, respectively. The relationship between nanotube length (y) and anodizing current (x = J_total = J_grow + J_e) can be expressed by a fitting equation, y = 0.23(x - 10.13), from which J_grow (J_grow = x - 10.13) and J_e (∼10.13 mA) could be inferred under the present conditions. Meanwhile, the same conclusion could also be deduced from the oxide volume data. These results indicate that the nanotube growth is attributed to the oxide growth current rather than the dissolution current.
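
    The fitting equation quoted in the abstract can be approximately reproduced from the four reported data points with an ordinary least-squares line. The sketch below assumes a plain unweighted linear fit (the abstract does not state the fitting procedure) and uses hypothetical variable names; the x-intercept of the line is read as the electronic current J_e, following the abstract's interpretation.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Data points quoted in the abstract.
currents_mA = [15, 20, 25, 30]          # anodizing current J_total
lengths_um = [1.24, 2.23, 3.51, 4.70]   # nanotube length

slope, intercept = linear_fit(currents_mA, lengths_um)
# Current at which the fitted length reaches zero, read as J_e.
j_e_mA = -intercept / slope
```

    The fitted slope comes out close to the reported 0.23 μm/mA and the x-intercept close to the reported 10.13 mA; small differences are to be expected from rounding of the published coefficients.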

  18. Neutron-triton scattering lengths for interactions reproducing low-energy trinucleon data

    International Nuclear Information System (INIS)

    Levashev, V.P.

    1981-01-01

    By solving the integral equations for four nucleons, the neutron-triton scattering lengths and total cross section are calculated using different S-wave rank-one separable potentials. A number of linear correlations between the neutron-triton scattering lengths and triton binding energy are found. The scattering lengths consistent with low-energy trinucleon data are determined. The results obtained are compared with available experimental data [ru]

  19. Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications.

    Science.gov (United States)

    Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong

    2009-03-31

    The chloroplast DNA sequence of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, was completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of the Megaleranthis chloroplast genome are substantially different from those of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA is 4,797 bp longer than that of Ranunculus, owing to expansion of the IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus show 5.6% sequence divergence in the coding regions, 8.9% in the intron regions, and 18.7% in the intergenic spacer regions. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes evidencing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified from the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structure of this endangered species. Our phylogenetic trees based on two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of Megaleranthis in Trollius. Therefore, our
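    The region lengths reported in this record can be cross-checked against the stated genome size. The sketch below assumes the usual quadripartite chloroplast layout (one LSC, one SSC, two IR copies); the function name is a hypothetical helper.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular chloroplast genome with one large
    single-copy (LSC) region, one small single-copy (SSC) region and
    two copies of the inverted repeat (IR)."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as quoted in the abstract above.
megaleranthis_total = quadripartite_length(lsc=88_326, ssc=18_382, ir=26_608)
print(megaleranthis_total)  # 159924, matching the reported 159,924 bp
```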

  20. Sequencing and de novo transcriptome assembly of the Chinese giant salamander (Andrias davidianus)

    Directory of Open Access Journals (Sweden)

    Yong Huang

    2017-06-01

    Full Text Available Next-generation technologies for determining genomic and transcriptomic composition have a wide range of applications. Andrias davidianus, a salamander endemic to China, has become an endangered amphibian species. However, molecular information on the species is lacking. In this study, we obtained RNA-Seq data from a pool of A. davidianus tissues including spleen, liver, muscle, kidney, skin, testis, gut and heart using the Illumina HiSeq 2500 platform. A total of 15,398,997,600 bp were obtained, corresponding to 102,659,984 raw reads. A total of 102,659,984 reads were filtered after removing low-quality reads and trimming the adapter sequences. The Trinity program was used to de novo assemble 132,912 unigenes with an average length of 690 bp and an N50 of 1263 bp. Unigenes were annotated through a number of databases. These transcriptomic data of A. davidianus should open the door to molecular evolution studies based on the entire transcriptome or targeted genes of interest. The raw data from this study are available in the NCBI SRA database under accession number SRP099564.
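    Two of the summary statistics in this record, the mean read length and the N50, can be illustrated with a short sketch. The read totals are taken from the abstract; the contig list is a made-up toy example, and the N50 definition used here (the contig length at which the cumulative sum of lengths, sorted longest first, reaches half of the assembly) is the common convention, assumed rather than confirmed for this study.

```python
def mean_read_length(total_bases, n_reads):
    """Average number of bases per read."""
    return total_bases / n_reads


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half
    of the total assembly length (common N50 convention)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Totals quoted in the abstract above.
print(mean_read_length(15_398_997_600, 102_659_984))  # 150.0

# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))  # 400
```

    The quoted totals work out to exactly 150 bases per read, which is consistent with a fixed read length.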

  1. Accuracy of working length determination with root ZX apex locator ...

    African Journals Online (AJOL)

    The purpose of this study was to clinically compare working length (WL) determination with root ZX apex locator and radiography, and then compare them with direct visualization method ex vivo. A total of 75 maxillary central and lateral incisors were selected. Working length determination was carried out using radiographic ...

  2. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Full Text Available Abstract Background Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. Findings To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. Conclusions The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic

  3. DFAs and PFAs with long shortest synchronizing word length

    NARCIS (Netherlands)

    De Bondt, M.; Don, H.; Zantema, H.; Charlier, E.; Leroy, J.; Rigo, M.

    2017-01-01

    It was conjectured by Černý in 1964 that a synchronizing DFA on n states always has a synchronizing word of length at most (n-1)^2, and he gave a sequence of DFAs for which this bound is reached. Until now a full analysis of all DFAs reaching this bound was only given for n ≤ 4 and with bounds on the
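    The bound in this record can be explored numerically for small n. The sketch below builds the standard Černý automaton (one letter is a cyclic shift of the states, the other merges a single state into its neighbour) and finds the shortest synchronizing word length by breadth-first search over subsets of states. The construction and all names are assumptions for illustration and are not taken from the paper; the search is exponential in n, so it is only practical for small automata.

```python
from collections import deque


def cerny_automaton(n):
    """Transition tables of the Cerny automaton on states 0..n-1:
    letter a is a cyclic shift, letter b maps state n-1 to 0 and fixes the rest."""
    a = [(i + 1) % n for i in range(n)]
    b = [0 if i == n - 1 else i for i in range(n)]
    return [a, b]


def shortest_sync_word_length(transitions, n):
    """Breadth-first search over subsets of states, starting from the full
    set; returns the length of a shortest word mapping all states to one
    state, or None if the automaton is not synchronizing."""
    start = frozenset(range(n))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        subset, dist = queue.popleft()
        if len(subset) == 1:
            return dist
        for table in transitions:
            nxt = frozenset(table[s] for s in subset)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


for n in range(2, 6):
    length = shortest_sync_word_length(cerny_automaton(n), n)
    print(n, length, (n - 1) ** 2)
```

    For each n in the loop the second and third printed columns should agree, illustrating that this family of automata reaches the conjectured bound.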

  4. Complete mitochondrial genome sequence of the longsnout seahorse Hippocampus reidi (Ginsburg, 1933; Gasterosteiformes: Syngnathidae).

    Science.gov (United States)

    Wang, Xin; Zhang, Yanhong; Zhang, Huixian; Meng, Tan; Lin, Qiang

    2016-01-01

    The complete mitochondrial genome sequence of the longsnout seahorse Hippocampus reidi was first determined in this article. The total length of the H. reidi mitogenome is 16,529 bp and it consists of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 1 control region. The gene order and composition of H. reidi were similar to those of most other vertebrates. The overall base composition of H. reidi is 32.47% A, 29.41% T, 14.75% G and 23.37% C, with a slight A + T rich feature (61.88%).
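    Base composition figures like those in this record are simple per-base percentages. A minimal sketch follows, using a made-up toy sequence (the actual mitogenome is not reproduced here) and a hypothetical helper name; the last line only re-adds the A and T percentages quoted in the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T in a DNA sequence, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


toy_sequence = "AATTAACGCG"  # toy example, not the H. reidi sequence
composition = base_composition(toy_sequence)
print(composition)                           # A 40%, C 20%, G 20%, T 20%
print(composition["A"] + composition["T"])   # A + T content of the toy sequence

# Percentages quoted in the abstract: A + T should come to 61.88%.
print(round(32.47 + 29.41, 2))
```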

  5. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries

    Directory of Open Access Journals (Sweden)

    Kumar Santosh

    2012-12-01

    Full Text Available Abstract Background Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from

  6. Relationship between leukocyte telomere length and personality traits in healthy subjects.

    Science.gov (United States)

    Sadahiro, R; Suzuki, A; Enokido, M; Matsumoto, Y; Shibuya, N; Kamata, M; Goto, K; Otani, K

    2015-02-01

    It has been shown that certain personality traits are related to mortality and disease morbidity, but the biological mechanism linking them remains unclear. Telomeres are tandem repeat DNA sequences located at the ends of chromosomes, and shorter telomere length is a predictor of mortality and late-life disease morbidity. Thus, it is possible that personality traits influence telomere length. In the present study, we examined the relationship of leukocyte telomere length with personality traits in healthy subjects. The subjects were 209 unrelated healthy Japanese recruited from medical students in their 4th or 5th year. Personality traits were assessed with the Revised NEO Personality Inventory (NEO-PI-R) and the Temperament and Character Inventory (TCI). Leukocyte relative telomere length was determined by a quantitative real-time PCR method as the ratio of telomere to single-copy gene. In the stepwise multiple regression analysis, shorter telomere length was related to lower scores of neuroticism (P [...] personality traits, and this association may be implicated in the relationship between personality traits and mortality. Copyright © 2014 Elsevier Masson SAS. All rights reserved.

  7. Comparative Genomics in Switchgrass Using 61,585 High-Quality Expressed Sequence Tags

    Directory of Open Access Journals (Sweden)

    Christian M. Tobias

    2008-11-01

    Full Text Available The development of genomic resources for switchgrass (Panicum virgatum L.), a perennial NAD-malic enzyme type C4 grass, is required to enable molecular breeding and biotechnological approaches for improving its value as a forage and bioenergy crop. Expressed sequence tag (EST) sequencing is one method that can quickly sample gene inventories and produce data suitable for marker development or analysis of tissue-specific patterns of expression. Toward this goal, three cDNA libraries from callus, crown, and seedling tissues of ‘Kanlow’ switchgrass were end-sequenced to generate a total of 61,585 high-quality ESTs from 36,565 separate clones. Seventy-three percent of the assembled consensus sequences could be aligned with the sorghum [Sorghum bicolor (L.) Moench] genome at an E-value of <1 × 10, indicating a high degree of similarity. Sixty-five percent of the ESTs matched with gene ontology molecular terms, and 3.3% of the sequences were matched with genes that play potential roles in cell-wall biogenesis. The representation in the three libraries of gene families known to be associated with C4 photosynthesis, cellulose and β-glucan synthesis, phenylpropanoid biosynthesis, and peroxidase activity indicated likely roles for individual family members. Pairwise comparisons of synonymous codon substitutions were used to assess genome sequence diversity and indicated an overall similarity between the two genome copies present in the tetraploid. Identification of EST–simple sequence repeat markers and amplification on two individual parents of a mapping population yielded an average of 2.18 amplicons per individual, and 35% of the markers produced fragment length polymorphisms.

  8. Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

    Science.gov (United States)

    Shi, Jinming

    In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult rice leaves after seed space flight were detected by the methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after 4 days of space flight on the Shenzhou-6 spaceship of China. Adult leaves of space-treated rice, including 8 plants chosen randomly and 2 plants with phenotypic mutation, were used for AFLP and MSAP analysis. Polymorphisms of both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants are 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies are 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from the MSAP and AFLP analyses, respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses were different.

  9. Composite Binary Sequences with a Large Ensemble and Zero Correlation Zone

    Directory of Open Access Journals (Sweden)

    S. S. Yudachev

    2015-01-01

    Full Text Available The article considers a proposed class of derived signals, composite binary sequences, for application in advanced spread spectrum radio systems of various purposes that use signals based on spectrum spreading by the direct sequence method. The composite sequences considered, having a representative set of lengths and unique correlation properties, compare favorably with the large ensembles formed on a single algorithmic basis that are widely used at present. To evaluate the properties of the composite sequences generated on the basis of two components - the Barker code and Kerdock sequences - expressions of periodic and aperiodic correlation functions are given. An algorithm for generating practical ensembles of composite sequences is presented. On the basis of the algorithm and its software implementation in C#, samples of sequence ensembles of various lengths were obtained and their periodic and aperiodic correlation functions and statistical characteristics were studied in detail. As an illustration, some of the most typical correlation functions are presented. The most remarkable characteristics, which allow assessing the feasibility of using this type of sequence in the design of specific types of radio systems, are considered. On the basis of the proposed program and the calculations performed, conclusions can be drawn about the possibility of using sequences of these classes to reduce intra-system disturbance in the projected spread spectrum CDMA.

  10. Growth in Total Height and Its Components and Cardiometabolic Health in Childhood

    DEFF Research Database (Denmark)

    Haugaard, Line Klingen; Baker, Jennifer Lyn; Perng, Wei

    2016-01-01

    BACKGROUND: Short stature or short legs is associated with cardiometabolic disease. Few studies have addressed this issue in children, incorporated repeated measures, or studied modern cohorts. METHODS: We examined if change in total height, leg length and trunk length between two time points from...... was a cardiometabolic risk score based on sex-specific internal z-scores for systolic blood pressure, waist circumference, homeostatic model assessment of insulin resistance, triglycerides and high-density lipoprotein-cholesterol. RESULTS: Mean (SD) total height was 97.9 (4.5) cm in boys and 97.1 (4.7) cm in girls...... in early childhood and 129.1 (7.2) cm in boys and 128.3 (7.9) cm in girls in mid-childhood. Trunk length constituted about half of total height. In linear regression models adjusted for parental anthropometry and socio-demographics, faster growth in total height, leg length and particularly trunk length...

  11. Detection and Resolution of Cryptosporidium Species and Species Mixtures by Genus-Specific Nested PCR-Restriction Fragment Length Polymorphism Analysis, Direct Sequencing, and Cloning

    Science.gov (United States)

    Ruecker, Norma J.; Hoffman, Rebecca M.; Chalmers, Rachel M.; Neumann, Norman F.

    2011-01-01

    Molecular methods incorporating nested PCR-restriction fragment length polymorphism (RFLP) analysis of the 18S rRNA gene of Cryptosporidium species were validated to assess performance based on limit of detection (LoD) and for detecting and resolving mixtures of species and genotypes within a single sample. The 95% LoD was determined for seven species (Cryptosporidium hominis, C. parvum, C. felis, C. meleagridis, C. ubiquitum, C. muris, and C. andersoni) and ranged from 7 to 11 plasmid template copies with overlapping 95% confidence limits. The LoD values for genomic DNA from oocysts on microscope slides were 7 and 10 template copies for C. andersoni and C. parvum, respectively. The repetitive nested PCR-RFLP slide protocol had an LoD of 4 oocysts per slide. When templates of two species were mixed in equal ratios in the nested PCR-RFLP reaction mixture, there was no amplification bias toward one species over another. At high ratios of template mixtures (>1:10), there was a reduction or loss of detection of the less abundant species by RFLP analysis, most likely due to heteroduplex formation in the later cycles of the PCR. Replicate nested PCR was successful at resolving many mixtures of Cryptosporidium at template concentrations near or below the LoD. The cloning of nested PCR products resulted in 17% of the cloned sequences being recombinants of the two original templates. Limiting-dilution nested PCR followed by the sequencing of PCR products resulted in no sequence anomalies, suggesting that this method is an effective and accurate way to study the species diversity of Cryptosporidium, particularly for environmental water samples, in which mixtures of parasites are common. PMID:21498746

  12. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    Directory of Open Access Journals (Sweden)

    Moore JE

    2006-01-01

    Full Text Available Abstract Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France was cloned, sequenced and compared, nucleotide sequence differences were demonstrated at six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted.

  13. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    Science.gov (United States)

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

    2006-01-01

    Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France was cloned, sequenced and compared, nucleotide sequence differences were demonstrated at six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935

  14. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

    Science.gov (United States)

    Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia

    2017-08-09

    It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, obtaining full-length transcripts remains a major challenge because of the difficulties of short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). In total, we obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not yet been annotated in the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events, respectively, are detected within this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and hence contribute to improved annotation of the rabbit genome.

  15. Retrieval and Representation of Nucleotide Sequence of ...

    African Journals Online (AJOL)

    Nigerian Journal of Basic and Applied Science (March, 2013), 21(1): 27-32 ... Full Length Research Article ... The present study highlights data retrieval and representation. .... the end of information and the start of the sequence on the next ...

  16. The Fibonacci-Padovan sequences in finite groups

    Directory of Open Access Journals (Sweden)

    Sait Tas

    2014-11-01

    Full Text Available The Fibonacci-Padovan sequence modulo m was studied. Also, the Fibonacci-Padovan orbits of -generator finite groups such that was examined. The Fibonacci-Padovan lengths of the groups , and for , where Z is integer, were then obtained.

  17. Bias of purine stretches in sequenced chromosomes

    DEFF Research Database (Denmark)

    Ussery, David; Soumpasis, Dikeos Mario; Brunak, Søren

    2002-01-01

    /pur tracts was slightly less than expected, with an average of 0.8%. One of the most surprising findings is a clear difference in the length distributions of the regions studied between prokaryotes and eukaryotes. Whereas short-range correlations can explain the length distributions in prokaryotes......, in eukaryotes there is an abundance of long stretches of purines or alternating purine/pyrimidine tracts, which cannot be explained in this way; these sequences are likely to play an important role in eukaryotic chromosome organisation....

  18. The influence of NDT-Bobath and PNF methods on the field support and total path length measure foot pressure (COP) in patients after stroke.

    Science.gov (United States)

    Krukowska, Jolanta; Bugajski, Marcin; Sienkiewicz, Monika; Czernicki, Jan

    In stroke patients, the NDT-Bobath (Neurodevelopmental Treatment) and PNF (Proprioceptive Neuromuscular Facilitation) methods are used to achieve the main objective of rehabilitation: restoring maximum patient independence in the shortest possible time, especially body balance. The aim of the study is to evaluate the effect of the NDT-Bobath and PNF methods on the support area and total path length of the center of foot pressure (COP) in patients after stroke. The study included 72 patients aged 20 to 69 years with hemiparesis after ischemic stroke. The patients were divided into 4 groups by simple randomization. The criteria for this division were the body side (right or left) affected by paresis and the rehabilitation method applied. All patients received the assigned (randomized) kinesitherapeutic method in 35 therapy sessions, every day for a period of six weeks. Before the start of therapy and after 6 weeks, the total support area and the COP path length were measured using a stabilometer platform (alpha). The results were statistically analyzed. After treatment, the studied parameters decreased in all groups. The greatest improvement was obtained in the groups receiving NDT-Bobath therapy. The NDT-Bobath method is more effective than the PNF method for improving body balance. In stroke patients, the effectiveness of the NDT-Bobath method does not depend on hand paresis. Copyright © 2016 Polish Neurological Society. Published by Elsevier Urban & Partner Sp. z o.o. All rights reserved.

  19. The complete chloroplast genome sequence of Hibiscus syriacus.

    Science.gov (United States)

    Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin

    2016-09-01

    The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is 161 019 bp in length, with a typical circular structure containing a pair of inverted repeats of 25 745 bp each, separated by a large single-copy region of 89 698 bp and a small single-copy region of 19 831 bp. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.

  20. On algorithmic equivalence of instruction sequences for computing bit string functions

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2015-01-01

    Every partial function from bit strings of a given length to bit strings of a possibly different given length can be computed by a finite instruction sequence that contains only instructions to set and get the content of Boolean registers, forward jump instructions, and a termination instruction. We

  1. On algorithmic equivalence of instruction sequences for computing bit string functions

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2014-01-01

    Every partial function from bit strings of a given length to bit strings of a possibly different given length can be computed by a finite instruction sequence that contains only instructions to set and get the content of Boolean registers, forward jump instructions, and a termination instruction. We

  2. Output-Sensitive Pattern Extraction in Sequences

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2014-01-01

    Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist...... or extend them causes a loss of significant information (where the number of occurrences changes). Output-sensitive algorithms have been proposed to enumerate and list these patterns, taking polynomial time O(n^c) per pattern for constant c > 1, which is impractical for massive sequences of very large length...

  3. Near Full-Length Identification of a Novel HIV-1 CRF01_AE/B/C Recombinant in Northern Myanmar.

    Science.gov (United States)

    Zhou, Yan-Heng; Chen, Xin; Liang, Yue-Bo; Pang, Wei; Qin, Wei-Hong; Zhang, Chiyu; Zheng, Yong-Tang

    2015-08-01

    The Myanmar-China border appears to be the "hot spot" region for the occurrence of HIV-1 recombination. The majority of the previous analyses of HIV-1 recombination were based on partial genomic sequences, which obviously cannot reflect the reality of the genetic diversity of HIV-1 in this area well. Here, we present a near full-length characterization of a novel HIV-1 CRF01_AE/B/C recombinant isolated from a long-distance truck driver in Northern Myanmar. It is the first description of a near full-length genomic sequence in Myanmar since 2003, and might be one of the most complicated HIV-1 chimeras ever detected in Myanmar, containing four CRF01_AE, six B segments, and five C segments separated by 14 breakpoints throughout its genome. The discovery and characterization of this new CRF01_AE/B/C recombinant indicate that intersubtype recombination is ongoing in Myanmar, continuously generating new forms of HIV-1. More work based on near full-length sequence analyses is urgently needed to better understand the genetic diversity of HIV-1 in these regions.

  4. Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Cannon Charles H

    2011-07-01

    Full Text Available Abstract Background Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well-characterized but genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and to discover informative sequence variants. Results We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine. De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168) and one legume-specific family (miR2086). Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs) in the transcriptomes of A. auriculiformis and A. mangium

  5. Transcriptome characterization and sequencing-based identification of salt-responsive genes in Millettia pinnata, a semi-mangrove plant.

    Science.gov (United States)

    Huang, Jianzi; Lu, Xiang; Yan, Hao; Chen, Shouyi; Zhang, Wanke; Huang, Rongfeng; Zheng, Yizhi

    2012-04-01

    Semi-mangroves form a group of transitional species between glycophytes and halophytes, and hold unique potential for learning molecular mechanisms underlying plant salt tolerance. Millettia pinnata is a semi-mangrove plant that can survive a wide range of saline conditions in the absence of specialized morphological and physiological traits. By employing the Illumina sequencing platform, we generated ~192 million short reads from four cDNA libraries of M. pinnata and processed them into 108,598 unisequences with a high depth of coverage. The mean length and total length of these unisequences were 606 bp and 65.8 Mb, respectively. A total of 54,596 (50.3%) unisequences were assigned Nr annotations. Functional classification revealed the involvement of unisequences in various biological processes related to metabolism and environmental adaptation. We identified 23,815 candidate salt-responsive genes with significantly differential expression under seawater and freshwater treatments. Based on the reverse transcription-polymerase chain reaction (RT-PCR) and real-time PCR analyses, we verified the changes in expression levels for a number of candidate genes. The functional enrichment analyses for the candidate genes showed tissue-specific patterns of transcriptome remodelling upon salt stress in the roots and the leaves. The transcriptome of M. pinnata will provide valuable gene resources for future application in crop improvement. In addition, this study sets a good example for large-scale identification of salt-responsive genes in non-model organisms using the sequencing-based approach.
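
    The count, mean length and total length reported in this record are tied together by simple arithmetic (total = count × mean). The following is a minimal sketch of how such summary figures might be computed and cross-checked; the function name `assembly_stats` and the toy length list are illustrative assumptions, not part of the cited study.

```python
def assembly_stats(lengths):
    """Return (count, total_bp, mean_bp) for a list of sequence lengths.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    return len(lengths), total, total / len(lengths)


# Toy input: three made-up unisequence lengths in bp.
print(assembly_stats([100, 200, 300]))

# Cross-check of the figures quoted in the abstract:
# 108,598 unisequences with a mean length of 606 bp.
reported_count = 108_598
reported_mean_bp = 606
implied_total_mb = reported_count * reported_mean_bp / 1e6
print(f"{implied_total_mb:.1f} Mb")  # agrees with the reported 65.8 Mb
```

    The implied total (108,598 × 606 bp = 65,810,388 bp) rounds to the 65.8 Mb stated in the abstract.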

  6. ATWS analysis for total loss of feedwater sequence in UCN 3 and 4

    International Nuclear Information System (INIS)

    Park, S. H.; Song, Y. M.; Kim, D. H.; Kim, S. D.; Park, S. Y.

    1999-01-01

    ATWS is a trip-failed severe accident initiated by transients such as a turbine trip, a control bank withdrawal, or a loss of feedwater, which are expected to occur comparatively often (one or two occurrences per year). In this study, an ATWS sequence in Ulchin 3 and 4 is analyzed and the effects of the important systems are studied for accident management purposes using the MIDAS/PK computer code. The MIDAS/PK code has been developed by coupling a point kinetics module with the MELCOR code. The code calculates a primary peak pressure of about 24 MPa at 240 seconds for the ATWS initiated by a TLOF (Total Loss of Feedwater) transient. Along with the basic ATWS analysis, several sensitivity runs are performed. From these, the turbines and the safety depressurization system (SDS) are judged to be important. The turbine trip, resulting in a loss of offsite power and an RCP trip, degrades primary heat transfer to the secondary side, which in turn increases the primary coolant temperature and so reduces the reactor power through the negative moderator temperature coefficient. Manual operation of the SDS considerably lowers the primary peak pressure via supplementary depressurization in addition to the PORVs

  7. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  8. Telomerase activity and telomere length in the colorectal polyp-carcinoma sequence

    Directory of Open Access Journals (Sweden)

    C. Valls Bautista

    2009-03-01

    Full Text Available Objective: the role of telomerase activity and telomere length in the adenoma-carcinoma sequence of colon carcinogenesis has not been well established. The objective of this study was to determine telomerase activity and telomere length patterns in patients with adenomatous polyps either associated or not with colorectal cancer, as well as the role of telomeric instability in the adenoma-carcinoma sequence. Patients and methods: we included in the study 14 patients who underwent surgery for colorectal cancer and/or polyps. In 6 of these patients fresh samples of tumor tissue, polyps, and normal mucosa were obtained; in the 8 remaining cases, we collected only polyps and normal mucosa. We used the fluorescent-telomeric repeat amplification protocol assay (TRAP-F) to determine telomerase activity, and Southern-blot testing to determine telomere length. Results: telomerase activity was detected in 86% of polyps and 50% of associated normal mucosa. Mean telomerase activity in polyp tissue was 5.85 TPG; in the normal mucosa it was 0.58 TPG. Mean telomere length was 6.78 Kbp and 7.78 Kbp, respectively. Polyps in patients without synchronous cancer had a telomerase activity that was significantly higher (9.4) than in those with cancer (1.1). Conclusions: telomerase activity increases in the colorectal adenoma-carcinoma sequence, concurrently with a decrease in telomere length. The presence of synchronous cancer modifies telomerase activity in polyps.

  9. Computer simulation of replacement sequences in copper

    International Nuclear Information System (INIS)

    Schiffgens, J.O.; Schwartz, D.W.; Ariyasu, R.G.; Cascadden, S.E.

    1978-01-01

    Results of computer simulations of , , and replacement sequences in copper are presented, including displacement thresholds, focusing energies, energy losses per replacement, and replacement sequence lengths. These parameters are tabulated for six interatomic potentials and shown to vary in a systematic way with potential stiffness and range. Comparisons of results from calculations made with ADDES, a quasi-dynamical code, and COMENT, a dynamical code, show excellent agreement, demonstrating that the former can be calibrated and used satisfactorily in the analysis of low energy displacement cascades. Upper limits on , , and replacement sequences were found to be approximately 10, approximately 30, and approximately 14 replacements, respectively. (author)

  10. Effect of stereochemistry, chain length and sequence pattern on antimicrobial properties of short synthetic β-sheet forming peptide amphiphiles.

    Science.gov (United States)

    Ong, Zhan Yuin; Cheng, Junchi; Huang, Yuan; Xu, Kaijin; Ji, Zhongkang; Fan, Weimin; Yang, Yi Yan

    2014-01-01

    In the face of mounting global antibiotics resistance, the identification and development of membrane-active antimicrobial peptides (AMPs) as an alternative class of antimicrobial agent have gained significant attention. The physical perturbation and disruption of microbial membranes by the AMPs have been proposed to be an effective means to overcome conventional mechanisms of drug resistance. Recently, we have reported the design of a series of short synthetic β-sheet folding peptide amphiphiles comprised of recurring (X1Y1X2Y2)n-NH2 sequences where X: hydrophobic amino acids, Y: cationic amino acids and n: number of repeat units. In efforts to investigate the effects of key parameters including stereochemistry, chain length and sequence pattern on antimicrobial effects, systematic d-amino acid substitutions of the lead peptides (IRIK)2-NH2 (IK8-all L) and (IRVK)3-NH2 (IK12-all L) were performed. It was found that the corresponding D-enantiomers exhibited stronger antimicrobial activities with minimal or no change in hemolytic activities, hence translating into very high selectivity indices of 407.0 and >9.8 for IK8-all D and IK12-all D respectively. IK8-all D was also demonstrated to be stable to degradation by the broad-spectrum proteases trypsin and proteinase K. The membrane-disrupting bactericidal properties of IK8-all D effectively prevented drug resistance development and inhibited the growth of various clinically isolated MRSA, VRE, Acinetobacter baumannii, Pseudomonas aeruginosa, Cryptococcus neoformans and Mycobacterium tuberculosis. Significant reduction in intracellular bacteria counts was also observed following treatment with IK8-all D in the Staphylococcus aureus infected mouse macrophage cell line RAW264.7 (P < 0.01). These results suggest that the d-amino acids substituted β-sheet forming peptide IK8-all D, with its enhanced antimicrobial activities and improved protease stability, is a promising therapeutic candidate with potential to combat

  11. The telomere length dynamic and methods of its assessment.

    Science.gov (United States)

    Lin, Kah-Wai; Yan, Ju

    2005-01-01

    Human telomeres are composed of long repeating sequences of TTAGGG, associated with a variety of telomere-binding proteins. Their function as end-protectors of chromosomes prevents end-to-end fusion, recombination and degradation. Telomerase acts as a reverse transcriptase in the elongation of telomeres, which prevents the loss of telomeres due to the end-replication problem. However, telomerase activity is detected at a low level in somatic cells and at a high level in embryonic stem cells and tumor cells. It confers immortality on embryonic stem cells and tumor cells. In most tumor cells, telomeres are extremely short and stable. Telomere length is an important indicator of telomerase activity in tumor cells and may be used in the prognosis of malignancy. Thus, the assessment of telomere length is of great experimental and clinical significance. This review describes the role of telomeres and telomerase in cancer pathogenesis and the dynamics of telomere length in different cell types. The various methods of measuring telomere length, i.e. Southern blot, hybridization protection assay, fluorescence in situ hybridization, primed in situ, quantitative PCR and single telomere length analysis, are discussed. The principle and comparative evaluation of these methods are reviewed. The detection of the G-strand overhang by telomeric-oligonucleotide ligation assay, primer extension/nick translation assay and electron microscopy is briefly discussed.

  12. Gene discovery and molecular marker development, based on high-throughput transcript sequencing of Paspalum dilatatum Poir.

    Directory of Open Access Journals (Sweden)

    Andrea Giordano

    Full Text Available BACKGROUND: Paspalum dilatatum Poir. (common name dallisgrass) is a native grass species of South America, with special relevance to dairy and red meat production. P. dilatatum exhibits higher forage quality than other C4 forage grasses and is tolerant to frost and water stress. This species is predominantly cultivated in an apomictic monoculture, with an inherent high risk that biotic and abiotic stresses could potentially devastate productivity. Therefore, advanced breeding strategies that characterise and use available genetic diversity, or assess germplasm collections effectively are required to deliver advanced cultivars for production systems. However, there are limited genomic resources available for this forage grass species. RESULTS: Transcriptome sequencing using second-generation sequencing platforms has been employed using pooled RNA from different tissues (stems, roots, leaves and inflorescences) at the final reproductive stage of P. dilatatum cultivar Primo. A total of 324,695 sequence reads were obtained, corresponding to c. 102 Mbp. The sequences were assembled, generating 20,169 contigs of a combined length of 9,336,138 nucleotides. The contigs were BLAST analysed against the fully sequenced grass species of Oryza sativa subsp. japonica, Brachypodium distachyon, the closely related Sorghum bicolor and foxtail millet (Setaria italica) genomes as well as against the UniRef 90 protein database allowing a comprehensive gene ontology analysis to be performed. The contigs generated from the transcript sequencing were also analysed for the presence of simple sequence repeats (SSRs). A total of 2,339 SSR motifs were identified within 1,989 contigs and corresponding primer pairs were designed. Empirical validation of a cohort of 96 SSRs was performed, with 34% being polymorphic between sexual and apomictic biotypes. CONCLUSIONS: The development of genetic and genomic resources for P. dilatatum will contribute to gene discovery and expression

  13. A Hybrid Metaheuristic Approach for Minimizing the Total Flow Time in A Flow Shop Sequence Dependent Group Scheduling Problem

    Directory of Open Access Journals (Sweden)

    Antonio Costa

    2014-07-01

    Full Text Available Production processes in Cellular Manufacturing Systems (CMS) often involve groups of parts sharing the same technological requirements in terms of tooling and setup. The issue of scheduling such parts through a flow-shop production layout is known as the Flow-Shop Group Scheduling (FSGS) problem or, when setup times are sequence-dependent, the Flow-Shop Sequence-Dependent Group Scheduling (FSDGS) problem. This paper addresses the FSDGS issue, proposing a hybrid metaheuristic procedure integrating features from Genetic Algorithms (GAs) and Biased Random Sampling (BRS) search techniques with the aim of minimizing the total flow time, i.e., the sum of completion times of all jobs. A well-known benchmark of test cases, entailing problems with two, three, and six machines, is employed for both tuning the relevant parameters of the developed procedure and assessing its performance against two metaheuristic algorithms recently presented in the literature. The obtained results and a properly arranged ANOVA analysis highlight the superiority of the proposed approach in tackling the scheduling problem under investigation.

  14. Arc Length Coding by Interference of Theta Frequency Oscillations May Underlie Context-Dependent Hippocampal Unit Data and Episodic Memory Function

    Science.gov (United States)

    Hasselmo, Michael E.

    2007-01-01

    Many memory models focus on encoding of sequences by excitatory recurrent synapses in region CA3 of the hippocampus. However, data and modeling suggest an alternate mechanism for encoding of sequences in which interference between theta frequency oscillations encodes the position within a sequence based on spatial arc length or time. Arc length…

  15. Early postnatal development of the mandible in children with isolated cleft palate and children with nonsyndromic Robin sequence

    DEFF Research Database (Denmark)

    Eriksen, J.; Hermann, N.V.; Darvann, Tron Andre

    2006-01-01

    Objective: Analysis of early postnatal mandibular size and growth velocity in children with untreated isolated cleft palate (ICP), nonsyndromic Robin sequence (RS), and a control group of children with unilateral incomplete cleft lip (UICL). Material: 114 children (66 isolated cleft palate, 7 Robin...... and mandibular growth velocity (mm/year) was calculated. Cleft width was measured on the casts at 2 months of age. Results: Mean mandibular length and posterior height were significantly smaller in isolated cleft palate and Robin sequence, compared with unilateral incomplete cleft lip. Mandibular length in Robin...... sequence was also significantly shorter, compared with isolated cleft palate. No significant difference was found between mean mandibular growth velocities in the three groups. No significant correlation was found between mandibular length and cleft width in either isolated cleft palate or Robin sequence...

  16. Association of Telomere Length with Breast Cancer Prognostic Factors.

    Directory of Open Access Journals (Sweden)

    Kaoutar Ennour-Idrissi

    Full Text Available Telomere length, a marker of cell aging, seems to be affected by the same factors thought to be associated with breast cancer prognosis. To examine associations of peripheral blood cell-measured telomere length with traditional and potential prognostic factors in breast cancer patients. We conducted a cross-sectional analysis of data collected before surgery from 162 breast cancer patients recruited consecutively between 01/2011 and 05/2012, at a breast cancer reference center. Data on the main lifestyle factors (smoking, alcohol consumption, physical activity) were collected using standardized questionnaires. Anthropometric factors were measured. Tumor biological characteristics were extracted from pathology reports. Telomere length was measured using a highly reproducible quantitative PCR method in peripheral white blood cells. Spearman partial rank-order correlations and multivariate general linear models were used to evaluate relationships between telomere length and prognostic factors. Telomere length was positively associated with total physical activity (rs = 0.17, P = 0.033; Ptrend = 0.069), occupational physical activity (rs = 0.15, P = 0.054; Ptrend = 0.054) and transportation-related physical activity (rs = 0.19, P = 0.019; P = 0.005). Among post-menopausal women, telomere length remained positively associated with total physical activity (rs = 0.27, P = 0.016; Ptrend = 0.054) and occupational physical activity (rs = 0.26, P = 0.021; Ptrend = 0.056) and was only associated with transportation-related physical activity among pre-menopausal women (rs = 0.27, P = 0.015; P = 0.004). No association was observed between telomere length and recreational or household activities, other lifestyle factors or traditional prognostic factors. Telomeres are longer in more active breast cancer patients. Since white blood cells are involved in anticancer immune responses, these findings suggest that even regular low-intensity physical activity, such as that

  17. String matching with variable length gaps

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Vildhøj, Hjalte Wedel

    2012-01-01

    primitive in computational biology applications. Let m and n be the lengths of P and T, respectively, and let k be the number of strings in P. We present a new algorithm achieving time O(n log k + m + α) and space O(m + A), where A is the sum of the lower bounds of the lengths of the gaps in P and α is the total...... number of occurrences of the strings in P within T. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of m, n, k, A, and α. Our algorithm...
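
    To make the problem in this record concrete: a pattern P consists of k strings separated by gaps, each gap having a lower and an upper bound on its length, and the task is to find where P occurs in a text T. The sketch below is a naive reference check written for illustration only. It is not the algorithm of the cited paper and does not achieve its time or space bounds; the names `vlg_match_ends`, `parts` and `gaps` are hypothetical.

```python
def vlg_match_ends(text, parts, gaps):
    """Return sorted end positions in `text` where the gapped pattern can end.

    parts: list of k strings; gaps: list of k-1 (lo, hi) bounds, where
    gaps[i] constrains the number of characters between parts[i] and
    parts[i+1]. Naive illustration, not the cited paper's algorithm.
    """
    if len(gaps) != len(parts) - 1:
        raise ValueError("need exactly one gap between consecutive parts")

    def occurrences(s):
        # All start indices at which s occurs in text.
        return [i for i in range(len(text) - len(s) + 1) if text.startswith(s, i)]

    # Positions just past each occurrence of the first string.
    ends = {i + len(parts[0]) for i in occurrences(parts[0])}
    for part, (lo, hi) in zip(parts[1:], gaps):
        # Keep occurrences of the next string whose distance from some
        # previous end position lies within the allowed gap range.
        ends = {
            s + len(part)
            for s in occurrences(part)
            if any(lo <= s - e <= hi for e in ends)
        }
    return sorted(ends)


print(vlg_match_ends("ACGTTTGCA", ["AC", "GC"], [(2, 4)]))  # [8]
print(vlg_match_ends("ACGTTTGCA", ["AC", "GC"], [(0, 3)]))  # []
```

    In the first call "AC" ends at position 2 and "GC" starts at position 6, so the gap of 4 characters is accepted; in the second call the same gap exceeds the upper bound of 3 and no match is reported.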

  18. The use of phase sequence image sets to reconstruct the total volume occupied by a mobile lung tumor

    International Nuclear Information System (INIS)

    Gagne, Isabelle M.; Robinson, Don M.; Halperin, Ross; Roa, Wilson

    2005-01-01

    The use of phase sequence image (PSI) sets to reveal the total volume occupied by a mobile target is presented. Isocontrast composite clinical target volumes (CCTVs) may be constructed from PSI sets in order to reveal the total volume occupied by a mobile target during the course of its travel. The ability of the CCTV technique to properly account for target motion is demonstrated by comparison to contours of the true total volume occupied (TVO) for a number of experimental phantom geometries. Finally, using real patient data, the clinical utility of the CCTV technique to properly account for internal tumor motion while minimizing the volume of healthy lung tissue irradiated is assessed by comparison to the standard approach of applying safety margins. Results of the phantom study reveal that CCTV cross sections constructed at the 20% isocontrast level yield good agreement with the total cross sections (TXO) of mobile targets. These CCTVs conform well to the TVOs of the moving targets examined whereby the addition of small uniform margins ensures complete circumscription of the TVO with the inclusion of minimal amounts of surrounding external volumes. The CCTV technique is seen to be clearly superior to the common practice of the addition of safety margins to individual CTV contours in order to account for internal target motion. Margins required with the CCTV technique are eight to ten times smaller than those required with individual CTVs

  19. Sex identification and reconstruction of length of humerus from its fragments: An Egyptian study

    Directory of Open Access Journals (Sweden)

    Dalia Mohamed Ali

    2016-06-01

    Full Text Available The aim of this study was to calculate the total length of the humerus and identify the sex from its fragments in Egyptians. One hundred and fifty dry adult right humeri (75 male and 75 female) were studied. The humeri were divided into seven fragments according to specific anatomical landmarks. Data obtained was subjected to descriptive statistical analysis. The longest fragmentary portion revealed a good result with closest proximity to the total length of humerus. All fragments showed significant sexual differences (P < 0.001) between males and females except H2. Total length of humerus revealed the highest percentage of accuracy (93.3%) followed by H4 (86.7%) and H7 (83.3%) for sex identification. Finally, from measurements of different humeral fragments in the Egyptian population, the length of the humerus can be estimated and the sex can be identified.

  20. Complete mitogenomic sequence of the Critically Endangered Northern River Shark Glyphis garricki (Carcharhiniformes: Carcharhinidae).

    Science.gov (United States)

    Feutry, Pierre; Grewe, Peter M; Kyne, Peter M; Chen, Xiao

    2015-01-01

    In this study we describe the first complete mitochondrial sequence for the Critically Endangered Northern River shark Glyphis garricki. The complete mitochondrial sequence is 16,702 bp in length, contains 37 genes and one control region with the typical gene order and transcriptional direction of vertebrate mitogenomes. The overall base composition is 31.5% A, 26.3% C, 12.9% G and 29.3% T. The length of 22 tRNA genes ranged from 68 (tRNA-Ser2 and tRNA-Cys) to 75 (tRNA-Leu1) bp. The control region of G. garricki was 1067 bp in length with high A+T (67.9%) and poor G (12.6%) content. The mitogenomic characters (base composition, codon usage and gene length) of G. garricki were very similar to Glyphis glyphis.
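
    Base-composition figures such as those quoted in this record are simple percentages of each nucleotide over the sequence length. A minimal sketch of the calculation follows; the helper names `base_composition` and `at_content` and the toy sequence are assumptions for illustration, and the mitogenome sequence itself is not reproduced here.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among the unambiguous bases.

    Hypothetical helper for illustration; ambiguous symbols (e.g. N) are
    ignored rather than counted.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / n for base in "ACGT"}


def at_content(seq):
    """Return the combined A+T percentage of a sequence."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


toy = "AACGTT"  # made-up sequence, not from the cited mitogenome
print(base_composition(toy))
print(at_content(toy))
```

    Applied to the reported overall composition (31.5% A and 29.3% T), the same sum gives an A+T content of 60.8% for the whole mitogenome, lower than the 67.9% reported for the control region.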

  1. An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile.

    Science.gov (United States)

    Prakash, Celine; Haeseler, Arndt Von

    2017-03-01

    RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
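
    The non-uniformity described in this record can be illustrated with a much simpler model than the authors' enumeration. The sketch below assumes a single fragment of fixed length F whose start is drawn uniformly from all valid positions on a transcript of length T, and computes the probability that each position is covered. This uniform-start assumption and the function name `expected_coverage` are illustrative choices, not the model of the cited paper.

```python
def expected_coverage(T, F):
    """Per-position coverage probability for one fragment of length F.

    Assumes the fragment start is uniform over the T - F + 1 valid start
    positions (a simplifying assumption, not the cited enumerative model).
    """
    if not 1 <= F <= T:
        raise ValueError("require 1 <= F <= T")
    n_starts = T - F + 1
    profile = []
    for i in range(T):
        # Starts s that cover position i satisfy s <= i <= s + F - 1,
        # clipped to the valid range 0 .. T - F.
        lo = max(0, i - F + 1)
        hi = min(i, T - F)
        profile.append((hi - lo + 1) / n_starts)
    return profile


profile = expected_coverage(10, 4)
print([round(p, 3) for p in profile])
```

    Even under this simplified assumption the profile is lower at the transcript ends than in the middle, and its shape depends on the ratio of F to T, which is consistent with the qualitative finding stated in the abstract.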

  2. Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections

    Directory of Open Access Journals (Sweden)

    Saliha Hammoumi

    2016-09-01

    Full Text Available Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus is relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10^7. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3.

  3. Genome-wide identification of aquaporin encoding genes in Brassica oleracea and their phylogenetic sequence comparison to Brassica crops and Arabidopsis

    Science.gov (United States)

    Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.

    2015-01-01

    Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Despite their importance as crop plants, little is known about AQP gene and protein function in cabbage (Brassica oleracea) and other Brassica species. The recent releases of the genome sequences of B. oleracea and Brassica rapa allow comparative genomic studies in these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes of four plant AQP subfamilies were identified. Their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome added to the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea, B. rapa allowed us to follow AQP evolution in closely related species and to systematically classify and (re-) name these isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis and their expression was analyzed in different organs. The two selectivity filters, gene structure and coding sequences were highly conserved within each AQP subfamily while sequence variations in some introns and untranslated regions were frequent. These data suggest a similar substrate selectivity and function of Brassica AQPs compared to Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of

  4. Effects of Presentation Format and List Length on Children's False Memories

    Science.gov (United States)

    Swannell, Ellen R.; Dewhurst, Stephen A.

    2013-01-01

    The effect of list length on children's false memories was investigated using list and story versions of the Deese/Roediger-McDermott procedure. Short (7 items) and long (14 items) sequences of semantic associates were presented to children aged 6, 8, and 10 years old either in lists or embedded within a story that emphasized the list theme.…

  5. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

    Science.gov (United States)

    Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

    2015-05-01

    Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.

  6. Length correction for early-juvenile Brazilian herring Sardinella janeiro (Eigenmann, 1894) after preservation in formalin, ethanol and freezing

    Directory of Open Access Journals (Sweden)

    Joaquim N. S. Santos

    Full Text Available This work aims to quantify the variation in total length and body mass for the early-juvenile Brazilian herring Sardinella janeiro and to determine total length and body mass correction equations that allow fresh measures to be calculated from preserved ones. Fishes were randomly assigned to one of five preservation methods (freezing at -20 °C, 2.5% and 5% formalin, 70% and 95% ethanol) and measured for total length (TL) and body mass (W) before preservation and on days 5, 15, 30, and 60 after storage. Significant reductions in total length and body mass occurred during the first 5 days after preservation, and shrinkage continued significantly at a lesser rate through 30 days in most methods. Exceptions were shown for body mass in freezing and 5% formalin, where the greatest losses occurred after 30 days of preservation. The degree of shrinkage for total length and body mass was very much dependent on fish size, with smaller specimens shrinking more than larger ones. The fresh total length and body mass can be back-calculated using equations that describe the relationship between fresh and preserved individuals after 60 days storage for all methods except for body mass in freezing.

  7. Next Generation Sequencing Analysis of Human Platelet PolyA+ mRNAs and rRNA-Depleted Total RNA

    Science.gov (United States)

    Kissopoulou, Antheia; Jonasson, Jon; Lindahl, Tomas L.; Osman, Abdimajid

    2013-01-01

    Background Platelets are small anucleate cells circulating in the blood vessels where they play a key role in hemostasis and thrombosis. Here, we compared platelet RNA-Seq results obtained from polyA+ mRNA and rRNA-depleted total RNA. Materials and Methods We used purified, CD45 depleted, human blood platelets collected by apheresis from three male and one female healthy blood donors. The Illumina HiSeq 2000 platform was employed to sequence cDNA converted either from oligo(dT) isolated polyA+ RNA or from rRNA-depleted total RNA. The reads were aligned to the GRCh37 reference assembly with the TopHat/Cufflinks alignment package using Ensembl annotations. A de novo assembly of the platelet transcriptome using the Trinity software package and RSEM was also performed. The bioinformatic tools HTSeq and DESeq from Bioconductor were employed for further statistical analyses of read counts. Results Consistent with previous findings our data suggests that mitochondrially expressed genes comprise a substantial fraction of the platelet transcriptome. We also identified high transcript levels for protein coding genes related to the cytoskeleton function, chemokine signaling, cell adhesion, aggregation, as well as receptor interaction between cells. Certain transcripts were particularly abundant in platelets compared with other cell and tissue types represented by RNA-Seq data from the Illumina Human Body Map 2.0 project. Irrespective of the different library preparation and sequencing protocols, there was good agreement between samples from the 4 individuals. Eighteen differentially expressed genes were identified in the two sexes at 10% false discovery rate using DESeq. Conclusion The present data suggests that platelets may have a unique transcriptome profile characterized by a relative over-expression of mitochondrially encoded genes and also of genomic transcripts related to the cytoskeleton function, chemokine signaling and surface components compared with other cell and

  8. Next generation sequencing analysis of human platelet PolyA+ mRNAs and rRNA-depleted total RNA.

    Directory of Open Access Journals (Sweden)

    Antheia Kissopoulou

    Full Text Available BACKGROUND: Platelets are small anucleate cells circulating in the blood vessels where they play a key role in hemostasis and thrombosis. Here, we compared platelet RNA-Seq results obtained from polyA+ mRNA and rRNA-depleted total RNA. MATERIALS AND METHODS: We used purified, CD45 depleted, human blood platelets collected by apheresis from three male and one female healthy blood donors. The Illumina HiSeq 2000 platform was employed to sequence cDNA converted either from oligo(dT) isolated polyA+ RNA or from rRNA-depleted total RNA. The reads were aligned to the GRCh37 reference assembly with the TopHat/Cufflinks alignment package using Ensembl annotations. A de novo assembly of the platelet transcriptome using the Trinity software package and RSEM was also performed. The bioinformatic tools HTSeq and DESeq from Bioconductor were employed for further statistical analyses of read counts. RESULTS: Consistent with previous findings our data suggests that mitochondrially expressed genes comprise a substantial fraction of the platelet transcriptome. We also identified high transcript levels for protein coding genes related to the cytoskeleton function, chemokine signaling, cell adhesion, aggregation, as well as receptor interaction between cells. Certain transcripts were particularly abundant in platelets compared with other cell and tissue types represented by RNA-Seq data from the Illumina Human Body Map 2.0 project. Irrespective of the different library preparation and sequencing protocols, there was good agreement between samples from the 4 individuals. Eighteen differentially expressed genes were identified in the two sexes at 10% false discovery rate using DESeq. CONCLUSION: The present data suggests that platelets may have a unique transcriptome profile characterized by a relative over-expression of mitochondrially encoded genes and also of genomic transcripts related to the cytoskeleton function, chemokine signaling and surface components

  9. Amplification and chromosomal dispersion of human endogenous retroviral sequences

    International Nuclear Information System (INIS)

    Steele, P.E.; Martin, M.A.; Rabson, A.B.; Bryan, T.; O'Brien, S.J.

    1986-01-01

    Endogenous retroviral sequences have undergone amplification events involving both viral and flanking cellular sequences. The authors cloned members of an amplified family of full-length endogenous retroviral sequences. Genomic blotting, employing a flanking cellular DNA probe derived from a member of this family, revealed a similar array of reactive bands in both humans and chimpanzees, indicating that an amplification event involving retroviral and associated cellular DNA sequences occurred before the evolutionary separation of these two primates. Southern analyses of restricted somatic cell hybrid DNA preparations suggested that endogenous retroviral segments are widely dispersed in the human genome and that amplification and dispersion events may be linked

  10. Dynamic programming algorithms for biological sequence comparison.

    Science.gov (United States)

    Pearson, W R; Miller, W

    1992-01-01

    Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N^2)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N^2) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q + rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
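
    The time and memory behaviour described in this abstract can be illustrated with a minimal sketch. It assumes a global similarity score with the linear gap penalty g = rk; the function name and the scoring values (match +1, mismatch -1, r = 2) are illustrative choices, not taken from the authors' programs. Only the previous row of the matrix is kept, so memory grows with the length of one sequence while time grows with the product of both lengths.

```python
def similarity_score(a: str, b: str, match: int = 1, mismatch: int = -1, r: int = 2) -> int:
    """Global alignment score of a and b with gap penalty g = r * k.

    Time is O(len(a) * len(b)); memory is O(len(b)) because only the
    previous row of the dynamic programming matrix is retained.
    Scoring values are assumptions chosen for illustration.
    """
    # Row 0: aligning the first j characters of b against gaps only.
    prev = [-r * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning the first i characters of a against gaps only.
        curr = [-r * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] - r,        # gap in b
                curr[j - 1] - r,    # gap in a
            )
        prev = curr
    return prev[-1]
```

    This sketch returns only the score. Recovering the alignment itself in linear space requires a divide-and-conquer step that is not shown here.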

  11. Cloning and sequencing of Indian Water buffalo (Bubalus bubalis) interleukin-3 cDNA

    KAUST Repository

    Sugumar, Thennarasu; Harishankar, M.; Dhinakar Raj, G.

    2011-01-01

    Full-length cDNA (435 bp) of the interleukin-3 (IL-3) gene of the Indian water buffalo was amplified by reverse transcriptase-polymerase chain reaction and sequenced. This sequence had 96% nucleotide identity and 92% amino acid identity with bovine

  12. The Distribution of Lightning Channel Lengths in Northern Alabama Thunderstorms

    Science.gov (United States)

    Peterson, H. S.; Koshak, W. J.

    2010-01-01

    Lightning is well known to be a major source of tropospheric NOx, and in most cases is the dominant natural source (Huntrieser et al. 1998, Jourdain and Hauglustaine 2001). Production of NOx by a segment of a lightning channel is a function of channel segment energy density and channel segment altitude. A first estimate of NOx production by a lightning flash can be found by multiplying production per segment [typically 10^4 J/m; Hill (1979)] by the total length of the flash's channel. The purpose of this study is to determine average channel length for lightning flashes near NALMA in 2008, and to compare average channel length of ground flashes to the average channel length of cloud flashes.

  13. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.

  14. Genetic diversity analysis of Leuconostoc mesenteroides from Korean vegetables and food products by multilocus sequence typing.

    Science.gov (United States)

    Sharma, Anshul; Kaur, Jasmine; Lee, Sulhee; Park, Young-Seo

    2018-06-01

    In the present study, 35 Leuconostoc mesenteroides strains isolated from vegetables and food products from South Korea were studied by multilocus sequence typing (MLST) of seven housekeeping genes (atpA, groEL, gyrB, pheS, pyrG, rpoA, and uvrC). The fragment sizes of the seven amplified housekeeping genes ranged in length from 366 to 1414 bp. Sequence analysis indicated 27 different sequence types (STs), with 25 of them being represented by a single strain, indicating high genetic diversity, whereas the remaining 2 were characterized by five strains each. In total, 220 polymorphic nucleotide sites were detected among the seven housekeeping genes. The phylogenetic analysis based on the STs of the seven loci indicated that the 35 strains belonged to two major groups, A (28 strains) and B (7 strains). Split decomposition analysis showed that intraspecies recombination played a role in generating diversity among strains. The minimum spanning tree showed that the evolution of the STs was not correlated with food source. This study shows that multilocus sequence typing is a valuable tool to assess the genetic diversity among L. mesenteroides strains from South Korea and can be used further to monitor evolutionary changes.

  15. Simultaneous and staged bilateral total hip arthroplasty

    DEFF Research Database (Denmark)

    Lindberg-Larsen, Martin; Joergensen, Christoffer Calov; Husted, Henrik

    2013-01-01

    Bilateral total hip arthroplasty (BTHA) and bilateral simultaneous total hip arthroplasty (BSTHA) are done increasingly. Previous studies evaluating outcomes after bilateral procedures have found different results. The aim of this study was to investigate length of hospital stay (LOS), 30 days...

  16. MultiLocus Sequence Analysis- and Amplified Fragment Length Polymorphism-based characterization of xanthomonads associated with bacterial spot of tomato and pepper and their relatedness to Xanthomonas species.

    Science.gov (United States)

    Hamza, A A; Robene-Soustrade, I; Jouen, E; Lefeuvre, P; Chiroleu, F; Fisher-Le Saux, M; Gagnevin, L; Pruvost, O

    2012-05-01

    MultiLocus Sequence Analysis (MLSA) and Amplified Fragment Length Polymorphism (AFLP) were used to measure the genetic relatedness of a comprehensive collection of xanthomonads pathogenic to solanaceous hosts to Xanthomonas species. The MLSA scheme was based on partial sequences of four housekeeping genes (atpD, dnaK, efp and gyrB). Globally, MLSA data unambiguously identified strains causing bacterial spot of tomato and pepper at the species level and were consistent with AFLP data. Genetic distances derived from both techniques showed a close relatedness of (i) X. euvesicatoria, X. perforans and X. alfalfae and (ii) X. gardneri and X. cynarae. Maximum likelihood tree topologies derived from each gene portion and the concatenated data set for species in the X. campestris 16S rRNA core (i.e. the species cluster comprising all strains causing bacterial spot of tomato and pepper) were not congruent, consistent with the detection of several putative recombination events in our data sets by several recombination search algorithms. One recombinant region in atpD was identified in most strains of X. euvesicatoria including the type strain. Copyright © 2012 Elsevier GmbH. All rights reserved.

  17. Characterization of four species of Trichuris (Nematoda: Enoplida) by their second internal transcribed spacer ribosomal DNA sequence.

    Science.gov (United States)

    Oliveros, R; Cutillas, C; De Rojas, M; Arias, P

    2000-12-01

    Adult worms of Trichuris ovis and T. globulosa were collected from Ovis aries (sheep) and Capra hircus (goats). T. suis was isolated from Sus scrofa domestica (swine) and T. leporis was isolated from Lepus europaeus (rabbits) in Spain. Genomic DNA was isolated and a ribosomal internal transcribed spacer (ITS2) was amplified and sequenced using polymerase-chain-reaction (PCR) techniques. The ITS2 of T. ovis and T. globulosa was 407 nucleotides in length and had a GC content of about 62%. Furthermore, the ITS2 of T. suis and T. leporis was 534 and 418 nucleotides in length and had a GC content of about 64.8% and 62.4%, respectively. There was evidence of slight variation in the sequence within individuals of all species analyzed, indicating intraindividual variation in the sequence of different copies of the ribosomal DNA. Furthermore, low-level intraspecific variation was detected. Sequence analyses of ITS2 products of T. ovis and T. globulosa demonstrated no sequence difference between them. Nevertheless, differences were detected between the ITS2 sequences of T. suis, T. leporis, and T. ovis, indicating that Trichuris species can reliably be differentiated by their ITS2 sequences and PCR-linked restriction-fragment-length polymorphism (RFLP).

  18. Examining Hurricane Track Length and Stage Duration Since 1980

    Science.gov (United States)

    Fandrich, K. M.; Pennington, D.

    2017-12-01

    Each year, tropical systems impact thousands of people worldwide. Current research shows a correlation between the intensity and frequency of hurricanes and the changing climate. However, little is known about other prominent hurricane features. This includes information about hurricane track length (the total distance traveled from tropical depression through a hurricane's final category assignment) and how this distance may have changed with time. Also unknown is the typical duration of a hurricane stage, such as tropical storm to category one, and if the time spent in each stage has changed in recent decades. This research aims to examine changes in hurricane stage duration and track lengths for the 319 storms in NOAA's National Ocean Service Hurricane Reanalysis dataset that reached Category 2 - 5 from 1980 - 2015. Based on evident ocean warming, it is hypothesized that a general increase in track length with time will be detected, thus modern hurricanes are traveling a longer distance than past hurricanes. It is also expected that stage durations are decreasing with time so that hurricanes mature faster than in past decades. For each storm, coordinates are acquired at 4-times daily intervals throughout its duration and track lengths are computed for each 6-hour period. Total track lengths are then computed and storms are analyzed graphically and statistically by category for temporal track length changes. The stage durations of each storm are calculated as the time difference between two consecutive stages. Results indicate that average track lengths for Cat 2 and 3 hurricanes are increasing through time. These findings show that these hurricanes are traveling a longer distance than earlier Cat 2 and 3 hurricanes. In contrast, average track lengths for Cat 4 and 5 hurricanes are decreasing through time, showing less distance traveled than earlier decades. Stage durations for all Cat 2, 4 and 5 storms decrease through the decades but Cat 3 storms show a
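
    The track-length computation described in this abstract (a distance per 6-hour period, summed over the storm) can be sketched as follows. The abstract does not say which distance formula was used; the haversine great-circle distance and the mean Earth radius of 6371 km are assumptions, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def track_length_km(fixes):
    """Sum the segment lengths between consecutive (lat, lon) fixes.

    `fixes` is a list of (lat, lon) tuples in time order, for example
    one position every 6 hours.
    """
    return sum(haversine_km(*p, *q) for p, q in zip(fixes, fixes[1:]))
```

    With positions taken four times daily, each term of the sum corresponds to one 6-hour period, and the total is the storm's track length.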

  19. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well-to-read distance) combines high resolution of DNA sequencing fragments with optimized run times yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples × 500 bases/sample × 2 ten-hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany) and Sequenase (U.S. Biochemical) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of α-thio-dNTPs in the labeling reaction increases signal intensity two- to three-fold.

  20. Probabilistic topic modeling for the analysis and classification of genomic sequences

    Science.gov (United States)

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased.
PMID:25916734
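
    The k-mer representation that this abstract builds on can be sketched as below: each sequence is treated as a document whose words are its overlapping fixed-length k-mers. The function names and the choice of overlapping windows are assumptions made for illustration; the topic-model (LDA) step itself is not shown.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in one sequence (its bag of words)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_matrix(seqs, k):
    """Return (vocabulary, rows): one row of k-mer counts per sequence.

    The rows form a document-term matrix of the kind a topic model
    takes as input.
    """
    counts = [kmer_counts(s, k) for s in seqs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]
```

    For example, `kmer_counts("ACGTACG", 3)` counts the windows ACG, CGT, GTA, TAC and ACG, so ACG appears twice and the other three k-mers once each.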

  1. Transcriptome sequencing of mung bean (Vigna radiate L.) genes and the identification of EST-SSR markers.

    Science.gov (United States)

    Chen, Honglin; Wang, Lixia; Wang, Suhua; Liu, Chunji; Blair, Matthew Wohlgemuth; Cheng, Xuzhen

    2015-01-01

    Mung bean (Vigna radiata (L.) Wilczek) is an important traditional food legume crop, with high economic and nutritional value. It is widely grown in China and other Asian countries. Despite its importance, genomic information is currently unavailable for this crop plant species or some of its close relatives in the Vigna genus. In this study, more than 103 million high quality cDNA sequence reads were obtained from mung bean using Illumina paired-end sequencing technology. The processed reads were assembled into 48,693 unigenes with an average length of 874 bp. Of these unigenes, 25,820 (53.0%) and 23,235 (47.7%) showed significant similarity to proteins in the NCBI non-redundant protein and nucleotide sequence databases, respectively. Furthermore, 19,242 (39.5%) could be classified into gene ontology categories, 18,316 (37.6%) into Swiss-Prot categories and 10,918 (22.4%) into KOG database categories (E-value SSR), and 2,303 sequences contained more than one SSR together in the same expressed sequence tag (EST). A total of 13,134 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats being the most abundant motif class and G/C repeats being rare. In this SSR analysis, we found five main repeat motifs: AG/CT (30.8%), GAA/TTC (12.6%), AAAT/ATTT (6.8%), AAAAT/ATTTT (6.2%) and AAAAAT/ATTTTT (1.9%). A total of 200 SSR loci were randomly selected for validation by PCR amplification as EST-SSR markers. Of these, 66 marker primer pairs produced reproducible amplicons that were polymorphic among 31 mung bean accessions selected from diverse geographical locations. The large number of SSR-containing sequences found in this study will be valuable for the construction of high-resolution genetic linkage maps, association or comparative mapping and genetic analyses of various Vigna species.
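
    How SSRs of the kind counted in this abstract can be located in a sequence is sketched below with regular expressions. The abstract does not state the search tool or its thresholds; the minimum repeat counts in `MIN_REPEATS` and the function names are assumptions for illustration only.

```python
import re

# Assumed minimum number of tandem copies per motif length (1 to 6 bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True when the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        motif == motif[:d] * (n // d) for d in range(1, n) if n % d == 0
    )


def find_ssrs(seq: str):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AA" reported as a dinucleotide
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)
```

    The sketch reports perfect repeats only; compound or interrupted SSRs would need additional handling.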

  2. Use of the LUS in sequence allele designations to facilitate probabilistic genotyping of NGS-based STR typing results.

    Science.gov (United States)

    Just, Rebecca S; Irwin, Jodi A

    2018-05-01

    Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate, in the near term, probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools
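
    The LUS idea in this abstract can be illustrated with a small sketch that counts the longest run of consecutive copies of one repeat motif in a repeat region. The function name, the toy sequences and the TCTA motif are hypothetical; a real locus would use the reference LUS region that the authors define for it.

```python
import re


def lus_length(repeat_region: str, motif: str) -> int:
    """Longest uninterrupted run of `motif` copies in the repeat region.

    Returns 0 when the motif does not occur at all.
    """
    runs = re.findall(r"(?:%s)+" % re.escape(motif.upper()), repeat_region.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Two toy alleles of equal length (12 repeat units) that differ in sequence.
allele_a = "TCTA" * 2 + "TCTG" + "TCTA" * 9
allele_b = "TCTA" * 5 + "TCTG" + "TCTA" * 6
```

    Both toy alleles are 12 units long, so a length-based designation cannot separate them, while their LUS values (9 and 6) differ. That is the kind of same-length sequence variation the abstract proposes to capture.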

  3. The relationship between telomere length and mortality in chronic obstructive pulmonary disease (COPD).

    Directory of Open Access Journals (Sweden)

    Jee Lee

    Full Text Available Some have suggested that chronic obstructive pulmonary disease (COPD) is a disease of accelerated aging. Aging is characterized by shortening of telomeres. The relationship of telomere length to important clinical outcomes such as mortality, disease progression and cancer in COPD is unknown. Using quantitative polymerase chain reaction (qPCR), we measured telomere length of peripheral leukocytes in 4,271 subjects with mild to moderate COPD who participated in the Lung Health Study (LHS). The subjects were followed for approximately 7.5 years during which time their vital status, FEV1 and smoking status were ascertained. Using multiple regression methods, we determined the relationship of telomere length to cancer and total mortality in these subjects. We also measured telomere length in healthy "mid-life" volunteers and patients with more severe COPD. The LHS subjects had significantly shorter telomeres than those of healthy "mid-life" volunteers (p<.001). Compared to individuals in the 4th quartile of relative telomere length (i.e. longest telomere group), the remaining participants had significantly higher risk of cancer mortality (Hazard ratio, HR, 1.48; p = 0.0324) and total mortality (HR, 1.29; p = 0.0425). Smoking status did not make a significant difference in peripheral blood cell telomere length. In conclusion, COPD patients have short leukocyte telomeres, which are in turn associated with increased risk of total and cancer mortality. Accelerated aging is of particular relevance to cancer mortality in COPD.

  4. Assessment of colorectal length using the electromagnetic capsule tracking system

    DEFF Research Database (Denmark)

    Mark, E B; Poulsen, J L; Haase, A M

    2017-01-01

    AIM: We aimed to determine colorectal length with the 3D-Transit system by describing a 'centerline' of capsule movement and compare it to known anatomy, as determined by magnetic resonance imaging (MRI). Further, we aimed to test the day-to-day variation of colorectal length assessed......: Computation of colorectal length from capsule passage was possible in 60 of the 67 3D-Transit recordings. Length of the colorectum measured with MRI and 3D-Transit was respectively 95 cm (75-153 cm) and 99 cm (77-147 cm), P = 0.15. Coefficient of variation (CV) between MRI and 3D-Transit was 7.8%. Apart from...... the cecum / ascending colon being 26% (P = 0.002) shorter on MRI, there were no other differences in total or segmental colorectal lengths between methods (all P > 0.05). Length of the colorectum measured with 3D-Transit on two consecutive days was 102 cm (73-119 cm) and 103 cm (75-123 cm), P = 0.67. CV...

  5. Waiting Time Distributions for Pattern Occurrence in a Constrained Sequence

    Directory of Open Access Journals (Sweden)

    Valeri Stefanov

    2007-01-01

    Full Text Available A binary sequence of zeros and ones is called a (d,k)-sequence if it does not contain runs of zeros of length either less than d or greater than k, where d and k are arbitrary, but fixed, non-negative integers and d < k. Such sequences find an abundance of applications in communications, in particular for magnetic and optical recording. Occasionally, one requires that (d,k)-sequences do not contain a specific pattern w. Therefore, distribution results concerning pattern occurrence in (d,k)-sequences are of interest. In this paper we study the distribution of the waiting time until the r-th occurrence of a pattern w in a random (d,k)-sequence generated by a Markov source. Numerical examples are also provided.
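
    The run-length constraint defined in this abstract can be checked with a short sketch. One point is an assumption: the abstract does not say how runs at the very start or end of a finite string are treated, so here the lower bound d is applied only to runs of zeros enclosed by ones, while the upper bound k is applied to every run. The function name is hypothetical.

```python
def is_dk_sequence(bits: str, d: int, k: int) -> bool:
    """Check the (d, k) run-length constraint on a string of '0' and '1'.

    Assumed convention: every run of zeros must have length <= k, and
    runs enclosed by ones on both sides must also have length >= d.
    """
    runs = bits.split("1")   # runs of zeros separated by ones
    interior = runs[1:-1]    # runs with a one on each side
    if any(len(run) > k for run in runs):
        return False
    return all(len(run) >= d for run in interior)
```

    For example, with d = 2 and k = 3 the string 0010001 satisfies the constraint, whereas 0110 fails for d = 1 because its two ones are adjacent (an enclosed run of length zero).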

  6. The genomic sequence of cowpea aphid-borne mosaic virus and its similarities with other potyviruses

    NARCIS (Netherlands)

    Mlotshwa, S.; Verver, J.; Sithole-Niang, I.; Kampen, van T.; Kammen, van A.; Wellink, J.

    2002-01-01

    The genomic sequence of a Zimbabwe isolate of Cowpea aphid-borne mosaic virus (CABMV-Z) was determined by sequencing overlapping viral cDNA clones generated by RT-PCR using degenerate and/or specific primers. The sequence is 9465 nucleotides in length excluding the 3' terminal poly (A) tail and

  7. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at http://bioinfo.pdtis.fiocruz.br/ReRep/.

  8. Determination of Trichuris skrjabini by sequencing of the ITS1-5.8S-ITS2 segment of the ribosomal DNA: comparative molecular study of different species of trichurids.

    Science.gov (United States)

    Cutillas, C; Oliveros, R; de Rojas, M; Guevara, D C

    2004-06-01

    Adults of Trichuris skrjabini have been isolated from the cecum of caprine hosts (Capra hircus), Trichuris ovis and Trichuris globulosa from Ovis aries (sheep) and C. hircus (goats), and Trichuris leporis from Lepus europaeus (rabbits) in Spain. Genomic DNA was isolated and the ITS1-5.8S-ITS2 segment from the ribosomal DNA (rDNA) was amplified and sequenced by polymerase chain reaction (PCR) techniques. The ITS1 of T. skrjabini, T. ovis, T. globulosa, and T. leporis was 495, 757, 757, and 536 nucleotides in length, respectively, and had G + C contents of 59.6, 58.7, 58.7, and 60.8%, respectively. Intraindividual variation was detected in the ITS1 sequences of the 4 species. Furthermore, the 5.8S sequences of T. skrjabini, T. ovis, T. globulosa, and T. leporis were compared. Lengths of 157, 152, 153, and 157 nucleotides were observed in the 5.8S sequences of these 4 species, respectively. There were no sequence differences of ITS1 and 5.8S products between T. ovis and T. globulosa. Nevertheless, clear differences were detected between the ITS1 sequences of T. skrjabini, T. ovis, T. leporis, Trichuris muris, and T. arvicolae. The ITS2 fragment from the rDNA of T. skrjabini was sequenced. A comparative study of the ITS2 sequence of T. skrjabini with the previously published ITS2 sequence data of T. ovis, T. leporis, T. muris, and T. arvicolae suggested that the combined use of sequence data from both spacers would be useful in the molecular characterization of trichurid parasites.

  9. First complete genome sequence of canine bocavirus 2 in mainland China

    Directory of Open Access Journals (Sweden)

    S.-L. Zhai

    2017-07-01

    Full Text Available We obtained the first full-length genome sequence of canine bocavirus 2 (CBoV2) from the faeces of a healthy dog in Guangzhou city, Guangdong province, mainland China. The genome of GZHD15 consisted of 5059 nucleotides. Sequence analysis suggested that GZHD15 was close to a previously circulated Hong Kong isolate.

  10. Application of Logic to Integer Sequences: A Survey

    Science.gov (United States)

    Makowsky, Johann A.

    Chomsky and Schützenberger showed in 1963 that the sequence d_L(n), which counts the number of words of a given length n in a regular language L, satisfies a linear recurrence relation with constant coefficients for n, or equivalently, the generating function g_L(x) = Σ_n d_L(n) x^n is a rational function. In this talk we survey results concerning sequences a(n) of natural numbers which satisfy linear recurrence relations over ℤ or ℤ_m, and
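
    The counting statement in this abstract can be illustrated with a small sketch. The code below counts accepted words of each length by propagating path counts through a deterministic finite automaton. The automaton (words over {a, b} with no two consecutive b's), the function name count_words, and the state numbering are illustrative assumptions and do not come from the survey itself. For this particular language the counts follow the Fibonacci-type recurrence d(n) = d(n-1) + d(n-2).

```python
def count_words(transitions, start, accepting, n_max):
    """Return [d(0), ..., d(n_max)]: the number of accepted words of each length.

    transitions maps state -> {symbol: next_state}; missing entries mean rejection.
    """
    counts = {start: 1}          # number of distinct words leading to each state
    result = []
    for _ in range(n_max + 1):
        result.append(sum(c for s, c in counts.items() if s in accepting))
        nxt = {}
        for state, c in counts.items():
            for target in transitions.get(state, {}).values():
                nxt[target] = nxt.get(target, 0) + c
        counts = nxt
    return result


# Hypothetical example DFA: state 0 = last symbol was not 'b', state 1 = last symbol was 'b'.
TRANSITIONS = {0: {"a": 0, "b": 1}, 1: {"a": 0}}
d = count_words(TRANSITIONS, start=0, accepting={0, 1}, n_max=10)
print(d)
```

    Swapping in a different transition table changes the recurrence but not the method: the counts are always determined by powers of the automaton's transition matrix, which is why a linear recurrence with constant coefficients exists.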

  11. Identification of two invasive Cacopsylla chinensis (Hemiptera: Psyllidae) lineages based on two mitochondrial sequences and restriction fragment length polymorphism of cytochrome oxidase I amplicon.

    Science.gov (United States)

    Lee, Hsien-Chung; Yang, Man-Miao; Yeh, Wen-Bin

    2008-08-01

    The occurrence of pear decline, a disease found in some pear (Pyrus spp.) orchards of Taiwan in recent years, is accompanied by an outbreak of Cacopsylla chinensis (Yang & Li). Two major morphological forms (summer and winter forms) with a variety of intermediate body color and two phylogenetic lineages of this psyllid have been described. The work herein used sequences of mitochondrial cytochrome oxidase I (COI) and 16S rDNA regions to delineate the genetic differentiation of this color-variable insect and to elucidate their relationship. Sequence divergence and phylogenetic analysis have shown that C. chinensis individuals could be divided into two lineages with 3.3 and 2.3% divergence of COI and 16S rDNA, respectively. All specimens from China were found to belong to lineage I. Restriction fragment length polymorphism analysis of COI with restriction enzymes AcuI, AseI, BccI, and FokI on 263 specimens of six populations from Taiwan produced two digestion patterns, which are in agreement with the two lineages described above. Both patterns could be found in each population, with most individuals belonging to lineage I and 5-21% of the individuals belonging to lineage II. Because these two lineages included summer as well as winter morphological forms, the lineage differentiation is apparently not related to morphological characters of this psyllid. Because the invasive records are not in favor of a sympatric differentiation, this psyllid is more likely introduced as different populations from countries in temperate regions.

  12. The effect of letter string length and report condition on letter recognition accuracy.

    Science.gov (United States)

    Raghunandan, Avesh; Karmazinaite, Berta; Rossow, Andrea S

    Letter sequence recognition accuracy has been postulated to be limited primarily by low-level visual factors. The influence of high level factors such as visual memory (load and decay) has been largely overlooked. This study provides insight into the role of these factors by investigating the interaction between letter sequence recognition accuracy, letter string length and report condition. Letter sequence recognition accuracy for trigrams and pentagrams was measured in 10 adult subjects for two report conditions. In the complete report condition subjects reported all 3 or all 5 letters comprising trigrams and pentagrams, respectively. In the partial report condition, subjects reported only a single letter in the trigram or pentagram. Letters were presented for 100ms and rendered in high contrast, using black lowercase Courier font that subtended 0.4° at the fixation distance of 0.57m. Letter sequence recognition accuracy was consistently higher for trigrams compared to pentagrams especially for letter positions away from fixation. While partial report increased recognition accuracy in both string length conditions, the effect was larger for pentagrams, and most evident for the final letter positions within trigrams and pentagrams. The effect of partial report on recognition accuracy for the final letter positions increased as eccentricity increased away from fixation, and was independent of the inner/outer position of a letter. Higher-level visual memory functions (memory load and decay) play a role in letter sequence recognition accuracy. There is also a suggestion of additional delays imposed on memory encoding by crowded letter elements. Copyright © 2016 Spanish General Council of Optometry. Published by Elsevier España, S.L.U. All rights reserved.

  13. In situ detection of tandem DNA repeat length

    Energy Technology Data Exchange (ETDEWEB)

    Yaar, R.; Szafranski, P.; Cantor, C.R.; Smith, C.L. [Boston Univ., MA (United States)]

    1996-11-01

    A simple method for scoring short tandem DNA repeats is presented. An oligonucleotide target, containing tandem repeats embedded in a unique sequence, was hybridized to a set of complementary probes, containing tandem repeats of known lengths. Single-stranded loop structures formed on duplexes containing a mismatched (different) number of tandem repeats. No loop structure formed on duplexes containing a matched (identical) number of tandem repeats. The matched and mismatched loop structures were enzymatically distinguished and differentially labeled by treatment with S1 nuclease and the Klenow fragment of DNA polymerase. 7 refs., 4 figs.

  14. Full-length genomic characterization and molecular evolution of canine parvovirus in China.

    Science.gov (United States)

    Zhou, Ling; Tang, Qinghai; Shi, Lijun; Kong, Miaomiao; Liang, Lin; Mao, Qianqian; Bu, Bin; Yao, Lunguang; Zhao, Kai; Cui, Shangjin; Leal, Élcio

    2016-06-01

    Canine parvovirus type 2 (CPV-2) can cause acute haemorrhagic enteritis in dogs and myocarditis in puppies. This disease has become one of the most serious infectious diseases of dogs. During 2014 in China, there were many cases of acute infectious diarrhoea in dogs. Some faecal samples were negative for the CPV-2 antigen based on a colloidal gold test strip but were positive based on PCR, and a viral strain was isolated from one such sample. The cytopathic effect on susceptible cells and the results of the immunoperoxidase monolayer assay, PCR, and sequencing indicated that the pathogen was CPV-2. The strain was named CPV-NY-14, and the full-length genome was sequenced and analysed. A maximum likelihood tree was constructed using the full-length genome and all available CPV-2 genomes. New strains have replaced the original strain in Taiwan and Italy, although the CPV-2a strain is still predominant there. However, CPV-2a still causes many cases of acute infectious diarrhoea in dogs in China.

  15. [Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

    Science.gov (United States)

    Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

    2010-01-01

    By sequencing the full-length genomes of four rabies viruses isolated from Chinese ferret-badger and dog, we analyzed the genetic variation of rabies viruses at the molecular level, obtained information on rabies virus prevalence and variation in Zhejiang, and enriched the genome database of rabies virus street strains isolated in China. Rabies viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled; nucleotide and deduced protein similarities were then determined and phylogenetic analyses performed against strains from Chinese ferret-badger, dog, sika deer, and vole, and against vaccine strains in use. The four full-length genomes were sequenced completely and had the same genetic structure, with a length of 11,923 nts or 11,925 nts, including a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP, 6386-nt LP, intergenic regions (IGRs) of 2, 5, and 5 nts, a 423-nt pseudogene-like sequence (psi), and a 70-nt trailer. BLAST and multiple sequence alignment showed that the four full-length genomes were consistent with the properties of Rhabdoviridae Lyssavirus. The nucleotide and amino acid sequences of the Chinese strains had the highest similarity, especially among animals of the same species. For the four full-length genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, so most nucleotide mutations in these four genomes were synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions were unchanged, with no recombination and only a few point mutations. The five proteins therefore appeared to be stable. The variation sites and types of the four genomes were similar to those of the reference vaccine or street strains. According to the multiple sequence alignment and phylogenetic analyses, the four strains were genotype 1 and showed distinct regional characteristics of China. Therefore, these four rabies viruses are likely to be street viruses

  16. Neighborhood Disadvantage and Telomere Length: Results from the Fragile Families Study

    Directory of Open Access Journals (Sweden)

    Douglas S. Massey

    2018-04-01

    Full Text Available Telomeres are repetitive nucleotide sequences located at the ends of chromosomes that protect genetic material. We use data from the Fragile Families and Child Wellbeing Study to analyze the relationship between exposure to spatially concentrated disadvantage and telomere length for white and black mothers. We find that neighborhood disadvantage is associated with shorter telomere length for mothers of both races. This finding highlights a potential mechanism through which the unique spatially concentrated disadvantage faced by African Americans contributes to racial health disparities. We conclude that equalizing the health and socioeconomic status of black and white Americans will be very difficult without reducing levels of residential segregation in the United States.

  17. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    Science.gov (United States)

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
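
    The profile-and-distance step described in this abstract can be sketched as follows. This is not the authors' implementation: it replaces the generalized suffix tree with a plain sliding-window count, and it assumes relative frequencies over a DNA alphabet. The function names lword_profile and distance_matrix and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word, in a fixed lexicographic order."""
    windows = max(len(seq) - L + 1, 0)
    counts = Counter(seq[i:i + L] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for very short sequences
    return [counts["".join(w)] / total for w in product(alphabet, repeat=L)]


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise Euclidean distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(profiles)
    return [
        [sqrt(sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
         for j in range(n)]
        for i in range(n)
    ]


# Toy input (made up for illustration only).
seqs = ["ACGTACGTAC", "ACGTACGTAC", "AAAAAAAAAA"]
D = distance_matrix(seqs, L=2)
print(D)
```

    The resulting matrix could then be passed to any neighbor-joining or multidimensional-scaling routine. A direct count like this costs time proportional to the sequence length per word length tried, so a suffix tree mainly pays off when many word lengths are evaluated on the same sequences.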

  18. EXONSAMPLER: a computer program for genome-wide and candidate gene exon sampling for targeted next-generation sequencing.

    Science.gov (United States)

    Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon

    2014-11-01

    The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16,000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection. © 2014 John Wiley & Sons Ltd.
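
    The user-adjustable length and base-pair limits listed in this abstract can be pictured with a short sketch. It is not EXONSAMPLER's code or interface: the function sample_exons, its parameter names, the tuple layout, the rule of truncating oversized exons at their 5 prime end, and the coordinates are all assumptions made for illustration.

```python
def sample_exons(exons, min_len, max_len, max_bp_per_gene, max_total_bp):
    """Select exons under length and base-pair budgets.

    exons: iterable of (gene, chrom, start, end) with 0-based, half-open coordinates.
    Returns BED-like tuples (chrom, start, end, gene).
    """
    per_gene = {}   # base pairs already sampled for each gene
    total = 0       # base pairs sampled over the whole collection
    bed = []
    for gene, chrom, start, end in exons:
        length = end - start
        if length < min_len:
            continue                      # exon too short to be useful
        length = min(length, max_len)     # partial sampling of very large exons
        used = per_gene.get(gene, 0)
        if used + length > max_bp_per_gene or total + length > max_total_bp:
            continue                      # would exceed a per-gene or overall budget
        per_gene[gene] = used + length
        total += length
        bed.append((chrom, start, start + length, gene))
    return bed


# Made-up coordinates; only the gene abbreviations appear in the abstract.
exons = [
    ("IFNG", "chr5", 100, 300),
    ("IFNG", "chr5", 500, 520),
    ("IFNG", "chr5", 1000, 1400),
    ("TLR1", "chr6", 50, 2050),
]
bed = sample_exons(exons, min_len=50, max_len=500,
                   max_bp_per_gene=500, max_total_bp=900)
for row in bed:
    print("\t".join(map(str, row)))
```

    Here the 20 bp exon is dropped for being too short, the 400 bp exon is dropped because it would push IFNG past its per-gene budget, and the 2,000 bp exon is sampled only partially.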

  19. Functional characterization of a full length pregnane X receptor, expression in vivo, and identification of PXR alleles, in Zebrafish (Danio rerio)

    Energy Technology Data Exchange (ETDEWEB)

    Bainy, Afonso C.D. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Departamento de Bioquímica, CCB, Universidade Federal de Santa Catarina, Florianópolis, SC 88040-900 (Brazil); Kubota, Akira; Goldstone, Jared V. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Lille-Langøy, Roger [Department of Biology, University of Bergen, N-5020 Bergen (Norway); Karchner, Sibel I. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Celander, Malin C. [Department of Biological and Environmental Sciences, University of Gothenburg, SE 405 30 Göteborg (Sweden); Hahn, Mark E. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Goksøyr, Anders [Department of Biology, University of Bergen, N-5020 Bergen (Norway); Stegeman, John J., E-mail: jstegeman@whoi.edu [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States)

    2013-10-15

    Highlights: •Full-length pxr has been cloned from zebrafish. •Alleles of pxr were identified in zebrafish. •Full length Pxr was activated less strongly than ligand binding domain in cell-based reporter assays. •High levels of pxr expression were found in eye and brain as well as in liver. •TCPOBOP and PB did not significantly alter expression of pxr in liver. -- Abstract: The pregnane X receptor (PXR) (nuclear receptor NR1I2) is a ligand activated transcription factor, mediating responses to diverse xenobiotic and endogenous chemicals. The properties of PXR in fish are not fully understood. Here we report on cloning and characterization of full-length PXR of zebrafish, Danio rerio, and pxr expression in vivo. Initial efforts gave a cDNA encoding a 430 amino acid protein identified as zebrafish pxr by phylogenetic and synteny analysis. The sequence of the cloned Pxr DNA binding domain (DBD) was highly conserved, with 74% identity to human PXR-DBD, while the ligand-binding domain (LBD) of the cloned sequence was only 44% identical to human PXR-LBD. Sequence variation among clones in the initial effort prompted sequencing of multiple clones from a single fish. There were two prominent variants, one sequence with S183, Y218 and H383 and the other with I183, C218 and N383, which we designate as alleles pxr*1 (nr1i2*1) and pxr*2 (nr1i2*2), respectively. In COS-7 cells co-transfected with a PXR-responsive reporter gene, the full-length Pxr*1 (the more common variant) was activated by known PXR agonists clotrimazole and pregnenolone 16α-carbonitrile but to a lesser extent than the full-length human PXR. Activation of full-length Pxr*1 was only 10% of that with the Pxr*1 LBD. Quantitative real time PCR analysis showed prominent expression of pxr in liver and eye, as well as brain and intestine of adult zebrafish. The pxr was expressed in heart and kidney at levels similar to that in intestine. The expression of pxr in liver was weakly induced by ligands for

  20. Cloning, sequencing, and expression of cDNA for human β-glucuronidase

    International Nuclear Information System (INIS)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-01-01

    The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.

  1. Variable length adjacent partitioning for PTS based PAPR reduction of OFDM signal

    International Nuclear Information System (INIS)

    Ibraheem, Zeyid T.; Rahman, Md. Mijanur; Yaakob, S. N.; Razalli, Mohammad Shahrazel; Kadhim, Rasim A.

    2015-01-01

    Peak-to-average power ratio (PAPR) is a major drawback in OFDM communication. It drives the power amplifier into its nonlinear region of operation, resulting in loss of data integrity. As such, there is a strong motivation to find techniques to reduce PAPR. Partial Transmit Sequence (PTS) is an attractive scheme for this purpose. Judicious partitioning of the OFDM data frame into disjoint subsets is a pivotal component of any PTS scheme. Out of the existing partitioning techniques, adjacent partitioning is characterized by an attractive trade-off between cost and performance. With the aim of determining the effects of length variability of adjacent partitions, we performed an investigation into the performances of variable length adjacent partitioning (VL-AP) and fixed length adjacent partitioning in comparison with other partitioning schemes such as pseudorandom partitioning. Simulation results with different modulation and partitioning scenarios showed that fixed length adjacent partitioning had better performance compared to variable length adjacent partitioning. As expected, simulation results showed a slightly better performance of the pseudorandom partitioning technique compared to fixed and variable adjacent partitioning schemes. However, as the pseudorandom technique incurs high computational complexity, adjacent partitioning schemes were still seen as favorable candidates for PAPR reduction.
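
    For readers unfamiliar with PTS, the following sketch shows fixed-length adjacent partitioning with an exhaustive search over phase factors. It is a minimal illustration and not the authors' simulation: the frame size, the QPSK mapping, the phase set {+1, -1}, the absence of oversampling, and all function names are assumptions, and the variable-length variant studied in the paper is not shown.

```python
import cmath
import itertools
import math
import random


def idft(X):
    """Inverse discrete Fourier transform (direct O(N^2) form, fine for small N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def papr_db(x):
    """Peak-to-average power ratio of a time-domain signal, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def adjacent_partition(X, V):
    """Split the frame into V disjoint blocks of contiguous subcarriers (N divisible by V)."""
    size = len(X) // V
    return [[X[k] if v * size <= k < (v + 1) * size else 0 for k in range(len(X))]
            for v in range(V)]


def pts_min_papr(X, V, phases=(1, -1)):
    """Try every phase combination and return (lowest PAPR in dB, chosen phases)."""
    sub = [idft(block) for block in adjacent_partition(X, V)]
    best = None
    for combo in itertools.product(phases, repeat=V):
        x = [sum(combo[v] * sub[v][n] for v in range(V)) for n in range(len(X))]
        p = papr_db(x)
        if best is None or p < best[0]:
            best = (p, combo)
    return best


random.seed(0)
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
X = [random.choice(qpsk) for _ in range(16)]      # one hypothetical 16-subcarrier frame
best_papr, best_phases = pts_min_papr(X, V=4)
print(round(papr_db(idft(X)), 2), round(best_papr, 2), best_phases)
```

    Because the all-ones phase combination reproduces the original signal, the selected PAPR can never exceed the unmodified one; the search cost grows as the number of phase choices raised to the power V, which is the complexity trade-off the abstract refers to.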

  2. Variable length adjacent partitioning for PTS based PAPR reduction of OFDM signal

    Energy Technology Data Exchange (ETDEWEB)

    Ibraheem, Zeyid T.; Rahman, Md. Mijanur; Yaakob, S. N.; Razalli, Mohammad Shahrazel; Kadhim, Rasim A. [School of Computer and Communication Engineering, Universiti Malaysia Perlis, 02600 Arau, Perlis (Malaysia)]

    2015-05-15

    Peak-to-average power ratio (PAPR) is a major drawback in OFDM communication. It drives the power amplifier into its nonlinear region of operation, resulting in loss of data integrity. As such, there is a strong motivation to find techniques to reduce PAPR. Partial Transmit Sequence (PTS) is an attractive scheme for this purpose. Judicious partitioning of the OFDM data frame into disjoint subsets is a pivotal component of any PTS scheme. Out of the existing partitioning techniques, adjacent partitioning is characterized by an attractive trade-off between cost and performance. With the aim of determining the effects of length variability of adjacent partitions, we performed an investigation into the performances of variable length adjacent partitioning (VL-AP) and fixed length adjacent partitioning in comparison with other partitioning schemes such as pseudorandom partitioning. Simulation results with different modulation and partitioning scenarios showed that fixed length adjacent partitioning had better performance compared to variable length adjacent partitioning. As expected, simulation results showed a slightly better performance of the pseudorandom partitioning technique compared to fixed and variable adjacent partitioning schemes. However, as the pseudorandom technique incurs high computational complexity, adjacent partitioning schemes were still seen as favorable candidates for PAPR reduction.

  3. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    Science.gov (United States)

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared to other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and are freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  4. Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

    Science.gov (United States)

    Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

    2017-09-04

    Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.

  5. Whole Genome Sequence Analysis of an Alachlor and Endosulfan Degrading Micrococcus sp. strain 2385 Isolated from Ochlockonee River, Florida.

    Science.gov (United States)

    Pathak, Ashish; Chauhan, Ashvini; Ewida, Ayman Y I; Stothard, Paul

    2016-01-01

    We recently isolated Micrococcus sp. strain 2385 from Ochlockonee River, Florida and demonstrated potent biodegradative activity against two commonly used pesticides- alachlor [(2-chloro-2`,6`-diethylphenyl-N (methoxymethyl)acetanilide)] and endosulfan [(6,7,8,9,10,10-hexachloro-1,5,5a,6,9,9a-hexahydro-6,9methano-2,3,4-benzo(e)di-oxathiepin-3-oxide], respectively. To further identify the repertoire of metabolic functions possessed by strain 2385, a draft genome sequence was obtained, assembled, annotated and analyzed. The genome sequence of Micrococcus sp. strain 2385 consisted of 1,460,461,440 bases which assembled into 175 contigs with an N50 contig length of 50,109 bases and a coverage of 600x. The genome size of this strain was estimated at 2,431,226 base pairs with a G+C content of 72.8% and a total number of 2,268 putative genes. RAST annotated a total of 340 subsystems in the genome of strain 2385 along with the presence of 2,177 coding sequences. A genome-wide survey indicated that strain 2385 harbors a plethora of genes to degrade other pollutants including caprolactam, PAHs (such as naphthalene), styrene, toluene and several chloroaromatic compounds.
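
    The assembly statistics quoted in this abstract can be checked with a little arithmetic. The sketch below uses the standard definition of N50 (the contig length at which the longest contigs together cover at least half of the assembly) and estimates coverage as sequenced bases divided by genome size. The contig lengths in the example are made up, since the abstract does not list them; only the two base counts are taken from the text.

```python
def n50(lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one contig")


# Hypothetical contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))

# Figures reported in the abstract.
sequenced_bases = 1_460_461_440
genome_size_bp = 2_431_226
coverage = sequenced_bases / genome_size_bp
print(f"{coverage:.1f}x")
```

    With the reported numbers the ratio comes out just above 600, which agrees with the stated coverage of 600x.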

  6. Differences in caregiver daily impression by sex, education and career length.

    Science.gov (United States)

    Ae, Ryusuke; Kojo, Takao; Kotani, Kazuhiko; Okayama, Masanobu; Kuwabara, Masanari; Makino, Nobuko; Aoyama, Yasuko; Sano, Takashi; Nakamura, Yosikazu

    2017-03-01

    We previously proposed the concept of caregiver daily impression (CDI) as a practical tool for emergency triage. We herein assessed how CDI varies by sex, education and career length by determining CDI scores as quantitative outcome measures. We carried out a cross-sectional study using a self-reported questionnaire among caregivers in 20 long-term care facilities in Hyogo, Japan. A total of 10 CDI variables measured participants' previous experience of emergency transfers using a scale from 0-10. The resulting total was defined as the CDI score. We hypothetically considered that higher scores indicated greater caregiver focus. The CDI scores were compared by sex, education and career length using analysis of covariance. A total of 601 personal caregivers were evaluated (mean age 36.7 years; 36% men). The mean career length was 6.9 years, with the following groupings: 1-4 years (38%), 5-9 years (37%) and >10 years (24%). After adjustment for sex and education, the CDI scores for the variable, "poor eye contact," significantly differed between caregivers with ≥10 and Sex-related differences in CDI might also exist. Geriatr Gerontol Int 2016; 17: 410-415. © 2016 Japan Geriatrics Society.

  7. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum) transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata), as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest for the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode). Around 1.4 million reads were produced and assembled into 70,530 contigs (average length of 490 bp). Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum) reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  8. Comparative study of freshwater crayfish, Cherax spp. (crustaceae: decapoda: parastacidae) from Papua, Indonesia based on length-weight analysis

    Science.gov (United States)

    Hamidah, H.; Abinawanto, Bowolaksono, A.

    2017-07-01

    The freshwater crayfish is one of the most important species as a protein resource. Lakes and rivers are the habitat of crayfish in Papua. Morphological characters of crayfish, such as color, total body length (L) and body weight (W), are influenced by the habitat. The purpose of the study, therefore, was to compare the total body length and body weight as well as the unique color of crayfish from Uter lake (Atinjo district), Seremuk river (Haha village), Baliem river (Pike village; Hubukiak district, Jayawijaya), and Baliem river (Wesaput village; Wesaput district). Length-weight (body length; LB versus wet weight; WWT) relationships were determined for male and female crayfish (Cherax spp.). The length-weight relationship of total individuals was W = 0.022215 L^3.159. This regression differed significantly (R^2 = 97.5%) between locations. Both males and females exhibited positive allometric growth, and a statistical difference was observed in the mean wet weight and body length between males and females. In addition, a canonical function was used to determine population distribution based on the length-weight data.
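
    A length-weight relationship of the form W = a·L^b is commonly estimated by a straight-line fit on log-transformed data, since log W = log a + b·log L. The sketch below shows that fit with ordinary least squares. It is an illustration and not the authors' analysis: the data are synthetic, generated from coefficients read from the abstract's formula as a = 0.022215 and b = 3.159, and the units are left unspecified because the abstract does not state them.

```python
import math


def fit_length_weight(lengths, weights):
    """Least-squares fit of log W = log a + b * log L; returns (a, b)."""
    xs = [math.log(l) for l in lengths]
    ys = [math.log(w) for w in weights]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, noise-free data built from the coefficients quoted above.
lengths = [4.0, 5.5, 7.0, 8.5, 10.0, 12.0]
weights = [0.022215 * l ** 3.159 for l in lengths]
a, b = fit_length_weight(lengths, weights)
print(round(a, 6), round(b, 3))
```

    An exponent b above 3 is what the abstract calls positive allometric growth: weight increases faster than the cube of length.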

  9. Length-weight relationship of Giant Oyster, Crassostrea gryphoides (Schlotheim)

    Digital Repository Service at National Institute of Oceanography (India)

    Chatterji, A.; Ansari, Z.A.; Ingole, B.S.; Parulekar, A.H.

    Relationship between shell length and total weight, shell weight and meat weight of giant oyster, Crassostrea gryphoides revealed that the growth of these parameters is very fast and significant. It indicates the suitability of the species concerned...

  10. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    Science.gov (United States)

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Length-Weight relationships of mangrove oyster, Crassostrea gasar ...

    African Journals Online (AJOL)

    The length-weight relationships (LWRs) of mangrove oyster, Crassostrea gasar, cultured under continuous and periodic submergence in tidal ponds, for a period of seven months, February to August 2010 were determined. A total of 375 individuals each of pond A (continuous submergence) and B (periodic submergence) ...

  12. The complete chloroplast genome sequence of Maddenia hypoleuca koehne (Prunoideae, Rosaceae).

    Science.gov (United States)

    Chen, Tao; Zhang, Jing; Liu, Yin; Wang, Hao; Wang, Juan; Chen, Qing; Tang, Hao-Ru; Wang, Xiao-Rong

    2016-11-01

    Maddenia hypoleuca Koehne belonging to family Rosaceae is a native species in China. The complete chloroplast (cp) genome was generated by de novo assembly using low coverage whole genome sequencing data and manual correction. The cp genome was 158 084 bp in length, with GC content of 36.63%. It exhibited a typical quadripartite structure: a pair of large inverted repeat regions (IRs, 26 246 bp each), a large single-copy region (LSC, 86 713 bp), and a small single-copy region (SSC, 18 879 bp). A total of 114 genes were predicted, which included 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. Phylogenetic analysis indicated that M. hypoleuca is most closely related to Prunus padus within the Prunoideae subfamily, which conforms to the traditional classification.
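
    The quadripartite structure described in this record implies a simple consistency check: the genome length should equal the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. A short sketch of that arithmetic follows; the function name `quadripartite_length` is an illustrative assumption, and the numbers are the ones quoted in the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular chloroplast genome with LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir

# Region sizes (bp) as reported in the abstract.
total = quadripartite_length(lsc=86_713, ssc=18_879, ir=26_246)
ir_fraction = 2 * 26_246 / total   # share of the genome occupied by the IR pair
```

    With the reported region sizes the total comes to 158 084 bp, matching the stated genome length.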

  13. [Complete genome sequencing and analyses of rabies viruses isolated from wild animals (Chinese Ferret-Badger) in Zhejiang province].

    Science.gov (United States)

    Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing

    2009-08-01

    Based on sequencing of the full-length genomes of two rabies virus isolates from Chinese ferret-badgers, we analyzed the genetic variation of rabies viruses at the molecular level to obtain information on the prevalence and variation of rabies viruses in Zhejiang, and to enrich the genome database of rabies virus street strains isolated from Chinese wildlife. Overlapping fragments were amplified by RT-PCR and full-length genomes were assembled to analyze the nucleotide and deduced protein similarities and phylogenetic analyses of the N genes from Chinese ferret-badger, sika deer, vole, dog. Vaccine strains were then determined. The two full-length genomes were completely sequenced and found to have the same genetic structure of 11 923 nts, including 58 nts-Leader, 1353 nts-NP, 894 nts-PP, 609 nts-MP, 1575 nts-GP, 6386 nts-LP, and 2, 5, 5 nts-intergenic regions (IGRs), 423 nts-Pseudogene-like sequence (Psi), 70 nts-Trailer. The two full-length genomes were in accordance with the properties of Rhabdoviridae Lyssavirus by BLAST and multi-sequence alignment. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. For the two full-length genomes, the similarity at the amino acid level was dramatically higher than that at the nucleotide level, so the nucleotide mutations in these two genomes were most probably synonymous. Compared to the reference rabies viruses, the lengths of the five protein coding regions did not show any changes or recombination, with only a few point mutations. It was evident that the five proteins appeared to be stable. The variation sites and types of the two ferret-badger genomes were similar to those of the reference vaccine or street strains. The two strains were genotype 1 according to the multi-sequence and phylogenetic analyses, and possessed the distinct geographic characteristics of China. All the evidence suggested a cue that these two ferret badgers

  14. RAPD and Internal Transcribed Spacer Sequence Analyses Reveal Zea nicaraguensis as a Section Luxuriantes Species Close to Zea luxurians

    Science.gov (United States)

    Wang, Pei; Lu, Yanli; Zheng, Mingmin; Rong, Tingzhao; Tang, Qilin

    2011-01-01

    Genetic relationship of a newly discovered teosinte from Nicaragua, Zea nicaraguensis with waterlogging tolerance, was determined based on randomly amplified polymorphic DNA (RAPD) markers and the internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA using 14 accessions from Zea species. RAPD analysis showed that a total of 5,303 fragments were produced by 136 random decamer primers, of which 84.86% bands were polymorphic. RAPD-based UPGMA analysis demonstrated that the genus Zea can be divided into section Luxuriantes including Zea diploperennis, Zea luxurians, Zea perennis and Zea nicaraguensis, and section Zea including Zea mays ssp. mexicana, Zea mays ssp. parviglumis, Zea mays ssp. huehuetenangensis and Zea mays ssp. mays. ITS sequence analysis showed the lengths of the entire ITS region of the 14 taxa in Zea varied from 597 to 605 bp. The average GC content was 67.8%. In addition to the insertion/deletions, 78 variable sites were recorded in the total ITS region with 47 in ITS1, 5 in 5.8S, and 26 in ITS2. Sequences of these taxa were analyzed with neighbor-joining (NJ) and maximum parsimony (MP) methods to construct the phylogenetic trees, selecting Tripsacum dactyloides L. as the outgroup. The phylogenetic relationships of Zea species inferred from the ITS sequences are highly concordant with the RAPD evidence that resolved two major subgenus clades. Both RAPD and ITS sequence analyses indicate that Zea nicaraguensis is more closely related to Zea luxurians than the other teosintes and cultivated maize, which should be regarded as a section Luxuriantes species. PMID:21525982

  15. Practical multipeptide synthesis: dedicated software for the definition of multiple, overlapping peptides covering polypeptide sequences.

    Science.gov (United States)

    Heegaard, P M; Holm, A; Hagerup, M

    1993-01-01

    A personal computer program for the conversion of linear amino acid sequences to multiple, small, overlapping peptide sequences has been developed. Peptide lengths and "jumps" (the distance between two consecutive overlapping peptides) are defined by the user. To facilitate the use of the program for parallel solid-phase chemical peptide syntheses for the synchronous production of multiple peptides, amino acids at each acylation step are laid out by the program in a convenient standard multi-well setup. Also, the total number of equivalents, as well as the derived amount in milligrams (depending on user-defined equivalent weights and molar surplus), of each amino acid are given. The program facilitates the implementation of multipeptide synthesis, e.g., for the elucidation of polypeptide structure-function relationships, and greatly reduces the risk of introducing mistakes at the planning step. It is written in Pascal and runs on any DOS-based personal computer. No special graphic display is needed.
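
    The core operation described in this record, cutting a sequence into overlapping peptides of a user-defined length and "jump", can be sketched in a few lines. This is an illustration in Python rather than the original Pascal program. The function name `overlapping_peptides`, the example sequences, and the choice to add a final peptide anchored at the C-terminus when the jump does not divide the sequence evenly are all assumptions, since the abstract does not say how the original program treats the tail.

```python
def overlapping_peptides(sequence, length, jump):
    """Split a sequence into overlapping peptides.

    length: number of residues per peptide.
    jump:   offset between the start positions of consecutive peptides.
    Assumption: if the last window does not reach the C-terminus, one extra
    peptide covering the final `length` residues is appended.
    """
    if length <= 0 or jump <= 0:
        raise ValueError("length and jump must be positive")
    if len(sequence) < length:
        raise ValueError("sequence is shorter than the peptide length")
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, jump))
    if starts[-1] != last_start:
        starts.append(last_start)      # cover the C-terminal residues
    return [sequence[s:s + length] for s in starts]

even = overlapping_peptides("ACDEFGHIKLMNPQ", length=6, jump=4)
ragged = overlapping_peptides("ACDEFGHIKLMNPQR", length=6, jump=4)
```

    Here consecutive peptides share `length - jump` residues, so a smaller jump gives denser coverage at the cost of more peptides to synthesize.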

  16. A simple method for estimating the length density of convoluted tubular systems.

    Science.gov (United States)

    Ferraz de Carvalho, Cláudio A; de Campos Boldrini, Silvia; Nishimaru, Flávio; Liberti, Edson A

    2008-10-01

    We present a new method for estimating the length density (Lv) of convoluted tubular structures exhibiting an isotropic distribution. Although the traditional equation Lv=2Q/A is used, the parameter Q is obtained by considering the collective perimeters of tubular sections. This measurement is converted to a standard model of the structure, assuming that all cross-sections are approximately circular and have an average perimeter similar to that of actual circular cross-sections observed in the same material. The accuracy of this method was tested in eight experiments using hollow macaroni bent into helical shapes. After measuring the length of the macaroni segments, they were boiled and randomly packed into cylindrical volumes along with an aqueous suspension of gelatin and India ink. The solidified blocks were cut into slices 1.0 cm thick and 33.2 cm2 in area (A). The total perimeter of the macaroni cross-sections so revealed was stereologically estimated using a test system of straight parallel lines. Given Lv and the reference volume, the total length of macaroni in each section could be estimated. Additional corrections were made for the changes induced by boiling, and the off-axis position of the thread used to measure length. No statistical difference was observed between the corrected estimated values and the actual lengths. This technique is useful for estimating the length of capillaries, renal tubules, and seminiferous tubules.
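
    The estimate described in this record can be written out as a short calculation. The sketch below follows one reading of the abstract, namely that Q is the total measured perimeter divided by the mean perimeter of circular cross-sections, and that total length is Lv multiplied by the reference volume. The function names and the perimeter values are hypothetical; only the slice area (33.2 cm2) and thickness (1.0 cm) come from the abstract.

```python
def length_density(total_perimeter, mean_circular_perimeter, area):
    """Length density Lv = 2Q/A.

    Q is taken here as the equivalent number of circular profiles:
    total perimeter divided by the mean perimeter of circular sections.
    """
    q = total_perimeter / mean_circular_perimeter
    return 2.0 * q / area

def total_length(lv, reference_volume):
    """Total tubule length contained in the reference volume."""
    return lv * reference_volume

# Hypothetical perimeters (cm); slice geometry taken from the abstract.
area_cm2 = 33.2
thickness_cm = 1.0
lv = length_density(total_perimeter=99.6, mean_circular_perimeter=3.0, area=area_cm2)
length_cm = total_length(lv, reference_volume=area_cm2 * thickness_cm)
```

    With perimeters in cm and area in cm2, Lv comes out in cm per cm3, so multiplying by a volume in cm3 returns a length in cm. The boiling and off-axis corrections mentioned in the abstract are not modeled.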

  17. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis.

    Science.gov (United States)

    Chaurasia, Ankita; Tarallo, Andrea; Bernà, Luisa; Yagi, Mitsuharu; Agnisola, Claudio; D'Onofrio, Giuseppe

    2014-01-01

    A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish was performed with the aim to highlight the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing independently both variables in pairwise comparisons. An average length shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass specific temperature-corrected using the Boltzmann's factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while environmental temperature of fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi was observed for the significant majority of the intronic sequences (from ∼ 40% to ∼ 90%, in each pairwise comparison). The opposite event, concomitant increase of bpi and decrease of GCi, was counter selected (from hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes.

  18. A putative peroxidase cDNA from turnip and analysis of the encoded protein sequence.

    Science.gov (United States)

    Romero-Gómez, S; Duarte-Vázquez, M A; García-Almendárez, B E; Mayorga-Martínez, L; Cervantes-Avilés, O; Regalado, C

    2008-12-01

    A putative peroxidase cDNA was isolated from turnip roots (Brassica napus L. var. purple top white globe) by reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE). Total RNA extracted from mature turnip roots was used as a template for RT-PCR, using a degenerate primer designed to amplify the highly conserved distal motif of plant peroxidases. The resulting partial sequence was used to design the rest of the specific primers for 5' and 3' RACE. Two cDNA fragments were purified, sequenced, and aligned with the partial sequence from RT-PCR, and a complete overlapping sequence was obtained and labeled as BnPA (Genbank Accession No. AY423440, named as podC). The full length cDNA is 1167 bp long and contains a 1077 bp open reading frame (ORF) encoding a 358 deduced amino acid peroxidase polypeptide. The putative peroxidase (BnPA) showed a calculated Mr of 34 kDa, and isoelectric point (pI) of 4.5, with no significant identity with other reported turnip peroxidases. Sequence alignment showed that only three peroxidases have a significant identity with BnPA, namely AtP29a (84%) and AtPA2 (81%) from Arabidopsis thaliana, and HRPA2 (82%) from horseradish (Armoracia rusticana). Work is in progress to clone this gene into an adequate host to study the specific role and possible biotechnological applications of this alternative peroxidase source.

  19. Persistence length of wormlike micelles composed of ionic surfactants: self-consistent-field predictions

    NARCIS (Netherlands)

    Lauw, Y.; Leermakers, F.A.M.; Cohen Stuart, M.A.

    2007-01-01

    The persistence length of a wormlike micelle composed of ionic surfactants CnEmXk in an aqueous solvent is predicted by means of the self-consistent-field theory where CnEm is the conventional nonionic surfactant and X-k is an additional sequence of k weakly charged (pH-dependent) segments. By

  20. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  1. Prevalence of single nucleotide polymorphism among 27 diverse alfalfa genotypes as assessed by transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Li Xuehui

    2012-10-01

    Full Text Available Abstract Background Alfalfa, a perennial, outcrossing species, is a widely planted forage legume producing highly nutritious biomass. Currently, improvement of cultivated alfalfa mainly relies on recurrent phenotypic selection. Marker assisted breeding strategies can enhance alfalfa improvement efforts, particularly if many genome-wide markers are available. Transcriptome sequencing enables efficient high-throughput discovery of single nucleotide polymorphism (SNP) markers for a complex polyploid species. Result The transcriptomes of 27 alfalfa genotypes, including elite breeding genotypes, parents of mapping populations, and unimproved wild genotypes, were sequenced using an Illumina Genome Analyzer IIx. De novo assembly of quality-filtered 72-bp reads generated 25,183 contigs with a total length of 26.8 Mbp and an average length of 1,065 bp, with an average read depth of 55.9-fold for each genotype. Overall, 21,954 (87.2%) of the 25,183 contigs represented 14,878 unique protein accessions. Gene ontology (GO) analysis suggested that a broad diversity of genes was represented in the resulting sequences. The realignment of individual reads to the contigs enabled the detection of 872,384 SNPs and 31,760 InDels. High resolution melting (HRM) analysis was used to validate 91% of 192 putative SNPs identified by sequencing. Both allelic variants at about 95% of SNP sites identified among five wild, unimproved genotypes are still present in cultivated alfalfa, and all four US breeding programs also contain a high proportion of these SNPs. Thus, little evidence exists among this dataset for loss of significant DNA sequence diversity from either domestication or breeding of alfalfa. Structure analysis indicated that individuals from the subspecies falcata, the diploid subspecies caerulea, and the tetraploid subspecies sativa (cultivated tetraploid alfalfa) were clearly separated. Conclusion We used transcriptome sequencing to discover large numbers of SNPs

  2. Interference Suppression Performance of Automotive UWB Radars Using Pseudo Random Sequences

    Directory of Open Access Journals (Sweden)

    I. Pasya

    2015-12-01

    Full Text Available Ultra wideband (UWB) automotive radars have attracted attention from the viewpoint of reducing traffic accidents. The performance of automotive radars may be degraded by interference from nearby radars using the same frequency. In this study, a scenario where two cars pass each other on a road was considered. Considering the utilization of cross-polarization, the desired-to-undesired signal power ratio (DUR) was found to vary approximately from -10 to 30 dB. Different pseudo random sequences were employed for spectrum spreading the different radar signals to mitigate the interference effects. This paper evaluates the interference suppression provided by maximum length sequence (MLS) and Gold sequence (GS) through numerical simulations of the radar’s performance in terms of probability of false alarm and probability of detection. It was found that MLS and GS yielded nearly the same performance when the DUR is -10 dB (worst case); for example, when fixing the probability of false alarm to 0.0001, the probabilities of detection were 0.964 and 0.946, respectively. The GS are more advantageous than MLS due to the larger number of different sequences having the same length in GS than in MLS.
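
    A maximum length sequence of the kind named in this record can be generated with a linear feedback shift register (LFSR). The sketch below is only an illustration of that general technique: the register size, tap positions, and function names are assumptions, since the abstract does not give the sequence parameters used in the study, and Gold sequences (which combine a preferred pair of such sequences) are not generated here.

```python
def max_length_sequence(nbits, taps):
    """One period (2**nbits - 1 bits) of a Fibonacci LFSR output.

    taps are 1-based register positions XORed to form the feedback bit.
    With taps=(5, 3) the recurrence is a[n] = a[n-3] XOR a[n-5].
    """
    state = [1] * nbits                  # any nonzero seed works
    bits = []
    for _ in range(2 ** nbits - 1):
        bits.append(state[-1])           # output the oldest bit
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift in the feedback bit
    return bits

def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation of the sequence mapped to +1/-1 chips."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))

seq = max_length_sequence(nbits=5, taps=(5, 3))
sidelobes = [periodic_autocorrelation(seq, s) for s in range(1, len(seq))]
```

    The flat off-peak autocorrelation of such a sequence is the property that makes it attractive for spectrum spreading; only a small number of distinct maximum length sequences exist for a given length, which is the limitation the abstract contrasts with Gold sequences.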

  3. 454 sequencing of pooled BAC clones on chromosome 3H of barley

    Directory of Open Access Journals (Sweden)

    Yamaji Nami

    2011-05-01

    Full Text Available Abstract Background Genome sequencing of barley has been delayed due to its large genome size (ca. 5,000 Mbp). Among the fast sequencing systems, 454 liquid phase pyrosequencing provides the longest reads and is the most promising method for BAC clones. Here we report the results of pooled sequencing of BAC clones selected with ESTs genetically mapped to chromosome 3H. Results We sequenced pooled barley BAC clones using a 454 parallel genome sequencer. A PCR screening system based on primer sets derived from genetically mapped ESTs on chromosome 3H was used for clone selection in a BAC library developed from cultivar "Haruna Nijo". The DNA samples of 10 or 20 BAC clones were pooled and used for shotgun library development. The homology between contig sequences generated in each pooled library and mapped EST sequences was studied. The number of contigs assigned on chromosome 3H was 372. Their lengths ranged from 1,230 bp to 58,322 bp with an average 14,891 bp. Of these contigs, 240 showed homology and colinearity with the genome sequence of rice chromosome 1. A contig annotation browser supplemented with query search by unique sequence or genetic map position was developed. The identified contigs can be annotated with barley cDNAs and reference sequences on the browser. Homology analysis of these contigs with rice genes indicated that 1,239 rice genes can be assigned to barley contigs by the simple comparison of sequence lengths in both species. Of these genes, 492 are assigned to rice chromosome 1. Conclusions We demonstrate the efficiency of sequencing gene rich regions from barley chromosome 3H, with special reference to syntenic relationships with rice chromosome 1.

  4. Biological treatment of PAH-contaminated sediments in a Sequencing Batch Reactor

    International Nuclear Information System (INIS)

    Chiavola, Agostina; Baciocchi, Renato; Gavasci, Renato

    2010-01-01

    The technical feasibility of a sequential batch process for the biological treatment of sediments contaminated by polycyclic aromatic hydrocarbons (PAHs) was evaluated through an experimental study. A bench-scale Sediment Slurry Sequencing Batch Reactor (SS-SBR) was fed with river sediments contaminated by a PAH mixture made of fluorene, anthracene, pyrene and chrysene. The process performance was evaluated under different operating conditions, obtained by modifying the influent organic load, the feed composition and the hydraulic residence time. Measurements of the Oxygen Uptake Rates (OURs) provided useful insights on the biological kinetics occurring in the SS-SBR, suggesting the minimum applied cycle time-length of 7 days could eventually be halved, as also confirmed by the trend observed in the volatile solid and total organic carbon data. The removal efficiencies gradually improved during the SS-SBR operation, achieving at the end of the study rather constant removal rates above 80% for both 3-ring PAHs (fluorene and anthracene) and 4-ring PAHs (pyrene and chrysene) for an inlet total PAH concentration of 70 mg/kg as dry weight (dw).

  5. Rate-determining Step of Flap Endonuclease 1 (FEN1) Reflects a Kinetic Bias against Long Flaps and Trinucleotide Repeat Sequences.

    Science.gov (United States)

    Tarantino, Mary E; Bilotti, Katharina; Huang, Ji; Delaney, Sarah

    2015-08-21

    Flap endonuclease 1 (FEN1) is a structure-specific nuclease responsible for removing 5'-flaps formed during Okazaki fragment maturation and long patch base excision repair. In this work, we use rapid quench flow techniques to examine the rates of 5'-flap removal on DNA substrates of varying length and sequence. Of particular interest are flaps containing trinucleotide repeats (TNR), which have been proposed to affect FEN1 activity and cause genetic instability. We report that FEN1 processes substrates containing flaps of 30 nucleotides or fewer at comparable single-turnover rates. However, for flaps longer than 30 nucleotides, FEN1 kinetically discriminates substrates based on flap length and flap sequence. In particular, FEN1 removes flaps containing TNR sequences at a rate slower than mixed sequence flaps of the same length. Furthermore, multiple-turnover kinetic analysis reveals that the rate-determining step of FEN1 switches as a function of flap length from product release to chemistry (or a step prior to chemistry). These results provide a kinetic perspective on the role of FEN1 in DNA replication and repair and contribute to our understanding of FEN1 in mediating genetic instability of TNR sequences. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  6. A genome-wide analysis of lentivector integration sites using targeted sequence capture and next generation sequencing technology.

    Science.gov (United States)

    Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay

    2012-10-01

    One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not been used in viral integration site analysis of gene therapy applications. Here, we combined a targeted sequence capture array and next generation sequencing to address the whole genome profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1 derived vector. Custom-made DNA probe sets targeting the pLVTHM vector were used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. Seven thousand four hundred and eighty four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% and minimum 50 bp criteria in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were distant from the CpG islands and from the transcription start sites and preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Relationship between hamstring length and gluteus maximus strength with and without normalization.

    Science.gov (United States)

    Lee, Dong-Kyu; Oh, Jae-Seop

    2018-01-01

    [Purpose] This study assessed the relationship between hamstring length and gluteus maximus (GM) strength with and without normalization by body weight and height. [Subjects and Methods] In total, 34 healthy male subjects volunteered for this study. To measure GM strength, subjects performed maximal hip joint extension with the knee joints flexed to 90° in the prone position. GM strength was normalized for body weight and height. [Results] GM strength with normalization was positively correlated with hamstring length, whereas GM strength without normalization was negatively correlated with hamstring length. [Conclusion] The normalization of GM strength by body weight and height has the potential to lead to more appropriate conclusions and interpretations about its correlation with hamstring length. Hamstring length may be related to GM strength.

  8. Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets.

    Science.gov (United States)

    Bengtsson, Johan; Eriksson, K Martin; Hartmann, Martin; Wang, Zheng; Shenoy, Belle Damodara; Grelet, Gwen-Aëlle; Abarenkov, Kessy; Petri, Anna; Rosenblad, Magnus Alm; Nilsson, R Henrik

    2011-10-01

    The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. To identify the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. However, the present study introduces Metaxa ( http://microbiology.se/software/metaxa/ ), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.

  9. Stalk-length-dependence of the contractility of Vorticella convallaria

    Science.gov (United States)

    Gul Chung, Eun; Ryu, Sangjin

    2017-12-01

    Vorticella convallaria is a sessile protozoan of which the spasmoneme contracts on a millisecond timescale. Because this contraction is induced and powered by the binding of calcium ions (Ca2+), the spasmoneme showcases Ca2+-powered cellular motility. Because the isometric tension of V. convallaria increases linearly with its stalk length, it is hypothesized that the contractility of V. convallaria during unhindered contraction depends on the stalk length. In this study, the contractile force and energetics of V. convallaria cells of different stalk lengths were evaluated using a fluid dynamic drag model which accounts for the unsteadiness and finite Reynolds number of the water flow caused by contracting V. convallaria and the wall effect of the no-slip substrate. It was found that the contraction displacement, peak contraction speed, peak contractile force, total mechanical work, and peak power depended on the stalk length. The observed stalk-length-dependencies were simulated using a damped spring model, and the model estimated that the average spring constant of the contracting stalk was 1.34 nN/µm. These observed length-dependencies of Vorticella’s key contractility parameters reflect the biophysical mechanism of the spasmonemal contraction, and thus they should be considered in developing a theoretical model of the Vorticella spasmoneme.

  10. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions. In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
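
    The distance computation outlined in this record (an L-word frequency profile per sequence, then pairwise Euclidean distances) can be illustrated as follows. This is a simplified sketch, not the authors' implementation: it counts words with a plain sliding window instead of a generalized suffix tree, uses relative frequencies, fixes L by hand instead of selecting an optimal word length, and all function names and example sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word, in a fixed word order."""
    n_windows = len(seq) - L + 1
    counts = Counter(seq[i:i + L] for i in range(n_windows))
    total = max(n_windows, 1)
    return [counts["".join(word)] / total for word in product(alphabet, repeat=L)]

def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]

# Toy sequences for illustration only.
seqs = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
matrix = distance_matrix(seqs, L=2)
```

    The resulting matrix can be passed to a neighbor-joining or multidimensional scaling routine, as the abstract describes. The sliding-window count is adequate for short inputs, whereas the suffix tree in the original work is what keeps the word counting fast on whole mitochondrial genomes.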

  11. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    Science.gov (United States)

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  12. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    Directory of Open Access Journals (Sweden)

    Nathan D. Olson

    2015-03-01

    Full Text Available This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.

  13. cDNA sequences of two inducible T-cell genes

    Energy Technology Data Exchange (ETDEWEB)

    Kwon, B.S. (Indiana Univ. School of Medicine, Indianapolis (USA) Guthrie Research Institute, Sayre, PA (USA)); Weissman, S.M. (Yale Univ., New Haven, CT (USA))

    1989-03-01

    The authors have previously described a set of human T-lymphocyte-specific cDNA clones isolated by a modified differential screening procedure. Apparent full-length cDNAs containing the sequences of 14 of the 16 initial isolates were sequenced and were found to represent five different species of mRNA; three of the five species were identical to previously reported cDNA sequences of preproenkephalin, T-cell-replacing factor, and a serine esterase, respectively. The other two species, 4-1BB and L2G25B, were inducible sequences found in mRNA from both a cytolytic T-lymphocyte and a helper T-lymphocyte clone and were not previously described in T-cell mRNA; these mRNA sequences encode peptides of 256 and 92 amino acids, respectively. Both peptides contain putative leader sequences. The protein encoded by 4-1BB also has a potential membrane anchor segment and other features also seen in known receptor proteins.

  14. Differences in a ribosomal DNA sequence of Strongylus species allows identification of single eggs.

    Science.gov (United States)

    Campbell, A J; Gasser, R B; Chilton, N B

    1995-03-01

    In the current study, molecular techniques were evaluated for the species identification of individual strongyle eggs. Adult worms of Strongylus edentatus, S. equinus and S. vulgaris were collected at necropsy from horses from Australia and the U.S.A. Genomic DNA was isolated and a ribosomal transcribed spacer (ITS-2) amplified and sequenced using polymerase chain reaction (PCR) techniques. The length of the ITS-2 sequence of S. edentatus, S. equinus and S. vulgaris ranged between 217 and 235 nucleotides. Extensive sequence analysis demonstrated a low degree (0-0.9%) of intraspecific variation in the ITS-2 for the Strongylus species examined, whereas the levels of interspecific differences (13-29%) were significantly greater. Interspecific differences in the ITS-2 sequences allowed unequivocal species identification of single worms and eggs using PCR-linked restriction fragment length polymorphism. These results demonstrate the potential of the ribosomal spacers as genetic markers for species identification of single strongyle eggs from horse faeces.

  15. In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

    Science.gov (United States)

    Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

    2011-01-01

    To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
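
    The hexanucleotide tally described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the record layout (a sequence plus a 0-based poly(A) site index), and the toy sequences are assumptions made for illustration. It counts, per transcript, which hexamers lie entirely within the window 15–30 nt upstream of the poly(A) site.

```python
from collections import Counter


def count_upstream_hexamers(records, start=15, end=30, k=6):
    """Count, per transcript, the k-mers found in the window upstream of the poly(A) site.

    records: iterable of (sequence, site) pairs, where `site` is the 0-based index of
    the first nucleotide downstream of the cleavage site (a hypothetical convention).
    The window covers positions site-end .. site-start-1, i.e. 15-30 nt upstream by default.
    Each k-mer is counted at most once per transcript.
    """
    counts = Counter()
    for seq, site in records:
        seq = seq.upper().replace("T", "U")  # report in RNA alphabet
        lo = max(site - end, 0)
        hi = max(site - start, 0)
        window = seq[lo:hi]
        seen = {window[i:i + k] for i in range(len(window) - k + 1)}
        counts.update(seen)
    return counts


if __name__ == "__main__":
    # Toy data only: one transcript with AATGAA in the window, one without.
    toy = [
        ("C" * 20 + "AATGAA" + "C" * 20 + "GGGG", 46),
        ("C" * 50, 46),
    ]
    counts = count_upstream_hexamers(toy)
    for hexamer, n in counts.most_common(3):
        print(hexamer, n, f"{100 * n / len(toy):.0f}% of transcripts")
```

    Counting each hexamer once per transcript makes the result directly comparable to a statement such as "accounts for only 6% of all transcripts"; counting every occurrence instead would answer a different question.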

  16. QTL-mapping in mink (Neovison vison) shows evidence for QTL for guard hair thickness, guard hair length and skin length

    DEFF Research Database (Denmark)

    Thirstrup, Janne Pia; Labouriau, Rodrigo; Guldbrandtsen, Bernt

    2011-01-01

    Fur quality in mink (Neovison vison) is a composite trait, consisting of e.g. guard hair length, guard hair thickness and density of wool. A genome wide QTL search was performed to detect QTL for fur quality traits in mink. Here we present the results of QTL analyses for guard hair length, guard...... hair thickness and density of wool. Data from an F2-cross was analysed across fourteen chromosomes using 100 microsatellites as markers with a spacing of approximately 20 cM. The two lines used for the F2-cross were Nordic wild mink and American short nap mink. In total 1,083 animals (21 wild type, 25...... short nap, 103 F1 and 934 F2) were marker typed and recorded for the three presented fur quality traits. For the QTL-analyses a regression analysis implemented in QTL Express software was used. Evidence was found for the existence of QTL for guard hair length, guard hair thickness and density of wool...

  17. What can we learn about lyssavirus genomes using 454 sequencing?

    Science.gov (United States)

    Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin

    2012-01-01

    The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat", is to provide high-quality complete genome sequences from lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations, covering both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate that not only can high-quality full-length lyssavirus genome sequences be generated, but efficient analysis of the viral population also becomes feasible.

  18. Whole Genome Sequencing of Enterovirus species C Isolates by High-throughput Sequencing: Development of Generic Primers

    Directory of Open Access Journals (Sweden)

    Maël Bessaud

    2016-08-01

    Full Text Available Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the 3 serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a valuable tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  19. Automatic measurement of axial length of human eye using three-dimensional magnetic resonance imaging

    International Nuclear Information System (INIS)

    Watanabe, Masaki; Kiryu, Tohru

    2011-01-01

    The measurement of axial length and the evaluation of three dimensional (3D) form of an eye are essential to evaluate the mechanism of myopia progression. We propose a method of automatic measurement of axial length including adjustment of the pulse sequence of short-term scan which could suppress influence of eyeblink, using a magnetic resonance imaging (MRI) which acquires 3D images noninvasively. Acquiring T2-weighted images with 3.0 tesla MRI device and eight-channel phased-array head coil, we extracted left and right eye ball images, and then reconstructed 3D volume. The surface coordinates were calculated from 3D volume, fitting the ellipsoid model coordinates with the surface coordinates, and measured the axial length automatically. Measuring twenty-one subjects, we compared the automatically measured values of axial length with the manually measured ones, then confirmed significant elongation in the axial length of myopia compared with that of emmetropia. Furthermore, there were no significant differences (P<0.05) between the means of automatic measurements and the manual ones. Accordingly, the automatic measurement process of axial length could be a tool for the elucidation of the mechanism of myopia progression, which would be suitable for evaluating the axial length easily and noninvasively. (author)

  20. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.).

    Science.gov (United States)

    Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M

    2015-01-01

    In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.

  1. Reduction of total labor length through the addition of parenteral dextrose solution in induction of labor in nulliparous: results of DEXTRONS prospective randomized controlled trial.

    Science.gov (United States)

    Paré, Josianne; Pasquier, Jean-Charles; Lewin, Antoine; Fraser, William; Bureau, Yves-André

    2017-05-01

    Prolonged labor is a significant cause of maternal and fetal morbidity and very few interventions are known to shorten labor course. Skeletal muscle physiology suggests that glucose supplementation might improve muscle performance in case of prolonged exercise and this situation is analogous to the gravid uterus during delivery. Therefore, it seemed imperative to evaluate the impact of adding carbohydrate supplements on the course of labor. We sought to provide evidence as to whether intravenous glucose supplementation during labor induction in nulliparous women can reduce total duration of active labor. We performed a single-center prospective double-blind randomized controlled trial comparing the use of parenteral intravenous dextrose 5% with normal saline to normal saline in induced nulliparous women. The study was conducted in a tertiary-level university hospital setting. Participants, caregivers, and those assessing the outcomes were blinded to group assignment. Inclusion criteria were singleton pregnancy at term with cephalic presentation and favorable cervix. Based on blocked randomization, patients were assigned to receive either 250 mL/h of intravenous dextrose 5% with normal saline or 250 mL/h of normal saline for the whole duration of induction, labor, and delivery. The primary outcome studied was the total length of active labor. Secondary outcomes included duration of the active phase of second stage of labor, the mode of delivery, Apgar scores, and arterial cord pH. In all, 100 patients were randomized into each group. A total of 193 patients (96 in the dextrose with normal saline group and 97 in the normal saline group) were analyzed in the study. The median total duration of labor was significantly less in the dextrose with normal saline group (499 vs 423 minutes, P = .024) than in the normal saline group. The probabilities of a woman being delivered at 200 minutes and 450 minutes were 18.8% and 77.1% in the dextrose with normal saline group vs 8

  2. Application of the integrated analysis of safety (IAS) to sequences of Total loss of feed water in a PWR Reactor; Aplicacion del Analisis Integrado de Seguridad (ISA) a Secuencias de Perdidas Total de Agua de Alimentacion en un Reactor PWR

    Energy Technology Data Exchange (ETDEWEB)

    Moreno Chamorro, P.; Gallego Diaz, C.

    2011-07-01

    The main objective of this work is to show the current status of the application of the integrated analysis of safety (IAS) methodology and its associated tool SCAIS (system of simulation codes for IAS) to the analysis of total loss of feedwater sequences in a three-loop Westinghouse PWR model with a large, dry containment.

  3. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76......(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  4. Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.

    Science.gov (United States)

    Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W

    2018-02-01

    Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). DSBs do not occur at uniform frequency throughout the genome in most organisms, but occur preferentially at a limited number of sites referred to as hotspots. The locations of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often sites for binding of transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for activity of one such motif, identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.
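
    To show how a degenerate motif such as GGTCTRGACC (R = A or G in IUPAC notation) can be located in a sequence, here is a minimal Python sketch. It is an illustration only, not the method used in the study: the function names and test sequences are hypothetical, and scanning both strands is an assumption made because the motif is not its own reverse complement when R = G.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif="GGTCTRGACC"):
    """Return sorted (start, strand) hits; start is 0-based on the forward strand."""
    seq = seq.upper()
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)


if __name__ == "__main__":
    print(find_motif("AAAGGTCTGGACCTTT"))
    print(find_motif("TTGGTCCAGACCAA"))
```

    An exact-match scan like this only finds full 10-bp instances; the fractional "effective sequence lengths" reported in the abstract reflect partial tolerance at some positions, which would need a weighted scoring scheme rather than a regular expression.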

  5. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Full Text Available Abstract Background Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only

  6. Question-answer sequences in survey interviews

    NARCIS (Netherlands)

    Dijkstra, W.; Ongena, Y.P.

    2006-01-01

    Interaction analysis was used to analyze a total of 14,265 question-answer sequences (Q-A sequences) of 80 questions that originated from two face-to-face and three telephone surveys. The analysis was directed towards the causes and effects of particular interactional problems. Our results showed

  7. Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

    Science.gov (United States)

    Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

    2016-01-01

    Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153
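
    The SSR-mining step mentioned in this abstract can be illustrated with a small regular-expression sketch in Python. This is not MISA and does not reproduce its output format; the function name and the minimum-repeat thresholds below are hypothetical values chosen for illustration, since the abstract does not state the settings used.

```python
import re

# Hypothetical thresholds: repeat-unit length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if the unit is not itself a tandem repeat of a shorter unit (e.g. 'AGAG', 'AAA')."""
    n = len(unit)
    return not any(
        n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, copies) tuples for perfect SSRs in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    toy_unigene = "CCT" + "AG" * 7 + "GGT" + "ATC" * 5 + "GG"  # toy sequence
    for start, unit, copies in find_ssrs(toy_unigene):
        print(f"start={start} unit={unit} copies={copies}")
```

    The primitive-unit check prevents a dinucleotide repeat from being reported a second time as a tetra- or hexanucleotide repeat, and it also drops homopolymer runs; whether mononucleotide repeats should be included is a choice that depends on the study design.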

  8. Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers.

    Science.gov (United States)

    Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

    2016-01-01

    Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies.

  9. Profiling dehydrin gene sequence and physiological parameters in drought tolerant and susceptible spring wheat cultivars

    International Nuclear Information System (INIS)

    Baloch, M.J.; Jatoi, W.A.

    2012-01-01

    Physiological and yield traits such as stomatal conductance (mmol m⁻² s⁻¹), Leaf relative water content (RWC %) and grain yield per plant were studied in a separate experiment. Results revealed that five out of sixteen cultivars viz. Anmol, Moomal, Sarsabz, Bhitai and Pavan, appeared to be relatively more drought tolerant. Based on morphophysiological results, studies were continued to look at these cultivars for drought tolerance at molecular level. Initially, four well recognized primers for dehydrin genes (DHNs) responsible for drought induction in T. durum L., T. aestivum L. and O. sativa L. were used for profiling gene sequence of sixteen wheat cultivars. The primers amplified the DHN genes variably like Primer WDHN13 (T. aestivum L.) amplified the DHN gene in only seven cultivars whereas primer TdDHN15 (T. durum L.) amplified all the sixteen cultivars with even different DNA banding patterns some showing second weaker DNA bands. Third primer TdDHN16 (T. durum L.) has shown entirely different PCR amplification prototype, specially showing two strong DNA bands while fourth primer RAB16C (O. sativa L.) failed to amplify DHN gene in any of the cultivars. Examination of DNA sequences revealed several interesting features. First, it identified the two exon/one intron structure of this gene (complete sequences were not shown), a feature not previously described in the two database cDNA sequences available from T. aestivum L. (gi|21850). Secondly, the analysis identified several single nucleotide polymorphisms (SNPs), positions in gene sequence. Although complete gene sequence was not obtained for all the cultivars, yet there were a total of 38 variable positions in exonic (coding region) sequence, from a total gene length of 453 nucleotides. Matrix of SNP shows these 37 positions with individual sequence at positions given for each of the 14 cultivars (sequence of two cultivars was not obtained) included in this analysis. It demonstrated a considerable

  10. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    Science.gov (United States)

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains a large room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG, demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as

  11. An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.

    Science.gov (United States)

    Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit

    2016-05-26

    Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform remains unexplored. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads of various lengths. Whether the performance of the commonly used software for handling such data is satisfactory is unknown. Using universal human reference RNA as the initial material, RNaseIII and chemical fragmentation methods in library construction showed similar results in the number of genes and junctions discovered and in the accuracy of estimated expression levels. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. Unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. We provided, for the first time, the reference statistics of library preparation methods, gene detection and quantification and junction discovery for RNA-Seq by the Ion Proton platform. Chemical fragmentation performed equally well with the enzyme-based one. The optimal Ion Proton sequencing options and analysis software have been evaluated.

  12. The first complete genome sequences of clinical isolates of human coronavirus 229E

    NARCIS (Netherlands)

    Farsani, Seyed Mohammad Jazaeri; Dijkman, Ronald; Jebbink, Maarten F.; Goossens, Herman; Ieven, Margareta; Deijs, Martin; Molenkamp, Richard; van der Hoek, Lia

    2012-01-01

    Human coronavirus 229E was identified in the mid-1960s, yet still only one full-genome sequence is available. This full-length sequence has been determined from the cDNA-clone Inf-1 that is based on the lab-adapted strain VR-740. Lab-adaptation might have resulted in genomic changes, due to

  13. Identification, Characterization and Full-Length Sequence Analysis of a Novel Polerovirus Associated with Wheat Leaf Yellowing Disease.

    Science.gov (United States)

    Zhang, Peipei; Liu, Yan; Liu, Wenwen; Cao, Mengji; Massart, Sebastien; Wang, Xifeng

    2017-01-01

    To identify the pathogens responsible for leaf yellowing symptoms on wheat samples collected from Jinan, China, we tested for the presence of three known barley/wheat yellow dwarf viruses (BYDV-GAV, -PAV, WYDV-GPV) (most likely pathogens) using RT-PCR. A sample that tested negative for the three viruses was selected for small RNA sequencing. Twenty-five million sequences were generated, among which 5% were of viral origin. A novel polerovirus was discovered and temporarily named wheat leaf yellowing-associated virus (WLYaV). The full genome of WLYaV corresponds to 5,772 nucleotides (nt), with six AUG-initiated open reading frames, one non-AUG-initiated open reading frame, and three untranslated regions, showing typical features of the family Luteoviridae. Sequence comparison and phylogenetic analyses suggested that WLYaV had the closest relationship with sugarcane yellow leaf virus (ScYLV), but the identities of full genomic nucleotides and deduced amino acid sequence of coat protein (CP) were 64.9 and 86.2%, respectively, below the species demarcation thresholds (90%) in the family Luteoviridae. Furthermore, agroinoculation of Nicotiana benthamiana leaves with a cDNA clone of WLYaV caused yellowing symptoms on the plant. Our study adds a new polerovirus that is associated with wheat leaf yellowing disease, which would help to identify and control pathogens of wheat.

  14. Identification, Characterization and Full-Length Sequence Analysis of a Novel Polerovirus Associated with Wheat Leaf Yellowing Disease

    Directory of Open Access Journals (Sweden)

    Peipei Zhang

    2017-09-01

    Full Text Available To identify the pathogens responsible for leaf yellowing symptoms on wheat samples collected from Jinan, China, we tested for the presence of three known barley/wheat yellow dwarf viruses (BYDV-GAV, -PAV, WYDV-GPV) (most likely pathogens) using RT-PCR. A sample that tested negative for the three viruses was selected for small RNA sequencing. Twenty-five million sequences were generated, among which 5% were of viral origin. A novel polerovirus was discovered and temporarily named wheat leaf yellowing-associated virus (WLYaV). The full genome of WLYaV corresponds to 5,772 nucleotides (nt), with six AUG-initiated open reading frames, one non-AUG-initiated open reading frame, and three untranslated regions, showing typical features of the family Luteoviridae. Sequence comparison and phylogenetic analyses suggested that WLYaV had the closest relationship with sugarcane yellow leaf virus (ScYLV), but the identities of full genomic nucleotides and deduced amino acid sequence of coat protein (CP) were 64.9 and 86.2%, respectively, below the species demarcation thresholds (90%) in the family Luteoviridae. Furthermore, agroinoculation of Nicotiana benthamiana leaves with a cDNA clone of WLYaV caused yellowing symptoms on the plant. Our study adds a new polerovirus that is associated with wheat leaf yellowing disease, which would help to identify and control pathogens of wheat.

  15. Molecular comparisons of full length metapneumovirus (MPV) genomes, including newly determined French AMPV-C and -D isolates, further supports possible subclassification within the MPV Genus.

    Directory of Open Access Journals (Sweden)

    Paul A Brown

    Full Text Available Four avian metapneumovirus (AMPV) subgroups (A-D) have been reported previously based on genetic and antigenic differences. However, until now full length sequences of the only known isolates of European subgroup C and subgroup D viruses (duck and turkey origin, respectively) have been unavailable. These full length sequences were determined and compared with other full length AMPV and human metapneumoviruses (HMPV) sequences reported previously, using phylogenetics, comparisons of nucleic and amino acid sequences and study of codon usage bias. Results confirmed that subgroup C viruses were more closely related to HMPV than they were to the other AMPV subgroups in the study. This was consistent with previous findings using partial genome sequences. Closer relationships between AMPV-A, B and D were also evident throughout the majority of results. Three metapneumovirus "clusters" HMPV, AMPV-C and AMPV-A, B and D were further supported by codon bias and phylogenetics. The data presented here together with those of previous studies describing antigenic relationships also between AMPV-A, B and D and between AMPV-C and HMPV may call for a subclassification of metapneumoviruses similar to that used for avian paramyxoviruses, grouping AMPV-A, B and D as type I metapneumoviruses and AMPV-C and HMPV as type II.

  16. Television Watching and Telomere Length Among Adults in Southwest China.

    Science.gov (United States)

    Xue, Hong-Mei; Liu, Qian-Qian; Tian, Guo; Quan, Li-Ming; Zhao, Yong; Cheng, Guo

    2017-09-01

    To explore the independent associations of sedentary behavior and physical activity with telomere length among Chinese adults. Data on total time of sedentary behavior, screen-based sedentary behavior (including television watching and computer or phone use), moderate to vigorous physical activity, and dietary intake of 518 adults in Chengdu, Guizhou, and Xiamen in China (54.25% women) aged 20 to 70 years were obtained between 2013 and 2015 through questionnaires. Height, weight, and waist circumference were measured to calculate body mass index and percentage of body fat. Telomere length was measured through Southern blot technique. Television watching was inversely related to adjusted telomere length (-71.75 base pair; SE = 34.40; P  = .04). Furthermore, a similar trend between telomere length and television watching was found in the group aged 20 to 40 years after adjusting for all covariates. Adults aged 20 to 40 years in the highest tertile of daily time spent on watching television had 4.0% shorter telomere length than adults in the lowest tertile (P = .03). Although the association is modest, television watching is inversely related to telomere length among Chinese adults, warranting further investigation in large prospective studies.

  17. Characterization of near full-length genomes of HIV type 1 strains in Denmark: Basis for a universal therapeutic vaccine

    DEFF Research Database (Denmark)

    Andresen, Betina S.; Vinner, Lasse; Tang, Sheila Tuyet

    2007-01-01

    We report here the near full-length sequence characterization of 17 Danish clinical HIV-1 strains isolated from HLA-A02 patients not in need of ART, with relatively low viral loads and normal CD4 cell counts. Sequencing was performed directly on DNA extracted from short-term cocultures of PBMCs...... of a universal immunotherapeutic vaccine construct based on these epitopes....

  18. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    Science.gov (United States)

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable for assessing functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall under various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.

  19. Insight into the transcriptome of Arthrobotrys conoides using high throughput sequencing.

    Science.gov (United States)

    Ramesh, Pandit; Reena, Patel; Amitbikram, Mohapatra; Chaitanya, Joshi; Anju, Kunjadia

    2015-12-01

    Arthrobotrys conoides is a nematode-trapping fungus belonging to the Orbiliales, Ascomycota group, that traps prey nematodes by means of an adhesive network. The fungus has potential for use as a biocontrol agent against plant parasitic nematodes. In the present study, we characterized the transcriptome of A. conoides using high-throughput sequencing technology and characterized its virulence unigenes. A total of 7,255 cDNA contigs with an average length of 425 bp were generated and 6184 (61.81%) transcripts were functionally annotated and characterized. The majority of unigenes were found to be analogous to the genes of plant pathogenic fungi. A total of 1749 transcripts were found to be orthologous with eukaryotic proteins of the KOG database. Several carbohydrate active enzymes and peptidases were identified. We also analyzed classically and nonclassically secreted proteins and confirmed them by BLASTP against the fungal secretome database. A total of 916 contigs were analogous to 556 unique proteins of the Pathogen Host Interaction (PHI) database. Further, we identified 91 unigenes homologous to the database of fungal virulence factor (DFVF). A total of 104 putative protein kinase coding transcripts, which are major players in signaling pathways, were identified by BLASTP against the KinBase database. This study provides a comprehensive look at the transcriptome of A. conoides, and the identified unigenes might have a role in catching and killing prey nematodes by A. conoides. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    Science.gov (United States)

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to
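
    The p-mer counting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default p = 3, the fixed segment size and the use of a Manhattan distance between count profiles are all assumptions made for illustration. The sketch only shows why such a comparison costs time linear in segment length, unlike an alignment.

```python
from collections import Counter


def split_segments(sequence: str, size: int) -> list[str]:
    """Cut a sequence into consecutive, non-overlapping segments (hypothetical helper)."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def pmer_counts(segment: str, p: int = 3) -> Counter:
    """Count every overlapping p-mer (substring of length p) in a segment."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def pmer_distance(a: str, b: str, p: int = 3) -> int:
    """Manhattan distance between two p-mer count profiles.

    Assumption: this distance stands in for the edit-distance
    approximation mentioned in the abstract; the paper may use
    a different measure.
    """
    ca, cb = pmer_counts(a, p), pmer_counts(b, p)
    # A Counter returns 0 for missing keys, so the union of keys is safe.
    return sum(abs(ca[k] - cb[k]) for k in ca.keys() | cb.keys())


if __name__ == "__main__":
    segments = split_segments("ACGTACGTACGAACGT", 8)
    print(segments, pmer_distance(segments[0], segments[1]))
```

    Segments whose profiles lie close together under such a distance would be candidates for grouping into one cluster; the stream clustering step itself is not sketched here.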

  1. Length and GC content variability of introns among teleostean genomes in the light of the metabolic rate hypothesis.

    Directory of Open Access Journals (Sweden)

    Ankita Chaurasia

    Full Text Available A comparative analysis of five teleostean genomes, namely zebrafish, medaka, three-spine stickleback, fugu and pufferfish was performed with the aim to highlight the nature of the forces driving both length and base composition of introns (i.e., bpi and GCi). An inter-genome approach using orthologous intronic sequences was carried out, analyzing independently both variables in pairwise comparisons. An average length shortening of introns was observed at increasing average GCi values. The result was not affected by masking transposable and repetitive elements harbored in the intronic sequences. The routine metabolic rate (mass specific temperature-corrected using the Boltzmann's factor) was measured for each species. A significant correlation held between average differences of metabolic rate, length and GC content, while environmental temperature of fish habitat was not correlated with bpi and GCi. Analyzing the concomitant effect of both variables, i.e., bpi and GCi, at increasing genomic GC content, a decrease of bpi and an increase of GCi was observed for the significant majority of the intronic sequences (from ∼ 40% to ∼ 90%, in each pairwise comparison). The opposite event, concomitant increase of bpi and decrease of GCi, was counter selected (from <1% to ∼ 10%, in each pairwise comparison). The results further support the hypothesis that the metabolic rate plays a key role in shaping genome architecture and evolution of vertebrate genomes.

  2. Single Machine Scheduling and Due Date Assignment with Past-Sequence-Dependent Setup Time and Position-Dependent Processing Time

    Directory of Open Access Journals (Sweden)

    Chuan-Li Zhao

    2014-01-01

    Full Text Available This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm.
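
    To make the past-sequence-dependent setup rule concrete, here is a small sketch of how completion times and total earliness could be evaluated for one fixed job sequence. It does not reproduce the paper's assignment-problem or dynamic programming algorithms. The proportionality constant b, the function names and the choice to base the setup on the actual (position-adjusted) processing times are assumptions.

```python
def completion_times(actual_times: list[float], b: float) -> list[float]:
    """Completion time of the job in each position of a fixed sequence.

    actual_times[r] is the actual processing time of the job placed in
    position r (any position effect is assumed to be applied already).
    The setup before position r is b times the total length of the jobs
    processed before it, so the first job has no setup.
    """
    completions = []
    clock = 0.0       # current time on the machine
    processed = 0.0   # total length of jobs already processed
    for p in actual_times:
        setup = b * processed
        clock += setup + p
        completions.append(clock)
        processed += p
    return completions


def total_earliness(completions: list[float], due_dates: list[float]) -> float:
    """Sum of max(0, d - C) over all jobs: only early jobs contribute."""
    return sum(max(0.0, d - c) for c, d in zip(completions, due_dates))


if __name__ == "__main__":
    c = completion_times([2, 3, 1], b=0.5)
    print(c, total_earliness(c, [3, 6, 8]))
```

    With processing times 2, 3, 1 and b = 0.5, the setups are 0, 1 and 2.5, which shows how the setup grows with the amount of work already done.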

  3. MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

    Science.gov (United States)

    Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B

    2017-03-01

    Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level. Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up

  4. Phylogenetic analysis of Gossypium L. using restriction fragment length polymorphism of repeated sequences.

    Science.gov (United States)

    Zhang, Meiping; Rong, Ying; Lee, Mi-Kyung; Zhang, Yang; Stelly, David M; Zhang, Hong-Bin

    2015-10-01

    Cotton is the world's leading textile fiber crop and is also grown as a bioenergy and food crop. Knowledge of the phylogeny of closely related species and the genome origin and evolution of polyploid species is significant for advanced genomics research and breeding. We have reconstructed the phylogeny of the cotton genus, Gossypium L., and deciphered the genome origin and evolution of its five polyploid species by restriction fragment analysis of repeated sequences. Nuclear DNA of 84 accessions representing 35 species and all eight genomes of the genus were analyzed. The phylogenetic tree of the genus was reconstructed using the parsimony method on 1033 polymorphic repeated sequence restriction fragments. The genome origin of its polyploids was determined by calculating the diploid-polyploid restriction fragment correspondence (RFC). The tree is consistent with the morphological classification, genome designation and geographic distribution of the species at subgenus, section and subsection levels. Gossypium lobatum (D7) was unambiguously shown to have the highest RFC with the D-subgenomes of all five polyploids of the genus, while the common ancestor of Gossypium herbaceum (A1) and Gossypium arboreum (A2) likely contributed to the A-subgenomes of the polyploids. These results provide a comprehensive phylogenetic tree of the cotton genus and new insights into the genome origin and evolution of its polyploid species. The results also further demonstrate a simple, rapid and inexpensive method suitable for phylogenetic analysis of closely related species, especially congeneric species, and the inference of genome origin of polyploids that constitute over 70 % of flowering plants.

  5. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

    Science.gov (United States)

    Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

    2010-03-30

    The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional

  6. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics

    Directory of Open Access Journals (Sweden)

    Kikuchi Mari

    2010-03-01

    Full Text Available Abstract Background The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. Results To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. Conclusion The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the

  7. Long multiplication by instruction sequences with backward jump instructions

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2013-01-01

    For each function on bit strings, its restriction to bit strings of any given length can be computed by a finite instruction sequence that contains only instructions to set and get the content of Boolean registers, forward jump instructions, and a termination instruction. Backward jump instructions

  8. Prehospital rapid sequence intubation improves functional outcome for patients with severe traumatic brain injury: a randomized controlled trial.

    Science.gov (United States)

    Bernard, Stephen A; Nguyen, Vina; Cameron, Peter; Masci, Kevin; Fitzgerald, Mark; Cooper, David J; Walker, Tony; Std, B Paramed; Myles, Paul; Murray, Lynne; David; Taylor; Smith, Karen; Patrick, Ian; Edington, John; Bacon, Andrew; Rosenfeld, Jeffrey V; Judson, Rodney

    2010-12-01

    To determine whether paramedic rapid sequence intubation in patients with severe traumatic brain injury (TBI) improves neurologic outcomes at 6 months compared with intubation in the hospital. Severe TBI is associated with a high rate of mortality and long-term morbidity. Comatose patients with TBI routinely undergo endotracheal intubation to protect the airway, prevent hypoxia, and control ventilation. In many places, paramedics perform intubation prior to hospital arrival. However, it is unknown whether this approach improves outcomes. In a prospective, randomized, controlled trial, we assigned adults with severe TBI in an urban setting to either prehospital rapid sequence intubation by paramedics or transport to a hospital emergency department for intubation by physicians. The primary outcome measure was the median extended Glasgow Outcome Scale (GOSe) score at 6 months. Secondary end-points were favorable versus unfavorable outcome at 6 months, length of intensive care and hospital stay, and survival to hospital discharge. A total of 312 patients with severe TBI were randomly assigned to paramedic rapid sequence intubation or hospital intubation. The success rate for paramedic intubation was 97%. At 6 months, the median GOSe score was 5 (interquartile range, 1-6) in patients intubated by paramedics compared with 3 (interquartile range, 1-6) in the patients intubated at hospital (P = 0.28). The proportion of patients with favorable outcome (GOSe, 5-8) was 80 of 157 patients (51%) in the paramedic intubation group compared with 56 of 142 patients (39%) in the hospital intubation group (risk ratio, 1.28; 95% confidence interval, 1.00-1.64; P = 0.046). There were no differences in intensive care or hospital length of stay, or in survival to hospital discharge. In adults with severe TBI, prehospital rapid sequence intubation by paramedics increases the rate of favorable neurologic outcome at 6 months compared with intubation in the hospital.

  9. Duplex scanning using sparse data sequences

    DEFF Research Database (Denmark)

    Møllenbach, S. K.; Jensen, Jørgen Arendt

    2008-01-01

    reconstruction of the missing samples possible. The periodic pattern has the length T = M + A samples, where M are for B-mode and A for velocity estimation. The missing samples can now be reconstructed using a filter bank. One filter bank reconstructs one missing sample, so the number of filter banks corresponds...... to M. The number of sub filters in every filter bank is the same as A. Every sub filter contains fractional delay (FD) filter and an interpolation function. Many different sequences can be selected to adapt the B-mode frame rate needed. The drawback of the method is that the maximum velocity detectable......, the fprf and the resolution are 15 MHz, 3.5 kHz, and 12 bit sample (8 kHz and 16 bit for the Carotid artery). The resulting data contains 8000 RF lines with 128 samples at a depth of 45 mm for the vein and 50 mm for Aorta. Sparse sequences are constructed from the full data sequences to have both...

  10. Low-Energy Electron-Induced Strand Breaks in Telomere-Derived DNA Sequences-Influence of DNA Sequence and Topology.

    Science.gov (United States)

    Rackwitz, Jenny; Bald, Ilko

    2018-03-26

    During cancer radiation therapy high-energy radiation is used to reduce tumour tissue. The irradiation produces a shower of secondary low-energy electrons, which damage DNA very efficiently by dissociative electron attachment. Recently, it was suggested that low-energy electron-induced DNA strand breaks strongly depend on the specific DNA sequence with a high sensitivity of G-rich sequences. Here, we use DNA origami platforms to expose G-rich telomere sequences to low-energy (8.8 eV) electrons to determine absolute cross sections for strand breakage and to study the influence of sequence modifications and topology of telomeric DNA on the strand breakage. We find that the telomeric DNA 5'-(TTA GGG)2 is more sensitive to low-energy electrons than an intermixed sequence 5'-(TGT GTG A)2, confirming the unique electronic properties resulting from G-stacking. With increasing length of the oligonucleotide (i.e., going from 5'-(GGG ATT)2 to 5'-(GGG ATT)4), both the variety of topology and the electron-induced strand break cross sections increase. Addition of K+ ions decreases the strand break cross section for all sequences that are able to fold G-quadruplexes or G-intermediates, whereas the strand break cross section for the intermixed sequence remains unchanged. These results indicate that telomeric DNA is rather sensitive towards low-energy electron-induced strand breakage suggesting significant telomere shortening that can also occur during cancer radiation therapy. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Change in Length of Growing Season by State, 1895-2015

    Data.gov (United States)

    U.S. Environmental Protection Agency — This map shows the total change in length of the growing season, time of first fall frost and time of last spring frost from 1895 to 2015 for each of the contiguous...

  12. Is DNA a worm-like chain in Couette flow? In search of persistence length, a critical review.

    Science.gov (United States)

    Rittman, Martyn; Gilroy, Emma; Koohya, Hashem; Rodger, Alison; Richards, Adair

    2009-01-01

    Persistence length is the foremost measure of DNA flexibility. Its origins lie in polymer theory which was adapted for DNA following the determination of B-DNA structure in 1953. There is no single definition of persistence length used, and the links between published definitions are based on assumptions which may or may not be clearly stated. DNA flexibility is affected by local ionic strength, solvent environment, bound ligands and intrinsic sequence-dependent flexibility. This article is a review of persistence length providing a mathematical treatment of the relationships between four definitions of persistence length, including: correlation, Kuhn length, bending, and curvature. Persistence length has been measured using various microscopy, force extension and solution methods such as linear dichroism and transient electric birefringence. For each experimental method a model of DNA is required to interpret the data. The importance of understanding the underlying models, along with the assumptions required by each definition to determine a value of persistence length, is highlighted for linear dichroism data, where it transpires that no model is currently available for long DNA or medium to high shear rate experiments.
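
    As a rough illustration of how two of the definitions named above relate, the sketch below uses the standard textbook relations for an ideal three-dimensional worm-like chain. It is not taken from the review itself, which stresses that such links rest on assumptions. The function names are hypothetical, and the value of 50 nm is only a commonly quoted order of magnitude for double-stranded DNA, used here as an example input.

```python
import math


def tangent_correlation(s: float, P: float) -> float:
    """Correlation definition: mean cosine of the angle between chain
    tangents separated by contour distance s decays as exp(-s / P)."""
    return math.exp(-s / P)


def kuhn_length(P: float) -> float:
    """Kuhn length of an ideal worm-like chain: twice the persistence length."""
    return 2.0 * P


def mean_square_end_to_end(L: float, P: float) -> float:
    """Mean-square end-to-end distance of an ideal worm-like chain
    with contour length L and persistence length P."""
    return 2.0 * P * L * (1.0 - (P / L) * (1.0 - math.exp(-L / P)))


if __name__ == "__main__":
    P = 50.0  # nm, illustrative value only
    for L in (1.0, 50.0, 1.0e6):
        print(L, mean_square_end_to_end(L, P), kuhn_length(P) * L, L * L)
```

    For L much larger than P the result approaches the Kuhn length times L (a flexible coil), and for L much smaller than P it approaches L squared (a rigid rod), which is one way to see why the two definitions agree only under the ideal-chain assumptions.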

  13. Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.

  14. Full-length Dysferlin Transfer by the Hyperactive Sleeping Beauty Transposase Restores Dysferlin-deficient Muscle

    Directory of Open Access Journals (Sweden)

    Helena Escobar

    2016-01-01

    Full Text Available Dysferlin-deficient muscular dystrophy is a progressive disease characterized by muscle weakness and wasting for which there is no treatment. It is caused by mutations in DYSF, a large, multiexonic gene that forms a coding sequence of 6.2 kb. The Sleeping Beauty (SB) transposon is a nonviral gene transfer vector, already used in clinical trials. The hyperactive SB system consists of a transposon DNA sequence and a transposase protein, SB100X, that can integrate DNA over 10 kb into the target genome. We constructed an SB transposon-based vector to deliver full-length human DYSF cDNA into dysferlin-deficient H2K A/J myoblasts. We demonstrate proper dysferlin expression as well as highly efficient engraftment (>1,100 donor-derived fibers) of the engineered myoblasts in the skeletal muscle of dysferlin- and immunodeficient B6.Cg-Dysfprmd Prkdcscid/J (Scid/BLA/J) mice. Nonviral gene delivery of full-length human dysferlin into muscle cells, along with a successful and efficient transplantation into skeletal muscle, are important advances towards successful gene therapy of dysferlin-deficient muscular dystrophy.

  15. Molecular Comparisons of Full Length Metapneumovirus (MPV) Genomes, Including Newly Determined French AMPV-C and –D Isolates, Further Supports Possible Subclassification within the MPV Genus

    Science.gov (United States)

    Brown, Paul A.; Lemaitre, Evelyne; Briand, François-Xavier; Courtillon, Céline; Guionie, Olivier; Allée, Chantal; Toquin, Didier; Bayon-Auboyer, Marie-Hélène; Jestin, Véronique; Eterradossi, Nicolas

    2014-01-01

    Four avian metapneumovirus (AMPV) subgroups (A–D) have been reported previously based on genetic and antigenic differences. However, until now full length sequences of the only known isolates of European subgroup C and subgroup D viruses (duck and turkey origin, respectively) have been unavailable. These full length sequences were determined and compared with other full length AMPV and human metapneumoviruses (HMPV) sequences reported previously, using phylogenetics, comparisons of nucleic and amino acid sequences and study of codon usage bias. Results confirmed that subgroup C viruses were more closely related to HMPV than they were to the other AMPV subgroups in the study. This was consistent with previous findings using partial genome sequences. Closer relationships between AMPV-A, B and D were also evident throughout the majority of results. Three metapneumovirus “clusters” HMPV, AMPV-C and AMPV-A, B and D were further supported by codon bias and phylogenetics. The data presented here together with those of previous studies describing antigenic relationships also between AMPV-A, B and D and between AMPV-C and HMPV may call for a subclassification of metapneumoviruses similar to that used for avian paramyxoviruses, grouping AMPV-A, B and D as type I metapneumoviruses and AMPV-C and HMPV as type II. PMID:25036224

  16. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

    Science.gov (United States)

    Kwok, Hin; Chiang, Alan Kwok Shing

    2016-02-24

    Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  17. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes

    Directory of Open Access Journals (Sweden)

    Hin Kwok

    2016-02-01

    Full Text Available Genomic sequences of Epstein–Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  18. Use of inter-simple sequence repeats and amplified fragment length polymorphisms to analyze genetic relationships among small grain-infecting species of Ustilago.

    Science.gov (United States)

    Menzies, J G; Bakkeren, G; Matheson, F; Procunier, J D; Woods, S

    2003-02-01

    In the smut fungi, few features are available for use as taxonomic criteria (spore size, shape, morphology, germination type, and host range). DNA-based molecular techniques are useful in expanding the traits considered in determining relationships among these fungi. We examined the phylogenetic relationships among seven species of Ustilago (U. avenae, U. bullata, U. hordei, U. kolleri, U. nigra, U. nuda, and U. tritici) using inter-simple sequence repeats (ISSRs) and amplified fragment length polymorphisms (AFLPs) to compare their DNA profiles. Fifty-four isolates of different Ustilago spp. were analyzed using ISSR primers, and 16 isolates of Ustilago were studied using AFLP primers. The variability among isolates within species was low for all species except U. bullata. The isolates of U. bullata, U. nuda, and U. tritici were well separated and our data support their speciation. U. avenae and U. kolleri isolates did not separate from each other and there was little variability between these species. U. hordei and U. nigra isolates also showed little variability between species, but the isolates from each species grouped together. Our data suggest that U. avenae and U. kolleri are monophyletic and should be considered one species, as should U. hordei and U. nigra.

  19. CDKL5 regulates flagellar length and localizes to the base of the flagella in Chlamydomonas

    Science.gov (United States)

    Tam, Lai-Wa; Ranum, Paul T.; Lefebvre, Paul A.

    2013-01-01

    The length of Chlamydomonas flagella is tightly regulated. Mutations in four genes—LF1, LF2, LF3, and LF4—cause cells to assemble flagella up to three times wild-type length. LF2 and LF4 encode protein kinases. Here we describe a new gene, LF5, in which null mutations cause cells to assemble flagella of excess length. The LF5 gene encodes a protein kinase very similar in sequence to the protein kinase CDKL5. In humans, mutations in this kinase cause a severe form of juvenile epilepsy. The LF5 protein localizes to a unique location: the proximal 1 μm of the flagella. The proximal localization of the LF5 protein is lost when genes that make up the proteins in the cytoplasmic length regulatory complex (LRC)—LF1, LF2, and LF3—are mutated. In these mutants LF5p becomes localized either at the distal tip of the flagella or along the flagellar length, indicating that length regulation involves, at least in part, control of LF5p localization by the LRC. PMID:23283985

  20. Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1.

    OpenAIRE

    Le Gall, O; Candresse, T; Brault, V; Dunez, J

    1989-01-01

    The nucleotide sequence of the RNA1 of Hungarian grapevine chrome mosaic virus, a nepovirus very closely related to tomato black ring virus, has been determined from cDNA clones. It is 7212 nucleotides in length excluding the 3' terminal poly(A) tail and contains a large open reading frame extending from nucleotides 216 to 6971. The presumably encoded polyprotein is 2252 amino acids in length with a molecular weight of 250 kDa. The primary structure of the polyprotein was compared with that o...
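
    The figures quoted in this record can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the reported coordinates 216 and 6971 are 1-based and inclusive, and it treats 250 kDa as exactly 250,000 Da.

```python
# Coordinates and sizes as quoted in the abstract (assumed 1-based, inclusive).
ORF_START, ORF_END = 216, 6971
POLYPROTEIN_AA = 2252
POLYPROTEIN_DA = 250_000  # 250 kDa, taken as a round figure

# Length of the open reading frame in nucleotides.
orf_nt = ORF_END - ORF_START + 1

# Number of whole codons and any leftover nucleotides.
codons, remainder = divmod(orf_nt, 3)

# Mean mass per residue implied by the reported molecular weight.
mean_residue_da = POLYPROTEIN_DA / POLYPROTEIN_AA

print(orf_nt, codons, remainder, round(mean_residue_da, 1))
```

    Under these assumptions the interval divides evenly into codons, and the codon count matches the reported polyprotein length.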

  1. Insect-computer hybrid legged robot with user-adjustable speed, step length and walking gait.

    Science.gov (United States)

    Cao, Feng; Zhang, Chao; Choo, Hao Yu; Sato, Hirotaka

    2016-03-01

    We have constructed an insect-computer hybrid legged robot using a living beetle (Mecynorrhina torquata; Coleoptera). The protraction/retraction and levation/depression motions in both forelegs of the beetle were elicited by electrically stimulating eight corresponding leg muscles via eight pairs of implanted electrodes. To perform a defined walking gait (e.g., gallop), different muscles were individually stimulated in a predefined sequence using a microcontroller. Different walking gaits were performed by reordering the applied stimulation signals (i.e., applying different sequences). By varying the duration of the stimulation sequences, we successfully controlled the step frequency and hence the beetle's walking speed. To the best of our knowledge, this paper presents the first demonstration of living insect locomotion control with a user-adjustable walking gait, step length and walking speed. © 2016 The Author(s).

  2. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    Science.gov (United States)

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal and 7.4% as weak matches. Of the ESTs, 28.0% were related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, and 3.2% to cell cycle, and 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genomic research of Siberian tigers. PMID:20941376

  3. Production of a full-length infectious GFP-tagged cDNA clone of Beet mild yellowing virus for the study of plant-polerovirus interactions.

    Science.gov (United States)

    Stevens, Mark; Viganó, Felicita

    2007-04-01

    The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the other complete sequences of BMYV (BMYV-2ITB, France and BMYV-IPP, Germany respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts following electroporation or PEG inoculation when the protoplasts were subsequently analysed using serological and molecular methods. The BMYV sequence was modified by inserting DNA that encoded the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. A. thaliana protoplasts electroporated with these RNA transcripts were biologically active and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.

  4. 3G vector-primer plasmid for constructing full-length-enriched cDNA libraries.

    Science.gov (United States)

    Zheng, Dong; Zhou, Yanna; Zhang, Zidong; Li, Zaiyu; Liu, Xuedong

    2008-09-01

    We designed a 3G vector-primer plasmid for the generation of full-length-enriched complementary DNA (cDNA) libraries. By employing the terminal transferase activity of reverse transcriptase and the modified strand replacement method, this plasmid (assembled with a polydT end and a deoxyguanosine [dG] end) combines priming full-length cDNA strand synthesis and directional cDNA cloning. As a result, the number of steps involved in cDNA library preparation is decreased while simplifying downstream gene manipulation, sequencing, and subcloning. The 3G vector-primer plasmid method yields fully represented plasmid primed libraries that are equivalent to those made by the SMART (switching mechanism at 5' end of RNA transcript) approach.

  5. Diagnostic value of newborn foot length to predict gestational age

    Directory of Open Access Journals (Sweden)

    Mutia Farah Fawziah

    2017-08-01

    Full Text Available Background  Identification of gestational age, especially within 48 hours of birth, is crucial for newborns, as the earlier preterm status is detected, the earlier the child can receive optimal management. Newborn foot length is an anthropometric measurement which is easy to perform, inexpensive, and potentially efficient for predicting gestational age. Objective  To analyze the diagnostic value of newborn foot length in predicting gestational age. Methods  This diagnostic study was performed between October 2016 and February 2017 in the High Care Unit of Neonates at Dr. Moewardi General Hospital, Surakarta. A total of 152 newborns were consecutively selected and underwent right foot length measurements before 96 hours of age. The correlation between newborn foot length and gestational age was analyzed with Spearman’s correlation test because of non-normal data distribution. The cut-off point of newborn foot length for classifying full-term status was calculated from the receiver operating characteristic (ROC) curve, and diagnostic values of newborn foot length were analyzed by 2 x 2 table with SPSS 21.0 software. Results There were no significant differences between male and female newborns in terms of gestational age, birth weight, chronological age, and newborn foot length (P>0.05). Newborn foot length and gestational age had a significant correlation (r=0.53; P=0.000). The optimal cut-off newborn foot length to predict full term status was 7.1 cm. Newborn foot length below 7.1 cm had sensitivity 75%, specificity 98%, positive predictive value 94.3%, negative predictive value 90.6%, positive likelihood ratio 40.5, negative likelihood ratio 0.25, and post-test probability 94.29%, to predict preterm status in newborns. Conclusion  Newborn foot length can be used to predict gestational age, especially for the purpose of differentiating between preterm and full term newborns.
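
    The diagnostic values listed in this record all derive from a single 2 x 2 table. The abstract does not report the cell counts, so the sketch below uses hypothetical counts (33 true positives, 2 false positives, 11 false negatives, 106 true negatives, totalling 152) chosen only because they appear to be consistent with the reported percentages; they are an assumption, not data from the study.

```python
# Hypothetical 2 x 2 table (NOT reported in the abstract); "positive" means
# foot length below the 7.1 cm cut-off, "disease" means preterm status.
TP, FP, FN, TN = 33, 2, 11, 106

sensitivity = TP / (TP + FN)          # preterm newborns correctly flagged
specificity = TN / (TN + FP)          # full-term newborns correctly cleared
ppv = TP / (TP + FP)                  # positive predictive value
npv = TN / (TN + FN)                  # negative predictive value
lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv),
                    ("LR+", lr_pos), ("LR-", lr_neg)]:
    print(f"{name}: {value:.4f}")
```

    Note that the positive likelihood ratio is very sensitive to rounding of the specificity: using the rounded 98% instead of the unrounded ratio would give a noticeably different value.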

  6. Length weight relationship of Sufflamen fraenatus (Latreille, 1804) and Zenodon niger (Ruppell, 1835)

    Digital Repository Service at National Institute of Oceanography (India)

    Sahayak, S

    The relationship between total length and total weight in balistids is not significantly different in males and females. The common equation for both the sexes in Sufflamen fraenatus is Log W= -9.0429+2.7296 log L or W=0.000000000905 L sup(2...
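
    The log-linear equation quoted above is the usual way of writing a power-law length-weight relationship, W = a·L^b. The sketch below converts the quoted coefficients to that form. It assumes base-10 logarithms, and the units of length and weight are not stated in this truncated record, so the function name and the sample length are purely illustrative.

```python
import math

# Coefficients as quoted for Sufflamen fraenatus: log W = LOG_A + B * log L.
LOG_A = -9.0429
B = 2.7296

# Power-law coefficient a in W = a * L**B (assuming base-10 logarithms).
A = 10 ** LOG_A

def predicted_weight(total_length):
    """Weight predicted from total length via the log-linear form."""
    return 10 ** (LOG_A + B * math.log10(total_length))

print(f"a = {A:.3e}")                 # close to the 0.000000000905 printed above
print(predicted_weight(100.0))        # illustrative length, units unspecified
```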

  7. Microsatellite Development for an Endangered Bream Megalobrama pellegrini (Teleostei, Cyprinidae) Using 454 Sequencing

    Directory of Open Access Journals (Sweden)

    Zuogang Peng

    2012-03-01

    Full Text Available Megalobrama pellegrini is an endemic fish species found in the upper Yangtze River basin in China. This species has become endangered due to the construction of the Three Gorges Dam and overfishing. However, the available genetic data for this species are limited. Here, we developed 26 polymorphic microsatellite markers from the M. pellegrini genome using next-generation sequencing techniques. A total of 257,497 raw reads were obtained from a quarter-plate run on 454 GS-FLX titanium platforms and 49,811 unique sequences were generated with an average length of 404 bp; 24,522 (49.2%) sequences contained microsatellite repeats. Of the 53 loci screened, 33 were amplified successfully and 26 were polymorphic. The genetic diversity in M. pellegrini was moderate, with an average of 3.08 alleles per locus, and the mean observed and expected heterozygosity were 0.47 and 0.51, respectively. In addition, we tested cross-species amplification for all 33 loci in four additional breams: M. amblycephala, M. skolkovii, M. terminalis, and Sinibrama wui. The cross-species amplification showed a significantly high level of transferability (79%–97%), which might be due to their very close genetic relationships. The polymorphic microsatellites developed in the current study will not only contribute to further conservation genetic studies and parentage analyses of this endangered species, but also facilitate future work on the other closely related species.

  8. Gelada vocal sequences follow Menzerath’s linguistic law

    Science.gov (United States)

    Gustison, Morgan L.; Semple, Stuart; Ferrer-i-Cancho, Ramon; Bergman, Thore J.

    2016-01-01

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath’s law is a linguistic law that states that, the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath’s law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath’s law reflects compression—the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language. PMID:27091968

  9. Isolation and Whole-genome Sequence Analysis of the Imipenem Heteroresistant Acinetobacter baumannii Clinical Isolate HRAB-85.

    Science.gov (United States)

    Li, Puyuan; Huang, Yong; Yu, Lan; Liu, Yannan; Niu, Wenkai; Zou, Dayang; Liu, Huiying; Zheng, Jing; Yin, Xiuyun; Yuan, Jing; Yuan, Xin; Bai, Changqing

    2017-09-01

    Heteroresistance is a phenomenon in which there are various responses to antibiotics from bacterial cells within the same population. Here, we isolated and characterised an imipenem heteroresistant Acinetobacter baumannii strain (HRAB-85). The genome of strain HRAB-85 was completely sequenced and analysed to understand its antibiotic resistance mechanisms. Population analysis and multilocus sequence typing were performed. Subpopulations grew in the presence of imipenem at concentrations of up to 64μg/mL, and the strain was found to belong to ST208. The total length of strain HRAB-85 was 4,098,585bp with a GC content of 39.98%. The genome harboured at least four insertion sequences: the common ISAba1, ISAba22, ISAba24, and newly reported ISAba26. Additionally, 19 antibiotic-resistance genes against eight classes of antimicrobial agents were found, and 11 genomic islands (GIs) were identified. Among them, GI3, GI10, and GI11 contained many ISs and antibiotic-resistance determinants. The existence of imipenem heteroresistant phenotypes in A. baumannii was substantiated in this hospital, and imipenem pressure, which could induce imipenem-heteroresistant subpopulations, may select for highly resistant strains. The complete genome sequencing and bioinformatics analysis of HRAB-85 could improve our understanding of the epidemiology and resistance mechanisms of carbapenem-heteroresistant A. baumannii. Copyright © 2017. Published by Elsevier Ltd.

  10. Complete Plastid Genome Sequencing of Four Tilia Species (Malvaceae): A Comparative Analysis and Phylogenetic Implications.

    Directory of Open Access Journals (Sweden)

    Jie Cai

    Full Text Available Tilia is an ecologically and economically important genus in the family Malvaceae. However, no complete plastid genome of Tilia has been sequenced to date, and the taxonomy of Tilia is difficult owing to frequent hybridization and polyploidization. Well-supported interspecific relationships for this genus are not available because the commonly used molecular markers provide few informative sites. We report here the complete plastid genome sequences of four Tilia species determined by Illumina technology. The Tilia plastid genome is 162,653 bp to 162,796 bp in length, encoding 113 unique genes and a total of 130 genes. The gene order and organization of the Tilia plastid genome exhibit the general structure of angiosperms and are very similar to other published plastid genomes of Malvaceae. As in other long-lived tree genera, the sequence divergence among the four Tilia plastid genomes is very low. We also analyzed the nucleotide substitution patterns and the evolution of insertions and deletions in the Tilia plastid genomes. Finally, we built a phylogeny of the four sampled Tilia species with high support using plastid phylogenomics, suggesting that it is an efficient way to resolve the phylogenetic relationships of this genus.

  11. Telomere length analysis.

    Science.gov (United States)

    Canela, Andrés; Klatt, Peter; Blasco, María A

    2007-01-01

    Most somatic cells of long-lived species undergo telomere shortening throughout life. Critically short telomeres trigger loss of cell viability in tissues, which has been related to alteration of tissue function and loss of regenerative capabilities in aging and aging-related diseases. Hence, telomere length is an important biomarker for aging and can be used in the prognosis of aging diseases. These facts highlight the importance of developing methods for telomere length determination that can be employed to evaluate telomere length during the human aging process. Telomere length quantification methods have improved greatly in accuracy and sensitivity since the development of the conventional telomeric Southern blot. Here, we describe the different methodologies recently developed for telomere length quantification, as well as their potential applications for human aging studies.

  12. A parallel VLSI architecture for a digital filter of arbitrary length using Fermat number transforms

    Science.gov (United States)

    Truong, T. K.; Reed, I. S.; Yeh, C. S.; Shao, H. M.

    1982-01-01

    A parallel architecture for computation of the linear convolution of two sequences of arbitrary lengths using the Fermat number transform (FNT) is described. In particular a pipeline structure is designed to compute a 128-point FNT. In this FNT, only additions and bit rotations are required. A standard barrel shifter circuit is modified so that it performs the required bit rotation operation. The overlap-save method is generalized for the FNT to compute a linear convolution of arbitrary length. A parallel architecture is developed to realize this type of overlap-save method using one FNT and several inverse FNTs of 128 points. The generalized overlap save method alleviates the usual dynamic range limitation in FNTs of long transform lengths. Its architecture is regular, simple, and expandable, and therefore naturally suitable for VLSI implementation.
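
    The combination of a Fermat number transform with overlap-save blocking can be sketched in software. The example below is a toy illustration under stated assumptions, not the paper's architecture: it uses the modulus F_4 = 2^16 + 1 with a 32-point transform (rather than the 128-point FNT described above), a direct O(N^2) transform instead of a pipelined one, and non-negative integer inputs small enough that every output stays below the modulus. The function names are hypothetical. It needs Python 3.8 or later for `pow(x, -1, m)`.

```python
F = 2**16 + 1   # Fermat number F_4 = 65537 (assumed modulus for this toy case)
N = 32          # transform length: 2 has multiplicative order 32 modulo F_4
ALPHA = 2       # root of unity; powers of 2 are bit shifts/rotations in hardware

def fnt(x, root=ALPHA):
    """Direct O(N^2) Fermat number transform of a length-N list."""
    return [sum(x[n] * pow(root, n * k, F) for n in range(N)) % F
            for k in range(N)]

def ifnt(X):
    """Inverse transform: use the inverse root, then scale by N^-1 mod F."""
    inv_root = pow(ALPHA, -1, F)
    inv_n = pow(N, -1, F)
    return [(v * inv_n) % F for v in fnt(X, inv_root)]

def cyclic_convolve(a, b):
    """Length-N cyclic convolution via pointwise products in the FNT domain."""
    A, B = fnt(a), fnt(b)
    return ifnt([(p * q) % F for p, q in zip(A, B)])

def overlap_save(signal, taps):
    """Linear convolution of arbitrary-length `signal` with short `taps`."""
    m = len(taps)
    step = N - (m - 1)                      # valid outputs produced per block
    h = taps + [0] * (N - m)                # zero-pad the filter to length N
    padded = [0] * (m - 1) + signal + [0] * N
    out = []
    pos = 0
    while pos < len(signal) + m - 1:
        block = padded[pos:pos + N]
        y = cyclic_convolve(block, h)
        out.extend(y[m - 1:])               # discard the wrapped-around samples
        pos += step
    return out[:len(signal) + m - 1]

print(overlap_save([1, 2, 3, 4], [1, 1]))
```

    Each block overlaps the previous one by `len(taps) - 1` samples, and the first `len(taps) - 1` outputs of every cyclic convolution are dropped because they are contaminated by wrap-around. Because each output is computed from a short block, its magnitude is bounded by the block contents rather than by the full signal length, which is how blocking eases the dynamic-range constraint.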

  13. Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant.

    Science.gov (United States)

    Wu, Pingzhi; Zhou, Changpin; Cheng, Shifeng; Wu, Zhenying; Lu, Wenjia; Han, Jinli; Chen, Yanbo; Chen, Yan; Ni, Peixiang; Wang, Ying; Xu, Xun; Huang, Ying; Song, Chi; Wang, Zhiwen; Shi, Nan; Zhang, Xudong; Fang, Xiaohua; Yang, Qing; Jiang, Huawu; Chen, Yaping; Li, Meiru; Wang, Ying; Chen, Fan; Wang, Jun; Wu, Guojiang

    2015-03-01

    The family Euphorbiaceae includes some of the most efficient biomass accumulators. Whole genome sequencing and the development of genetic maps of these species are important components in molecular breeding and genetic improvement. Here we report the draft genome of physic nut (Jatropha curcas L.), a biodiesel plant. The assembled genome has a total length of 320.5 Mbp and contains 27,172 putative protein-coding genes. We established a linkage map containing 1208 markers and anchored the genome assembly (81.7%) to this map to produce 11 pseudochromosomes. After gene family clustering, 15,268 families were identified, of which 13,887 existed in the castor bean genome. Analysis of the genome highlighted specific expansion and contraction of a number of gene families during the evolution of this species, including the ribosome-inactivating proteins and oil biosynthesis pathway enzymes. The genomic sequence and linkage map provide a valuable resource not only for fundamental and applied research on physic nut but also for evolutionary and comparative genomics analysis, particularly in the Euphorbiaceae. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.

  14. Sequencing and analysis of the gene-rich space of cowpea

    Directory of Open Access Journals (Sweden)

    Cheung Foo

    2008-02-01

    Full Text Available Abstract Background Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing. Results We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO) with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A

  15. [Comparison of rDNA internal transcribed spacer sequences in asparagus].

    Science.gov (United States)

    Ou, Li-Jun; Ye, Wei; Zeng, Gui-Ping; Jiang, Xiang-Hui; She, Chao-Wen; Xu, Dong; Yang, Jia-Qiang

    2010-10-01

    ITS sequences of nine species were used to identify counterfeit medicinal material and to analyse the phylogeny of Asparagus. The ITS sequences were analysed by amplification, cloning, sequencing and alignment. The ITS sequences of the nine species ranged from 711 to 748 bp in length, with a G + C content of about 60%. The phylogenetic tree constructed from the ITS sequences divided the nine species into two branches: Asparagus cochinchinensis, Asparagus officinalis, Asparagus densiflorus, Asparagus densiflorus cv. Myers and Asparagus densiflorus cv. Sprengeri formed one branch and the remaining species formed the other. Within the first branch, Asparagus densiflorus and Asparagus densiflorus cv. Myers, both from Africa, clustered first and then clustered with Asparagus densiflorus cv. Sprengeri, a variant of Asparagus densiflorus. In the other branch, Asparagus setaceus was relatively distantly related to the other three materials. The ITS sequences could distinguish species of Asparagus and so can be used to detect counterfeits. The placement of some species in the phylogenetic tree was debatable, and ITS sequences should be combined with other analytical tools to resolve the actual phylogeny.

  16. A programmable method for massively parallel targeted sequencing

    Science.gov (United States)

    Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

    2014-01-01

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526

  17. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes

    NARCIS (Netherlands)

    Skaletsky, Helen; Kuroda-Kawaguchi, Tomoko; Minx, Patrick J.; Cordum, Holland S.; Hillier, LaDeana; Brown, Laura G.; Repping, Sjoerd; Pyntikova, Tatyana; Ali, Johar; Bieri, Tamberlyn; Chinwalla, Asif; Delehaunty, Andrew; Delehaunty, Kim; Du, Hui; Fewell, Ginger; Fulton, Lucinda; Fulton, Robert; Graves, Tina; Hou, Shun-Fang; Latrielle, Philip; Leonard, Shawn; Mardis, Elaine; Maupin, Rachel; McPherson, John; Miner, Tracie; Nash, William; Nguyen, Christine; Ozersky, Philip; Pepin, Kymberlie; Rock, Susan; Rohlfing, Tracy; Scott, Kelsi; Schultz, Brian; Strong, Cindy; Tin-Wollam, Aye; Yang, Shiaw-Pyng; Waterston, Robert H.; Wilson, Richard K.; Rozen, Steve; Page, David C.

    2003-01-01

    The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes

  18. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    Science.gov (United States)

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs of 1-6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other prokaryotic species have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of the various strains and isolates available for 85 different species of prokaryotes, extracted SSRs showing length variation, and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
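
    The definition of an SSR given above (tandem repeats of 1-6 bp motifs) can be illustrated with a short regular-expression scan. This is only a minimal sketch and not the PSSRdb pipeline: the function name `find_ssrs` and the `min_repeats` threshold are assumptions introduced for illustration.

```python
import re

def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, repeat_count) for tandem repeats of short motifs.

    Hypothetical helper: the thresholds are illustrative, not taken from PSSRdb.
    """
    # A lazily matched motif of 1..max_motif bases, followed by at least
    # (min_repeats - 1) further copies of that same motif.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

print(find_ssrs("GGATATATATCC"))
```

    Comparing the repeat counts reported for the same locus in two strains would then give the kind of length variation the database records.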

  19. The complete mitochondrial genome sequence of Oceanic whitetip shark, Carcharhinus longimanus (Carcharhiniformes: Carcharhinidae).

    Science.gov (United States)

    Li, Weiwen; Dai, Xiaojie; Xu, Qianghua; Wu, Feng; Gao, Chunxia; Zhang, Yanbo

    2016-05-01

    The complete mitochondrial DNA sequence of Carcharhinus longimanus was determined and analyzed. The complete mtDNA genome sequence of C. longimanus was 16,706 bp in length. It contained 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes and 2 non-coding regions: the control region (D-loop) and the origin of light-strand replication (OL). The complete mitogenome sequence of C. longimanus can provide useful data for further studies on molecular systematics, stock evaluation, taxonomic status and conservation genetics.

  20. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.
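
    The abstract reports both an N50 and a mean contig length; the two statistics can differ widely, as a small sketch shows. The contig lengths below are made up for illustration and the function name `n50` is an assumption, not part of the study's tooling.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")

# Illustrative contig lengths only (not data from the study).
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs), sum(contigs) / len(contigs))
```

    Because N50 is weighted by bases rather than by contig count, it normally exceeds the mean length, which is consistent with the 1764 bp versus 941.62 bp reported above.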

  1. Sequence Capture and Phylogenetic Utility of Genomic Ultraconserved Elements Obtained from Pinned Insect Specimens.

    Directory of Open Access Journals (Sweden)

    Bonnie B Blaimer

    Full Text Available Obtaining sequence data from historical museum specimens has been a growing research interest, invigorated by next-generation sequencing methods that allow inputs of highly degraded DNA. We applied a target enrichment and next-generation sequencing protocol to generate ultraconserved elements (UCEs) from 51 large carpenter bee specimens (genus Xylocopa), representing 25 species with specimen ages ranging from 2-121 years. We measured the correlation between specimen age and DNA yield (pre- and post-library preparation DNA concentration) and several UCE sequence capture statistics (raw read count, UCE reads on target, UCE mean contig length and UCE locus count) with linear regression models. We performed piecewise regression to test for specific breakpoints in the relationship of specimen age and DNA yield and sequence capture variables. Additionally, we compared UCE data from newer and older specimens of the same species and reconstructed their phylogeny in order to confirm the validity of our data. We recovered 6-972 UCE loci from samples with pre-library DNA concentrations ranging from 0.06-9.8 ng/μL. All investigated DNA yield and sequence capture variables were significantly but only moderately negatively correlated with specimen age. Specimens of age 20 years or less had significantly higher pre- and post-library concentrations, UCE contig lengths, and locus counts compared to specimens older than 20 years. We found breakpoints in our data indicating a decrease of the initial detrimental effect of specimen age on pre- and post-library DNA concentration and UCE contig length starting around 21-39 years after preservation. Our phylogenetic results confirmed the integrity of our data, giving preliminary insights into relationships within Xylocopa.
We consider the effect of additional factors not measured in this study on our age-related sequence capture results, such as DNA fragmentation and preservation method, and discuss the promise of the UCE

  2. Transcriptional blockages in a cell-free system by sequence-selective DNA alkylating agents.

    Science.gov (United States)

    Ferguson, L R; Liu, A P; Denny, W A; Cullinane, C; Talarico, T; Phillips, D R

    2000-04-14

    There is considerable interest in DNA sequence-selective DNA-binding drugs as potential inhibitors of gene expression. Five compounds with distinctly different base pair specificities were compared in their effects on the formation and elongation of the transcription complex from the lac UV5 promoter in a cell-free system. All were tested at drug levels which killed 90% of cells in a clonogenic survival assay. Cisplatin, a selective alkylator at purine residues, inhibited transcription, decreasing the full-length transcript, and causing blockage at a number of GG or AG sequences, making it probable that intrastrand crosslinks are the blocking lesions. A cyclopropylindoline known to be an A-specific alkylator also inhibited transcription, with blocks at adenines. The aniline mustard chlorambucil, that targets primarily G but also A sequences, was also effective in blocking the formation of full-length transcripts. It produced transcription blocks either at, or one base prior to, AA or GG sequences, suggesting that intrastrand crosslinks could again be involved. The non-alkylating DNA minor groove binder Hoechst 33342 (a bisbenzimidazole) blocked formation of the full-length transcript, but without creating specific blockage sites. A bisbenzimidazole-linked aniline mustard analogue was a more effective transcription inhibitor than either chlorambucil or Hoechst 33342, with different blockage sites occurring immediately as compared with 2 h after incubation. The blockages were either immediately prior to AA or GG residues, or four to five base pairs prior to such sites, a pattern not predicted from in vitro DNA-binding studies. Minor groove DNA-binding ligands are of particular interest as inhibitors of gene expression, since they have the potential ability to bind selectively to long sequences of DNA. The results suggest that the bisbenzimidazole-linked mustard does cause alkylation and transcription blockage at novel DNA sites. 
in addition to sites characteristic of

  3. Health inequalities in Ethiopia: modeling inequalities in length of life within and between population groups.

    Science.gov (United States)

    Tranvåg, Eirik Joakim; Ali, Merima; Norheim, Ole Frithjof

    2013-07-11

    Most studies on health inequalities use average measures, but describing the distribution of health can also provide valuable knowledge. In this paper, we estimate and compare within-group and between-group inequalities in length of life for population groups in Ethiopia in 2000 and 2011. We used data from the 2011 and 2000 Ethiopia Demographic and Health Survey and the Global Burden of Disease study 2010, and the MODMATCH modified logit life table system developed by the World Health Organization to model mortality rates, life expectancy, and length of life for Ethiopian population groups stratified by wealth quintiles, gender and residence. We then estimated and compared within-group and between-group inequality in length of life using the Gini index and absolute length of life inequality. Length of life inequality has decreased and life expectancy has increased for all population groups between 2000 and 2011. Length of life inequality within wealth quintiles is about three times larger than the between-group inequality of 9 years. Total length of life inequality in Ethiopia was 27.6 years in 2011. Longevity has increased and the distribution of health in Ethiopia is more equal in 2011 than 2000, with length of life inequality reduced for all population groups. Still there is considerable potential for further improvement. In the Ethiopian context with a poor and highly rural population, inequality in length of life within wealth quintiles is considerably larger than between them. This suggests that other factors than wealth substantially contribute to total health inequality in Ethiopia and that identification and quantification of these factors will be important for identifying proper measures to further reduce length of life inequality.
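
    The Gini index used above to summarise inequality in length of life can be sketched for a plain list of ages at death. This is a minimal illustration under the assumption of an unweighted sample; the study itself works from modelled life tables, and the function name `gini` and the sample values are hypothetical.

```python
def gini(values):
    """Gini index of a non-negative sample (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted sum form of the Gini index for sorted data.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical ages at death, for illustration only.
print(gini([1, 5, 40, 62, 70, 75, 81]))
```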

  4. A Novel Ant Colony Algorithm for the Single-Machine Total Weighted Tardiness Problem with Sequence Dependent Setup Times

    Directory of Open Access Journals (Sweden)

    Fardin Ahmadizar

    2011-08-01

    Full Text Available This paper deals with the NP-hard single-machine total weighted tardiness problem with sequence-dependent setup times. Incorporating fuzzy sets and genetic operators, a novel ant colony optimization algorithm is developed for the problem. In the proposed algorithm, artificial ants construct solutions as orders of jobs based on heuristic information as well as pheromone trails. To calculate the heuristic information, three well-known priority rules are adopted as fuzzy sets and then aggregated. When all artificial ants have terminated their constructions, genetic operators such as crossover and mutation are applied to generate new regions of the solution space. A local search is then performed to improve the quality of some of the solutions found. Moreover, at run-time the pheromone trails are locally as well as globally updated, and limited between lower and upper bounds. The proposed algorithm is tested on a set of benchmark problems from the literature and compared with other metaheuristics.
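
    The objective that such an algorithm minimises can be written down in a few lines. The sketch below only evaluates a given job order and, for a tiny instance, finds the optimum by brute force; it is not the ant colony algorithm of the paper, and the instance data and names (`proc`, `due`, `weight`, `setup`) are assumptions made for illustration.

```python
from itertools import permutations

def total_weighted_tardiness(order, proc, due, weight, setup):
    """Sum of weight * max(0, completion - due) with sequence-dependent setups."""
    time, prev, cost = 0, None, 0
    for job in order:
        # Setup time depends on the (previous job, current job) pair;
        # this toy instance assumes no setup before the first job.
        time += setup.get((prev, job), 0) + proc[job]
        cost += weight[job] * max(0, time - due[job])
        prev = job
    return cost

# Hypothetical three-job instance.
proc = {0: 3, 1: 2, 2: 4}
due = {0: 4, 1: 6, 2: 9}
weight = {0: 2, 1: 1, 2: 3}
setup = {(0, 1): 1, (1, 0): 2, (0, 2): 2, (2, 0): 1, (1, 2): 1, (2, 1): 3}

best = min(permutations(proc), key=lambda o: total_weighted_tardiness(o, proc, due, weight, setup))
print(best, total_weighted_tardiness(best, proc, due, weight, setup))
```

    Exhaustive search grows factorially with the number of jobs, which is why metaheuristics are used for instances of realistic size.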

  5. Development of Insertion and Deletion Markers for Bottle Gourd Based on Restriction Site-associated DNA Sequencing Data

    Directory of Open Access Journals (Sweden)

    Xinyi WU

    2017-01-01

    Full Text Available Bottle gourd is an important cucurbit crop worldwide. To provide more available molecular markers for this crop, a bioinformatic approach was employed to develop insertion–deletion (InDel) markers in bottle gourd based on restriction site-associated DNA sequencing (RAD-Seq) data. A total of 892 InDels were predicted, with lengths varying from 1 bp to 167 bp. Single-nucleotide InDels were the predominant type. To validate these InDels, PCR primers were designed from 162 loci where InDels longer than 2 bp were predicted. A total of 112 InDels were found to be polymorphic among the 9 bottle gourd accessions under investigation. The rate of prediction accuracy was thus at a high level of 72.7%. DNA fingerprinting of 4 cultivars was performed using 8 selected InDel markers, demonstrating the usefulness of these markers.

  6. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Full Text Available Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
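
    The vector-and-operator description above can be mimicked with plain lists, storing each diagonal matrix as its diagonal. This is a rough sketch under stated assumptions: the way the random diagonal Sa is built here (drawing a fixed number of reads in proportion to copy number) and all names and numbers are illustrative guesses, not the authors' definitions.

```python
import random

def apply_diagonal(diag, vec):
    """Multiply a diagonal matrix (stored as its diagonal) by a vector."""
    return [d * v for d, v in zip(diag, vec)]

def sampling_operator(n, reads, rng):
    """Assumed form of Sa: entry i is (copies of clone i drawn) / n_i."""
    drawn = rng.choices(range(len(n)), weights=n, k=reads)
    counts = [0] * len(n)
    for i in drawn:
        counts[i] += 1
    return [c / ni if ni else 0.0 for c, ni in zip(counts, n)]

n = [500, 300, 150, 50]            # hypothetical copy numbers, N = 4
cen = [1, 1, 0, 1]                 # hypothetical censorship: clone 2 is never read
censored = apply_diagonal(cen, n)  # CEN acting on n
sa = sampling_operator(censored, 1000, random.Random(0))
observed = apply_diagonal(sa, censored)  # Sa acting on (CEN n)
print(observed)
```

    Replacing `cen` with all ones corresponds to the unbiased case Seq = Sa I_N described in the abstract.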

  7. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.)

    Directory of Open Access Journals (Sweden)

    Carole F S Koning-Boucoiran

    2015-04-01

    Full Text Available In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this, RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these, 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.

  8. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia)

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Full Text Available Bitter gourd (Momordica charantia) is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus), watermelon (Citrullus lanatus), and melon (Cucumis melo) genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs) in bitter gourd, including a comparison with the three aforementioned cucurbit species, has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.

  9. Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

    Science.gov (United States)

    Gonçalves, Juliana W; Valiati, Victor Hugo; Delprat, Alejandra; Valente, Vera L S; Ruiz, Alfredo

    2014-09-13

    Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 sequenced Drosophila genomes, indicating its widespread distribution within this genus. Galileo is strikingly abundant in Drosophila willistoni, a neotropical species that is highly polymorphic for chromosomal inversions, suggesting a role for this transposon in the evolution of its genome. We carried out a detailed characterization of all Galileo copies present in the D. willistoni genome. A total of 191 copies, including 133 with two terminal inverted repeats (TIRs), were classified according to structure in six groups. The TIRs exhibited remarkable variation in their length and structure compared to the most complete copy. Three copies showed extended TIRs due to internal tandem repeats, the insertion of other transposable elements (TEs), or the incorporation of non-TIR sequences into the TIRs. Phylogenetic analyses of the transposase (TPase)-encoding and TIR segments yielded two divergent clades, which we termed Galileo subfamilies V and W. Target-site duplications (TSDs) in D. willistoni Galileo copies were 7- or 8-bp in length, with the consensus sequence GTATTAC. Analysis of the region around the TSDs revealed a target site motif (TSM) with a 15-bp palindrome that may give rise to a stem-loop secondary structure. There is a remarkable abundance and diversity of Galileo copies in the D. willistoni genome, although no functional copies were found. The TIRs in particular have a dynamic structure and extend in different ways, but their ends (required for transposition) are more conserved than the rest of the element. The D. 
willistoni genome harbors two Galileo subfamilies (V and W) that diverged ~9 million years ago and may have descended from an ancestral

  10. Relationship between mRNA secondary structure and sequence variability in Chloroplast genes: possible life history implications.

    Science.gov (United States)

    Krishnan, Neeraja M; Seligmann, Hervé; Rao, Basuthkar J

    2008-01-28

    Synonymous sites are freer to vary because of redundancy in the genetic code. Messenger RNA secondary structure restricts this freedom, as revealed by previous findings in mitochondrial genes that mutations at third codon position nucleotides in helices are more selected against than those in loops. This motivated us to explore the constraints imposed by mRNA secondary structure on evolutionary variability at all codon positions in general, in chloroplast systems. We found that the evolutionary variability and intrinsic secondary structure stability of these sequences share an inverse relationship. Simulations of most likely single nucleotide evolution in Psilotum nudum and Nephroselmis olivacea mRNAs indicate that helix-forming propensities of mutated mRNAs are greater than those of the natural mRNAs for short sequences and vice-versa for long sequences. Moreover, helix-forming propensity estimated by the percentage of total mRNA in helices increases gradually with mRNA length, saturating beyond 1000 nucleotides. Protection levels of functionally important sites vary across plants and proteins: r-strategists minimize mutation costs in large genes; K-strategists do the opposite. mRNA length presumably predisposes shorter mRNAs to evolve under different constraints than longer mRNAs. The positive correlation between secondary structure protection and functional importance of sites suggests that some sites might be conserved due to packing-protection constraints at the nucleic acid level in addition to protein level constraints. Consequently, nucleic acid secondary structure a priori biases mutations. The converse (exposure of conserved sites) apparently occurs in a smaller number of cases, indicating a different evolutionary adaptive strategy in these plants. The differences between the protection levels of functionally important sites for r- and K-strategists reflect their respective molecular adaptive strategies.
These converge with increasing domestication levels of

  11. Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data

    DEFF Research Database (Denmark)

    Rask, Thomas Salhøj; Petersen, Bent; Chen, Donald S.

    2016-01-01

    . The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant Multipass calculates the likelihood and nucleotide sequence of several most likely sequences given the flowgram data....... This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene...... family, where Multipass generates 20 % more error-free sequences than current state of the art methods, and provides sequence characteristics that allow generation of a set of high confidence error-free sequences. This novel method can be used to increase accuracy of existing and future amplicon...

  12. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
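
    The throughput figures quoted above are mutually consistent, as a one-line calculation shows. The sketch assumes that "25 M single reads and 50 M paired-end reads" refers to 25 million clusters, each read once or twice; the function name `run_output_gb` is hypothetical.

```python
def run_output_gb(clusters, read_length, paired=True):
    """Approximate run output in gigabases for a given read configuration."""
    reads_per_cluster = 2 if paired else 1
    return clusters * reads_per_cluster * read_length / 1e9

print(run_output_gb(25_000_000, 300))               # 2 x 300 bp paired-end
print(run_output_gb(25_000_000, 36, paired=False))  # 1 x 36 bp single-read
```

    Under that assumption the 2 × 300 configuration yields 15 Gb, matching the maximum output stated in the abstract.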

  13. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

    Directory of Open Access Journals (Sweden)

    Maskell Douglas L

    2009-05-01

    Full Text Available Abstract Background The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Findings Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K and for query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. Conclusion CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
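
    For readers unfamiliar with the recurrence being accelerated, a plain CPU sketch of Smith-Waterman local alignment scoring follows, together with the GCUPS (billion cell updates per second) metric. It is a simplification and not the CUDASW++ code: it assumes a linear gap penalty and a flat match/mismatch score, whereas protein searches normally use a substitution matrix and affine gaps. All names and scoring values are illustrative.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty, assumed scores)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell is the best of: restart, diagonal step, or a gap in either sequence.
            h = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(h)
            best = max(best, h)
        prev = cur
    return best

def gcups(query_len, db_residues, seconds):
    """Billions of dynamic-programming cells updated per second."""
    return query_len * db_residues / seconds / 1e9

print(smith_waterman("GGACGTCC", "TTACGTAA"))
print(gcups(1000, 10**9, 100))
```

    Every query residue is compared against every database residue, so the cell count is the product of the two lengths; that product is what makes the search expensive and is the numerator of the GCUPS figure.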

  14. Development of cleaved amplified polymorphic sequence markers and a CAPS-based genetic linkage map in watermelon (Citrullus lanatus [Thunb.] Matsum. and Nakai) constructed using whole-genome re-sequencing data.

    Science.gov (United States)

    Liu, Shi; Gao, Peng; Zhu, Qianglong; Luan, Feishi; Davis, Angela R; Wang, Xiaolu

    2016-03-01

    Cleaved amplified polymorphic sequence (CAPS) markers are useful tools for detecting single nucleotide polymorphisms (SNPs). This study detected and converted SNP sites into CAPS markers based on high-throughput re-sequencing data in watermelon, for linkage map construction and quantitative trait locus (QTL) analysis. Two inbred lines, Cream of Saskatchewan (COS) and LSW-177, were re-sequenced and analyzed with in-house Perl scripts for CAPS marker development. 88.7% and 78.5% of the assembled sequences of the two parental materials could be mapped to the reference watermelon genome, respectively. Comparative analysis of the assembled genome data provided 225,693 and 19,268 SNPs and indels between the two materials. 532 pairs of CAPS markers were designed with 16 restriction enzymes, among which 271 pairs of primers gave distinct bands of the expected length and polymorphic bands, via PCR and enzyme digestion, with a polymorphic rate of 50.94%. Using the new CAPS markers, an initial CAPS-based genetic linkage map was constructed with the F2 population, spanning 1836.51 cM with 11 linkage groups and 301 markers. 12 QTLs were detected related to fruit flesh color, length, width, shape index, and brix content. These newly developed CAPS markers will be a valuable resource for breeding programs and genetic studies of watermelon.

  15. The isolation and localization of arbitrary restriction fragment length polymorphisms in Southern African populations

    International Nuclear Information System (INIS)

    Conn, V.

    1987-01-01

    The main aim of this study was to contribute to the mapping of the human genome by searching for and characterizing a number of RFLPs (restriction fragment length polymorphisms) in the human genome. The more specific aims of this study were: 1. To isolate single-copy human DNA sequences from a human genomic library. 2. To use these single-copy sequences as DNA probes to search for polymorphic variation among Caucasoid individuals. 3. To show by means of family studies that the RFLPs were inherited in a co-dominant Mendelian fashion. 4. To determine the population frequencies of these RFLPs in Southern African Populations, namely the Bantu-speaking Negroids and the San. 5. To assign these RFLP-detecting DNA sequences to human chromosomes using somatic cell hybrid lines. In this study DNA was labelled with Phosphorus 32

  16. Permutations avoiding an increasing number of length-increasing forbidden subsequences

    Directory of Open Access Journals (Sweden)

    Elena Barcucci

    2000-12-01

    Full Text Available A permutation π is said to be τ-avoiding if it does not contain any subsequence having all the same pairwise comparisons as τ. This paper concerns the characterization and enumeration of permutations which avoid a set F_j of subsequences increasing both in number and in length at the same time. Let F_j be the set of subsequences of the form σ(j+1)(j+2), σ being any permutation on {1,...,j}. For j=1 the only subsequence in F_1 is 123 and the 123-avoiding permutations are enumerated by the Catalan numbers; for j=2 the subsequences in F_2 are 1234 and 2134, and the (1234, 2134)-avoiding permutations are enumerated by the Schröder numbers; for each other value of j greater than 2 the subsequences in F_j are j! in number and their length is (j+2); the permutations avoiding these j! subsequences are enumerated by a number sequence {a_n} such that C_n ≤ a_n ≤ n!, C_n being the n-th Catalan number. For each j we determine the generating function of permutations avoiding the subsequences in F_j according to the length, to the number of left minima and of non-inversions.
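
    The first two cases stated above can be checked by brute force for small n. The sketch below is only a verification aid, not the generating-function method of the paper; the function names are assumptions, and the exhaustive search is practical only for short permutations.

```python
from itertools import combinations, permutations

def contains(perm, pattern):
    """True if perm has a subsequence with the same pairwise comparisons as pattern."""
    k = len(pattern)
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        if all((sub[x] < sub[y]) == (pattern[x] < pattern[y])
               for x in range(k) for y in range(x + 1, k)):
            return True
    return False

def forbidden_set(j):
    """F_j: every permutation sigma of {1..j} followed by (j+1)(j+2)."""
    return [sigma + (j + 1, j + 2) for sigma in permutations(range(1, j + 1))]

def count_avoiders(n, patterns):
    """Number of permutations of {1..n} avoiding every pattern in the list."""
    return sum(1 for p in permutations(range(1, n + 1))
               if not any(contains(p, t) for t in patterns))

print([count_avoiders(n, forbidden_set(1)) for n in range(1, 6)])  # Catalan numbers
print([count_avoiders(n, forbidden_set(2)) for n in range(1, 6)])  # Schröder numbers
```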

  17. Asymptotic sequences over ideals and projectively equivalent ideals with respect to modules

    International Nuclear Information System (INIS)

    Naghipour, R.; Sedghi, M.

    2007-09-01

    Let R be a commutative Noetherian ring, and let N be a non-zero finitely generated R-module. The purpose of this paper is to show that if I and J are projectively equivalent ideals w.r.t. N, then a sequence x := x_1, ..., x_n of elements of R is an asymptotic sequence over I w.r.t. N if and only if it is an asymptotic sequence over J w.r.t. N. Also, it is shown that if R is local, then the lengths of all maximal asymptotic sequences over an ideal I w.r.t. N are the same. As a consequence we derive a generalization of Rees' theorem. (author)

  18. Terminal Restriction Fragment Length Polymorphism Analysis of Soil Bacterial Communities under Different Vegetation Types in Subtropical Area.

    Directory of Open Access Journals (Sweden)

    Zeyan Wu

    Full Text Available Soil microbes are active players in the energy flow and material exchange of forest ecosystems, but the relationship between microbial diversity and vegetation type has received little study, especially in the subtropical area of China. In the present study, the rhizosphere soils of evergreen broad-leaf forest (EBF), coniferous forest (CF), subalpine dwarf forest (SDF) and alpine meadow (AM) were chosen as test sites. Terminal restriction fragment length polymorphism (T-RFLP) analysis was used to detect the composition and diversity of soil bacterial communities under different vegetation types in the National Natural Reserve of Wuyi Mountains. Our results revealed distinct differences in soil microbial composition under different vegetation types. A total of 73 microbes were identified in soil samples of the four vegetation types, and 56, 49, 46 and 36 clones were obtained from the soils of EBF, CF, SDF and AM, respectively, and subsequently sequenced. The Actinobacteria, Fusobacterium, Bacteroidetes and Proteobacteria were the most predominant in all soil samples. The Shannon-Wiener index (H) of the soil samples followed the order EBF>CF>SDF>AM, whereas bacterial species richness as estimated by four restriction enzymes indicated no significant difference. Principal component analysis (PCA) revealed that the soil bacterial community structures of EBF, CF, SDF and AM were clearly separated along the first and second principal components, which explained 62.17% and 31.58% of the total variance, respectively. Soil physical-chemical properties such as total organic carbon (TOC), total nitrogen (TN), total phosphorus (TP) and total potassium (TK) were positively correlated with the diversity of bacterial communities.
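    The Shannon-Wiener index mentioned above is conventionally defined as H = -Σ p_i ln p_i, where p_i is the relative abundance of the i-th group. The sketch below is a minimal illustration under the assumption that abundances are given as non-negative counts or peak areas per terminal restriction fragment. The abstract does not say how the authors computed H, and the function name and sample values are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Zero-abundance groups contribute nothing and are skipped to avoid log(0).
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


if __name__ == "__main__":
    # Hypothetical fragment abundances, not data from the study.
    even_community = [25, 25, 25, 25]
    skewed_community = [85, 5, 5, 5]
    print(shannon_index(even_community))    # equals ln(4) for four equal groups
    print(shannon_index(skewed_community))  # lower, since one group dominates
```

    A community with evenly distributed groups gives the maximum H for a given number of groups, which is why H is read as a combined measure of richness and evenness.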

  19. Comparative Study between Standard and Totally Tubeless Percutaneous Nephrolithotomy.

    Science.gov (United States)

    Yun, Sung Il; Lee, Yoon Hyung; Kim, Jae Soo; Cho, Sung Ryong; Kim, Bum Soo; Kwon, Joon Beom

    2012-11-01

    Several recent studies have reported the benefits of tubeless percutaneous nephrolithotomy (PNL). Postoperatively, tubeless PNL patients have an indwelling ureteral stent placed, which is often associated with stent-related morbidity. We have performed totally tubeless (tubeless and stentless) PNL in which no nephrostomy tube or ureteral stent is placed postoperatively. We evaluated the safety, effectiveness, and feasibility of totally tubeless PNL. From March 2008 to February 2012, 57 selected patients underwent standard or totally tubeless PNL. Neither a nephrostomy tube nor a ureteral stent was placed in the totally tubeless PNL group. We compared patient and stone characteristics, operation time, length of hospitalization, analgesia requirements, stone-free rate, blood loss, change in creatinine, and perioperative complications between the standard and totally tubeless PNL groups. There were no significant differences in preoperative patient characteristics, postoperative complications, or the stone-free rate between the two groups, but the totally tubeless PNL group showed a shorter hospitalization and a lesser analgesic requirement compared with the standard PNL group. Blood loss and change in creatinine were not significantly different between the two groups. Totally tubeless PNL appears to be a safe and effective alternative for the management of renal stone patients and is associated with a decrease in length of hospital stay.

  20. Impact of a hospitalist system on length of stay and cost for children with common conditions.

    Science.gov (United States)

    Srivastava, Rajendu; Landrigan, Christopher P; Ross-Degnan, Dennis; Soumerai, Stephen B; Homer, Charles J; Goldmann, Donald A; Muret-Wagstaff, Sharon

    2007-08-01

    This study examined mechanisms of efficiency in a managed care hospitalist system on length of stay and total costs for common pediatric conditions. We conducted a retrospective cohort study (October 1993 to July 1998) of patients in a not-for-profit staff model (HMO 1) and a non-staff-model (HMO 2) managed care organization at a freestanding children's hospital. HMO 1 introduced a hospitalist system for patients in October 1996. Patients were included if they had 1 of 3 common diagnoses: asthma, dehydration, or viral illness. Linear regression models examining length-of-stay-specific costs for prehospitalist and posthospitalist systems were built. Distribution of length of stay for each diagnosis before and after the system change in both study groups was calculated. Interrupted time series analysis tested whether changes in the trends of length of stay and total costs occurred after implementation of the hospitalist system by HMO 1 (HMO 2 as comparison group) for all 3 diagnoses combined. A total of 1970 patients with 1 of the 3 study conditions were cared for in HMO 1, and 1001 in HMO 2. After the hospitalist system was introduced in HMO 1, length of stay was reduced by 0.23 days (13%) for asthma and 0.19 days (11%) for dehydration; there was no difference for patients with viral illness. The largest relative reduction in length of stay occurred in patients with a shorter length of stay whose hospitalizations were reduced from 2 days to 1 day. This shift resulted in an average cost-per-case reduction of $105.51 (9.3%) for patients with asthma and $86.22 (7.8%) for patients with dehydration. During the same period, length of stay and total cost rose in HMO 2. Introduction of a hospitalist system in one health maintenance organization resulted in earlier discharges and reduced costs for children with asthma and dehydration compared with the other organization, with the largest reductions coming from some 2-day hospitalizations being shortened to 1 day.
These findings suggest that