WorldWideScience

Sample records for full-length coding sequences

  1. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    Science.gov (United States)

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
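
    As an illustration only, and not the wet-lab screening protocol described in the record above, the idea of a full-length open reading frame can be sketched computationally: a sequence is a candidate for carrying a complete coding sequence if it contains an ATG start codon followed in frame by a stop codon. The function name `longest_complete_orf` and the `min_codons` threshold are hypothetical choices made for this sketch; it scans only the forward strand and assumes the standard genetic code.

```python
# Minimal sketch: find the longest complete ORF (ATG ... in-frame stop)
# on the forward strand of a cDNA sequence. Hypothetical helper, not taken
# from any of the records listed here.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def longest_complete_orf(seq, min_codons=30):
    """Return (start, end) of the longest ORF as 0-based, end-exclusive
    coordinates, or None if no ORF with a start and an in-frame stop exists.

    min_codons is the minimum ORF length in codons, counting the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the reported span
                long_enough = (end - start) // 3 >= min_codons
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ORF in the same frame
    return best


# A toy sequence with a start and an in-frame stop, and a truncated one.
print(longest_complete_orf("CCATGGCTGCTGCTTAAGG", min_codons=2))  # (2, 17)
print(longest_complete_orf("ATGGCTGCT", min_codons=2))            # None
```

    A truncated tag, such as the second toy sequence, has no in-frame stop and returns `None`. A real pipeline would also need to consider the reverse strand and the presence of 5' and 3' untranslated regions.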

  2. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  3. Full-length cDNA sequences from Rhesus monkey placenta tissue: analysis and utility for comparative mapping

    Directory of Open Access Journals (Sweden)

    Lee Sang-Rae

    2010-07-01

    Background: Rhesus monkeys (Macaca mulatta) are widely-used as experimental animals in biomedical research and are closely related to other laboratory macaques, such as cynomolgus monkeys (Macaca fascicularis), and to humans, sharing a last common ancestor from about 25 million years ago. Although rhesus monkeys have been studied extensively under field and laboratory conditions, research has been limited by the lack of genetic resources. The present study generated placenta full-length cDNA libraries, characterized the resulting expressed sequence tags, and described their utility for comparative mapping with human RefSeq mRNA transcripts. Results: From rhesus monkey placenta full-length cDNA libraries, 2000 full-length cDNA sequences were determined and 1835 rhesus placenta cDNA sequences longer than 100 bp were collected. These sequences were annotated based on homology to human genes. Homology search against human RefSeq mRNAs revealed that our collection included the sequences of 1462 putative rhesus monkey genes. Moreover, we identified 207 genes containing exon alterations in the coding region and the untranslated region of rhesus monkey transcripts, despite the highly conserved structure of the coding regions. Approximately 10% (187) of all full-length cDNA sequences did not represent any public human RefSeq mRNAs. Intriguingly, two rhesus monkey specific exons derived from the transposable elements of AluYRa2 (SINE family) and MER11B (LTR family) were also identified. Conclusion: The 1835 rhesus monkey placenta full-length cDNA sequences described here could expand genomic resources and information of rhesus monkeys. This increased genomic information will greatly contribute to the development of evolutionary biology and biomedical research.

  4. An RNA-Seq strategy to detect the complete coding and non-coding transcriptome including full-length imprinted macro ncRNAs.

    Directory of Open Access Journals (Sweden)

    Ru Huang

    Imprinted macro non-protein-coding (nc) RNAs are cis-repressor transcripts that silence multiple genes in at least three imprinted gene clusters in the mouse genome. Similar macro or long ncRNAs are abundant in the mammalian genome. Here we present the full coding and non-coding transcriptome of two mouse tissues: differentiated ES cells and fetal head using an optimized RNA-Seq strategy. The data produced is highly reproducible in different sequencing locations and is able to detect the full length of imprinted macro ncRNAs such as Airn and Kcnq1ot1, whose length ranges between 80-118 kb. Transcripts show a more uniform read coverage when RNA is fragmented with RNA hydrolysis compared with cDNA fragmentation by shearing. Irrespective of the fragmentation method, all coding and non-coding transcripts longer than 8 kb show a gradual loss of sequencing tags towards the 3' end. Comparisons to published RNA-Seq datasets show that the strategy presented here is more efficient in detecting known functional imprinted macro ncRNAs and also indicate that standardization of RNA preparation protocols would increase the comparability of the transcriptome between different RNA-Seq datasets.

  5. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Directory of Open Access Journals (Sweden)

    Lunner Sigbjørn

    2009-10-01

    Background: Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping to distinguish between duplicate genome regions and to determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results: High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion: This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This

  6. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Science.gov (United States)

    Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

    2009-01-01

    Background: Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping to distinguish between duplicate genome regions and to determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results: High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion: This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining c

  7. Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    2009-11-01

    Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).

  8. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  9. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    Directory of Open Access Journals (Sweden)

    Rachel Caldwell

    2015-01-01

    There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.

  10. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns.

  11. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney

    2012-10-08

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for

  12. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney; Kodzius, Rimantas; Tan, Yue Ying; Tay, Alice; Tay, Boon-Hui; Venkatesh, Byrappa

    2012-01-01

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole

  13. The function analysis of full-length cDNA sequence from IRM-2 mouse cDNA library

    International Nuclear Information System (INIS)

    Wang Qin; Liu Xiaoqiu; Xu Chang; Du Liqing; Sun Zhijuan; Wang Yan; Liu Qiang; Song Li; Li Jin; Fan Feiyue

    2013-01-01

    Objective: To identify the function of full-length cDNA sequences from an IRM-2 mouse cDNA library. Methods: Full-length cDNA products were amplified by PCR from the IRM-2 mouse cDNA library according to twenty-one expressed sequence tags. The expression of the full-length cDNAs was detected after mouse embryonic fibroblasts were exposed to 6.5 Gy γ-ray radiation, and the effect on the growth of radiosensitive AT5B1VA cells transfected with the full-length cDNAs was investigated. Results: The expression of No.4, 5 and 2 full-length cDNAs from IRM-2 mouse was higher than that of parental ICR and 615 mouse after mouse embryonic fibroblasts were irradiated with γ-ray radiation, and the survival rate of AT5B1VA cells transfected with No.4, 5 and 2 full-length cDNAs was high. Conclusion: No.4, 5 and 2 full-length cDNAs of IRM-2 mouse are of high radioresistance. (authors)

  14. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    The aim of this work was to sequence the entire coding region of ACACA gene in Valle del Belice sheep breed to identify polymorphic sites. A total of 51 coding exons of ACACA gene were sequenced in 32 individuals of Valle del Belice sheep breed. Sequencing analysis and alignment of obtained sequences showed the ...

  15. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Several existing direct-sequence spread spectrum (DSSS) detection and estimation algorithms assume prior knowledge of the symbol period or sequence length, although very few sequence-length estimation techniques are available in the literature...

  16. Fast comparison of IS radar code sequences for lag profile inversion

    Directory of Open Access Journals (Sweden)

    M. S. Lehtinen

    2008-08-01

    A fast method for theoretically comparing the posteriori variances produced by different phase code sequences in incoherent scatter radar (ISR) experiments is introduced. Alternating codes of types 1 and 2 are known to be optimal for selected range resolutions, but the code sets are inconveniently long for many purposes like ground clutter estimation and in cases where coherent echoes from lower ionospheric layers are to be analyzed in addition to standard F-layer spectra.

    The method is used in practice for searching binary code quads that have estimation accuracy almost equal to that of much longer alternating code sets. Though the code sequences can consist of as few as four different transmission envelopes, the lag profile estimation variances are near to the theoretical minimum. Thus the short code sequence is equally good as a full cycle of alternating codes with the same pulse length and bit length. The short code groups cannot be directly decoded, but the decoding is done in connection with more computationally expensive lag profile inversion in data analysis.

    The actual code searches as well as the analysis and real data results from the found short code searches are explained in other papers sent to the same issue of this journal. We also discuss interesting subtle differences found between the different alternating codes by this method. We assume that thermal noise dominates the incoherent scatter signal.

  17. Evidence for a Complex Mosaic Genome Pattern in a Full-length Hepatitis C Virus Sequence

    Directory of Open Access Journals (Sweden)

    R.S. Ross

    2008-01-01

    The genome of the hepatitis C virus (HCV) exhibits a high genetic variability. This remarkable heterogeneity is mainly attributed to the gradual accumulation of mutational changes, whereas the contribution of recombination events to the evolution of HCV remains controversial so far. While performing phylogenetic analyses including a large number of sequences deposited in the GenBank, we encountered a full-length HCV sequence (AY651061) that showed evidence for inter-subtype recombination and was, therefore, subjected to a detailed analysis of its molecular structure. The obtained results indicated that AY651061 does not represent a “simple” HCV 1c isolate, but a complex 1a/1c mosaic genome, showing five putative breakpoints in the core to NS3 regions. To our knowledge, this is the first report on a mosaic HCV full-length sequence with multiple breakpoints. The molecular structure of AY651061 is reminiscent of complex homologous recombinant variants occurring among other members of the Flaviviridae family, e.g. GB virus C, dengue virus, and Japanese encephalitis virus. Our finding of a mosaic HCV sequence may have important implications for many fields of current HCV research which merit careful consideration.

  18. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

    Science.gov (United States)

    Anvar, Seyed Yahya; Allard, Guy; Tseng, Elizabeth; Sheynkman, Gloria M; de Klerk, Eleonora; Vermaat, Martijn; Yin, Raymund H; Johansson, Hans E; Ariyurek, Yavuz; den Dunnen, Johan T; Turner, Stephen W; 't Hoen, Peter A C

    2018-03-29

    The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtained evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provide a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.

  19. Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

    DEFF Research Database (Denmark)

    Karst, Soeren M; Dueholm, Morten S; McIlroy, Simon J

    2016-01-01

    Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies ... (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity...

  20. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Directory of Open Access Journals (Sweden)

    Changqing Liu

    2013-05-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of the primary and amplified libraries were 1.28 × 10⁶ pfu/mL and 1.56 × 10⁹ pfu/mL, respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% of ESTs were found to be related to metabolism of all kinds, 19.3% to information storage and processing, 11.3% to posttranslational modification, protein turnover, chaperones, 11.3% to transport, 9.9% to signal transducer/cell communication, 9.0% to structural proteins, 3.8% to cell cycle, and only 6.6% were classified as novel genes. By EST sequencing, a full-length gene encoding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed; it codes for the TAT-Ferritin fusion protein with two 6× His-tags, at the N- and C-termini. By BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  1. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues

    International Nuclear Information System (INIS)

    Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.; Goldberg, O.; Soreq, H.

    1987-01-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments with a total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.

  2. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and are therefore an important group for understanding the evolution of vertebrate genomes, including the human genome. Our laboratory has proposed the elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (the first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  3. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from the Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from Bengal tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants in the unamplified library was 90.2%, and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% were related to metabolism of all kinds, 19.3% to information storage and processing, 11.3% to posttranslational modification, protein turnover and chaperones, 11.3% to transport, 9.9% to signal transduction/cell communication, 9.0% to structural proteins, and 3.8% to the cell cycle, and only 6.6% were classified as novel genes. By EST sequencing, a full-length gene coding for ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed; it codes for the TAT-Ferritin fusion protein with two 6× His-tags, one at the N-terminus and one at the C-terminus. By BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provides a useful platform for functional genome and transcriptome research on Bengal tigers. PMID:23708105

  4. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Background: Castor seeds are a major source of ricinoleate, an important industrial raw material. Genomic studies of the castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergenic proteins in seeds. Results: Full-length cDNAs are useful resources for annotating genes and for functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from the 5'-ends of the cDNA clones, representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion: Our results suggest that transcriptional regulation of the FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at

  5. Signal sequence and keyword trap in silico for selection of full-length human cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries.

    Science.gov (United States)

    Otsuki, Tetsuji; Ota, Toshio; Nishikawa, Tetsuo; Hayashi, Koji; Suzuki, Yutaka; Yamamoto, Jun-ichi; Wakamatsu, Ai; Kimura, Kouichi; Sakamoto, Katsuhiko; Hatano, Naoto; Kawai, Yuri; Ishii, Shizuko; Saito, Kaoru; Kojima, Shin-ichi; Sugiyama, Tomoyasu; Ono, Tetsuyoshi; Okano, Kazunori; Yoshikawa, Yoko; Aotsuka, Satoshi; Sasaki, Naokazu; Hattori, Atsushi; Okumura, Koji; Nagai, Keiichi; Sugano, Sumio; Isogai, Takao

    2005-01-01

    We have developed an in silico method for selecting human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combining the oligo-capping method with ATGpr, software for predicting the translation start point and the coding potential. Then, using 5'-end single-pass sequences, cDNAs having a signal sequence were selected by PSORT ('signal sequence trap'). For the cDNAs that could not be selected by PSORT, we also applied a 'secretion or membrane protein-related keyword trap' based on the results of a BLAST search against the SWISS-PROT database. Using the above procedures, 789 cDNAs were primarily selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165 (80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by the 'keyword trap' preserved the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including transporters, receptors, and ligands, involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.

  6. Fiscal 2000 report on result of the full-length cDNA structure analysis; 2000 nendo kanzen cho cDNA kozo kaiseki seika hokokusho

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-01

    This paper reports the results of research on full-length cDNA structure analysis for the period from April 2000 to March 2001. An outline of the human genome sequence was published in June 2000. In Japan, human gene analysis was treated as a basic technology of the bio-industry, and a millennium project was adopted in the fiscal 2000 budget. Full-length cDNA structure analysis is the core of the project. cDNA libraries were prepared by the oligo-capping method from full-length cDNAs and from cDNAs longer than 4-5 kbp. The work began by determining partial sequence data at the cDNA ends; new clones were then selected from these, and their full-length human cDNA sequences were determined. By fiscal 2000, partial sequence data had been determined for 1,035,000 clones and full-length sequence data for 12,144 clones. The sequence data obtained were analyzed by homology search and translated into amino acid coding sequences, and protein functions were predicted. A clustering method for selecting new clones from partial sequences was examined. A database was constructed on gene expression profiles and disease-related gene sequence data. (NEDO)

  7. RT-PCR and sequence analysis of the full-length fusion protein of Canine Distemper Virus from domestic dogs.

    Science.gov (United States)

    Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José

    2016-02-01

    During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as analyzed by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the F gene full-length sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal-peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of Argentinean CDV strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and contributes to the knowledge of the molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data.

    Science.gov (United States)

    Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

    2010-07-01

    High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.

  9. Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

    Directory of Open Access Journals (Sweden)

    Vijay P Bondre

    2016-01-01

    Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.

  10. First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

    Science.gov (United States)

    Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

    2015-10-01

    Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.

  11. Full-length Dysferlin Transfer by the Hyperactive Sleeping Beauty Transposase Restores Dysferlin-deficient Muscle

    Directory of Open Access Journals (Sweden)

    Helena Escobar

    2016-01-01

    Dysferlin-deficient muscular dystrophy is a progressive disease characterized by muscle weakness and wasting for which there is no treatment. It is caused by mutations in DYSF, a large, multiexonic gene that forms a coding sequence of 6.2 kb. The Sleeping Beauty (SB) transposon is a nonviral gene transfer vector already used in clinical trials. The hyperactive SB system consists of a transposon DNA sequence and a transposase protein, SB100X, that can integrate DNA over 10 kb into the target genome. We constructed an SB transposon-based vector to deliver full-length human DYSF cDNA into dysferlin-deficient H2K A/J myoblasts. We demonstrate proper dysferlin expression as well as highly efficient engraftment (>1,100 donor-derived fibers) of the engineered myoblasts in the skeletal muscle of dysferlin- and immunodeficient B6.Cg-Dysfprmd Prkdcscid/J (Scid/BLA/J) mice. Nonviral gene delivery of full-length human dysferlin into muscle cells, along with successful and efficient transplantation into skeletal muscle, is an important advance towards successful gene therapy of dysferlin-deficient muscular dystrophy.

  12. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Background: Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to the lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with the objective of isolating as many functional genes as possible from J. curcas by large-scale sequencing of expressed sequence tags (ESTs). Results: A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones, and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced, with an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23, and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes, which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes, which did not show similarity with any genes in the public database, may represent unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified as potential full-length genes. Conclusions: The high-quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
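
    The EST counts quoted in the J. curcas abstract above can be cross-checked with simple bookkeeping. The sketch below is only an illustration: the variable names are hypothetical, and the assumption that each contig and each singleton contributes exactly one unigene is inferred from the reported totals rather than stated in the abstract.

```python
# Counts quoted in the abstract above.
total_ests = 12_084
contigs = 2_258
singletons = 4_751
ests_in_contigs = 7_333
unigenes = 7_009

# BLASTX annotation classes reported for the unigenes.
similar_to_known = 3_982
similar_to_unknown = 2_836
no_similarity = 191


def check_est_bookkeeping() -> dict[str, bool]:
    """Return named consistency checks on the reported counts."""
    return {
        # Assumption: one unigene per contig plus one per singleton.
        "unigenes = contigs + singletons": contigs + singletons == unigenes,
        "ESTs = ESTs in contigs + singletons": ests_in_contigs + singletons == total_ests,
        "annotation classes sum to unigenes": (
            similar_to_known + similar_to_unknown + no_similarity == unigenes
        ),
    }


checks = check_est_bookkeeping()
for name, ok in checks.items():
    print(f"{name}: {ok}")

# Average number of ESTs collapsed into each unigene.
redundancy = total_ests / unigenes
print(f"ESTs per unigene: {redundancy:.2f}")
```

    All three checks hold for the published figures, so the totals in the abstract are internally consistent.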

  13. Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

    Directory of Open Access Journals (Sweden)

    Tadashi Imanishi

    2004-06-01

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA

  14. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    Science.gov (United States)

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  15. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 × 10^6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3-kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria.
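
    The lengths reported for clone pUROS-2 in the abstract above fit together arithmetically. The following sketch works through that arithmetic; it assumes, as an interpretation that the abstract does not state explicitly, that the 798-bp open reading frame includes one terminating codon. All names are illustrative.

```python
# Figures quoted in the abstract above for clone pUROS-2.
UTR5_BP = 196        # 5' untranslated sequence
ORF_BP = 798         # open reading frame
UTR3_BP = 284        # 3' untranslated sequence
PROTEIN_AA = 265     # encoded residues
PROTEIN_DA = 28_607  # predicted molecular mass in daltons

STOP_CODONS = 1  # assumption: the ORF length counts the terminating codon


def codons(orf_bp: int) -> int:
    """Number of complete codons in an ORF of the given length."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3


coding_codons = codons(ORF_BP) - STOP_CODONS  # sense codons, compare with PROTEIN_AA
insert_bp = UTR5_BP + ORF_BP + UTR3_BP        # sum of the three reported segments
mean_residue_da = PROTEIN_DA / PROTEIN_AA     # average mass per residue

print(coding_codons, insert_bp, round(mean_residue_da, 1))
```

    Under that assumption the sense-codon count matches the 265 residues, and the three segments sum to 1,278 bp, in line with the roughly 1.3-kilobase insert.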

  16. Some Algebraic Aspects of Morse Code Sequences

    Directory of Open Access Journals (Sweden)

    Johann Cigler

    2003-06-01

    Morse code sequences are very useful for giving combinatorial interpretations of various properties of Fibonacci numbers. In this note we study some algebraic and combinatorial aspects of Morse code sequences and obtain several q-analogues of Fibonacci numbers and Fibonacci polynomials and their generalizations.
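
    The link between Morse code sequences and Fibonacci numbers mentioned above can be seen by brute force. The sketch below assumes the usual convention, which the abstract does not spell out, that a dot has length 1 and a dash has length 2; the function names are illustrative.

```python
from functools import lru_cache


def morse_sequences(n: int) -> list[str]:
    """All dot/dash strings of total length n (dot = 1, dash = 2)."""
    if n < 0:
        return []
    if n == 0:
        return [""]
    # A sequence starts either with a dot (n - 1 left) or a dash (n - 2 left).
    return ["." + s for s in morse_sequences(n - 1)] + [
        "-" + s for s in morse_sequences(n - 2)
    ]


@lru_cache(maxsize=None)
def fib(k: int) -> int:
    """Fibonacci numbers with fib(0) = 0 and fib(1) = fib(2) = 1."""
    return k if k < 2 else fib(k - 1) + fib(k - 2)


for n in range(7):
    print(n, len(morse_sequences(n)), fib(n + 1))
```

    Because every sequence begins with either a dot or a dash, the counts satisfy the Fibonacci recurrence, so the number of sequences of length n equals fib(n + 1) under this indexing.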

  17. On the normalization of the minimum free energy of RNAs by sequence length.

    Directory of Open Access Journals (Sweden)

    Edoardo Trotta

    The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied to the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias, enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families, showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.
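
    One simple way to see how a per-nucleotide index can acquire a hyperbolic length dependence is sketched below. It is a toy illustration, not the authors' model or their corrected formula (the abstract gives neither): it assumes the MFE follows a straight line that does not pass through the origin, and the slope and intercept are made-up values.

```python
def naive_index(mfe: float, length: int) -> float:
    """Per-nucleotide index: MFE divided by sequence length."""
    return mfe / length


# Hypothetical straight-line model, for illustration only.
SLOPE = -0.3      # kcal/mol per nucleotide (made-up value)
INTERCEPT = 10.0  # kcal/mol (made-up value)


def toy_mfe(length: int) -> float:
    """Toy MFE that is a straight line in length with a nonzero intercept."""
    return SLOPE * length + INTERCEPT


lengths = [50, 100, 500, 1000]
indices = [naive_index(toy_mfe(n), n) for n in lengths]
for n, idx in zip(lengths, indices):
    # The index equals SLOPE + INTERCEPT / n, i.e. a hyperbola in n.
    print(f"{n:5d} nt  index = {idx:.3f}")
```

    In this toy setting the index keeps drifting toward the slope as length grows, so two sequences with the same underlying stability per nucleotide would still receive different scores purely because of their sizes.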

  18. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Directory of Open Access Journals (Sweden)

    Sugano Sumio

    2009-07-01

    Background: Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexan parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance of physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexan parasites. Results: In this study, we used a total of 61,056 5'-end-single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of transcripts that had been previously unidentified. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified and confirmed by RT-PCR 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion: Our data indicate that the current gene models are likely to still be incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of

  19. PCR-based isolation and identification of full-length low-molecular-weight glutenin subunit genes in bread wheat (Triticum aestivum L.).

    Science.gov (United States)

    Zhang, Xiaofei; Liu, Dongcheng; Jiang, Wei; Guo, Xiaoli; Yang, Wenlong; Sun, Jiazhu; Ling, Hongqing; Zhang, Aimin

    2011-12-01

    Low-molecular-weight glutenin subunits (LMW-GSs) are encoded by a multi-gene family and are essential for determining the quality of wheat flour products, such as bread and noodles. However, the exact role or contribution of individual LMW-GS genes to wheat quality remains unclear. This is, at least in part, due to the difficulty in characterizing complete sequences of all LMW-GS gene family members in bread wheat. To identify full-length LMW-GS genes, a polymerase chain reaction (PCR)-based method was established, consisting of newly designed conserved primers and the previously developed LMW-GS gene molecular marker system. Using the PCR-based method, 17 LMW-GS genes were identified and characterized in Xiaoyan 54, of which 12 contained full-length sequences. Sequence alignments showed that 13 LMW-GS genes were identical to those found in Xiaoyan 54 using the genomic DNA library screening, and the other four full-length LMW-GS genes were first isolated from Xiaoyan 54. In Chinese Spring, 16 unique LMW-GS genes were isolated, and 13 of them contained full-length coding sequences. Additionally, 16 and 17 LMW-GS genes in Dongnong 101 and Lvhan 328 (chosen from the micro-core collections of Chinese germplasm), respectively, were also identified. Sequence alignments revealed that at least 15 LMW-GS genes were common in the four wheat varieties, and allelic variants of each gene shared high sequence identities (>95%) but exhibited length polymorphism in repetitive regions. This study provides a PCR-based method for efficiently identifying LMW-GS genes in bread wheat, which will improve the characterization of complex members of the LMW-GS gene family and facilitate the understanding of their contributions to wheat quality.

  20. Full Genome Sequence and sfRNA Interferon Antagonist Activity of Zika Virus from Recife, Brazil.

    Directory of Open Access Journals (Sweden)

    Claire L Donald

    2016-10-01

    The outbreak of Zika virus (ZIKV) in the Americas has transformed a previously obscure mosquito-transmitted arbovirus of the Flaviviridae family into a major public health concern. Little is currently known about the evolution and biology of ZIKV and the factors that contribute to the associated pathogenesis. Determining genomic sequences of clinical viral isolates and characterizing elements within these are important prerequisites for advancing our understanding of viral replicative processes and virus-host interactions. We obtained a ZIKV isolate from a patient who presented with classical ZIKV-associated symptoms, and used high-throughput sequencing and other molecular biology approaches to determine its full genome sequence, including non-coding regions. Genome regions were characterized and compared to the sequences of other isolates where available. Furthermore, we identified a subgenomic flavivirus RNA (sfRNA) in ZIKV-infected cells that has antagonist activity against RIG-I-induced type I interferon induction, with a lesser effect on MDA-5-mediated action. The full-length genome sequence, including non-coding regions, of a South American ZIKV isolate from a patient with classical symptoms will support efforts to develop genetic tools for this virus. Detection of sfRNA that counteracts interferon responses is likely to be important for further understanding of pathogenesis and virus-host interactions.

  1. Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns

    Directory of Open Access Journals (Sweden)

    Hayashizaki Yoshihide

    2009-06-01

    Background: Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable comparative genomics in cereals. Results: As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues are under adaptive selection. Conclusion: We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the

  2. Molecular cloning of full-length coding sequences and ...

    African Journals Online (AJOL)

    DR TONUKARI NYEROVWO

    structure and function of collagen, the distribution patterns of these two characteristic residues in α chains of ... the extracellular matrix. Besides ... number in collagen family and the major matrix protein in ..... Dashes represent missing residues.

  3. Global identification of the full-length transcripts and alternative splicing related to phenolic acid biosynthetic genes in Salvia miltiorrhiza

    Directory of Open Access Journals (Sweden)

    Zhichao Xu

    2016-02-01

    Full Text Available Salvianolic acids are among the main bioactive components in Salvia miltiorrhiza, and their biosynthesis has attracted widespread interest. However, previous studies on the biosynthesis of phenolic acids using next-generation sequencing platforms are limited with regard to the assembly of full-length transcripts. Based on hybrid-seq (next-generation and single molecular real-time sequencing) of the S. miltiorrhiza root transcriptome, we experimentally identified 15 full-length transcripts and 4 alternative splicing events of enzyme-coding genes involved in the biosynthesis of rosmarinic acid. Moreover, we herein demonstrate that lithospermic acid B accumulates in the phloem and xylem of roots, in agreement with the expression patterns of the identified key genes related to rosmarinic acid biosynthesis. According to co-expression patterns, we predicted that 6 candidate cytochrome P450s and 5 candidate laccases participate in the salvianolic acid pathway. Our results provide a valuable resource for further investigation into the synthetic biology of phenolic acids in S. miltiorrhiza.

  4. The relationship of protein conservation and sequence length

    Directory of Open Access Journals (Sweden)

    Panchenko Anna R

    2002-11-01

    Full Text Available Abstract Background In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. Results We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. Conclusions There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.

  5. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics

    Directory of Open Access Journals (Sweden)

    Kikuchi Mari

    2010-03-01

    Full Text Available Abstract Background The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. Results To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. Conclusion The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the

  6. Critical lengths of error events in convolutional codes

    DEFF Research Database (Denmark)

    Justesen, Jørn

    1994-01-01

    If the calculation of the critical length is based on the expurgated exponent, the length becomes nonzero for low error probabilities. This result applies to typical long codes, but it may also be useful for modeling error events in specific codes...

  7. Critical Lengths of Error Events in Convolutional Codes

    DEFF Research Database (Denmark)

    Justesen, Jørn; Andersen, Jakob Dahl

    1998-01-01

    If the calculation of the critical length is based on the expurgated exponent, the length becomes nonzero for low error probabilities. This result applies to typical long codes, but it may also be useful for modeling error events in specific codes...

  8. PATACSDB—the database of polyA translational attenuators in coding sequences

    Directory of Open Access Journals (Sweden)

    Malgorzata Habich

    2016-02-01

    Full Text Available Recent additions to the repertoire of gene expression regulatory mechanisms are polyadenylate (polyA) tracks encoding for poly-lysine runs in protein sequences. Such tracks stall the translation apparatus and induce frameshifting independently of the effects of charged nascent poly-lysine sequence on the ribosome exit channel. As such, they substantially influence the stability of mRNA and the amount of protein produced from a given transcript. Single base changes in these regions are enough to exert a measurable response on both protein and mRNA abundance; this makes each of these sequences a potentially interesting case study for the effects of synonymous mutation, gene dosage balance and natural frameshifting. Here we present PATACSDB, a resource that contains a comprehensive list of polyA tracks from over 250 eukaryotic genomes. Our data is based on the Ensembl genomic database of coding sequences and filtered with the 12A-1 algorithm, which selects sequences of polyA tracks with a minimal length of 12 A’s allowing for one mismatched base. The PATACSDB database is accessible at: http://sysbio.ibb.waw.pl/patacsdb. The source code is available at http://github.com/habich/PATACSDB, and it includes the scripts with which the database can be recreated.
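
    The 12A-1 criterion described above can be illustrated with a minimal sketch. It assumes one reading of the abstract: a hit is any 12-nucleotide window containing at most one non-A base, and overlapping or adjacent hits are merged into one span. The function name and parameters are hypothetical and are not taken from the PATACSDB source code.

```python
def find_12a1_tracks(seq, window=12, max_mismatch=1):
    """Return merged (start, end) spans covered by windows of `window` nt
    that contain at most `max_mismatch` non-A bases.

    Illustrative only: this is an assumed interpretation of "12A-1",
    not the PATACSDB implementation.
    """
    seq = seq.upper()
    spans = []
    for start in range(len(seq) - window + 1):
        # Count bases in the current window that are not adenine.
        mismatches = sum(1 for base in seq[start:start + window] if base != "A")
        if mismatches <= max_mismatch:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Window overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    print(find_12a1_tracks("GG" + "A" * 12 + "CC"))
```

    Because every qualifying window is merged, a reported span may include one mismatched flanking base on each side of a pure run of A's.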

  9. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.
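
    The non-redundancy figure quoted above appears to be the ratio of unigenes to sequenced clones. A minimal sketch of that arithmetic, assuming this definition (the function name is hypothetical):

```python
def non_redundancy(unigenes, clones_sequenced):
    """Percentage of sequenced clones that correspond to a distinct unigene."""
    return 100.0 * unigenes / clones_sequenced


if __name__ == "__main__":
    # 3320 unigenes from 3875 sequenced clones, as reported in the abstract.
    print(round(non_redundancy(3320, 3875), 1))  # 85.7
```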

  10. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

    Science.gov (United States)

    Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

    2010-03-30

    The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional

  11. Joint source-channel coding using variable length codes

    NARCIS (Netherlands)

    Balakirsky, V.B.

    2001-01-01

    We address the problem of joint source-channel coding when variable-length codes are used for information transmission over a discrete memoryless channel. Data transmitted over the channel are interpreted as pairs (m_k, t_k), where m_k is a message generated by the source and t_k is a time instant

  12. Cloning and sequencing of full-length cDNAs of RNA1 and RNA2 of a Tomato black ring virus isolate from Poland.

    Science.gov (United States)

    Jończyk, M; Le Gall, O; Pałucha, A; Borodynko, N; Pospieszny, H

    2004-04-01

    Full-length cDNA clones corresponding to the RNA1 and RNA2 of the Polish isolate MJ of Tomato black ring virus (TBRV, genus Nepovirus) were obtained using a direct recombination strategy in yeast, and their complete nucleotide sequences were established. RNA1 is 7358 nucleotides and RNA2 is 4633 nucleotides in length, excluding the poly(A) tails. Both RNAs contain a single open reading frame encoding polyproteins of 254 kDa and 149 kDa for RNA1 and RNA2 respectively. Putative cleavage sites were identified, and the relationships between TBRV and related nepoviruses were studied by sequence comparison.

  13. Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences without primer bias

    DEFF Research Database (Denmark)

    Karst, Søren Michael; Dueholm, Morten Simonsen; McIlroy, Simon Jon

    2018-01-01

    Small subunit ribosomal RNA (SSU rRNA) genes, 16S in bacteria and 18S in eukaryotes, have been the standard phylogenetic markers used to characterize microbial diversity and evolution for decades. However, the reference databases of full-length SSU rRNA gene sequences are skewed to well-studied e...

  14. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    Directory of Open Access Journals (Sweden)

    Holt Robert A

    2010-04-01

    Full Text Available Abstract Background Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.

  15. Full-length genome sequences of porcine epidemic diarrhoea virus strain CV777; Use of NGS to analyse genomic and sub-genomic RNAs

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice

    2018-01-01

    Porcine epidemic diarrhoea virus, strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine...... the variability between different stocks of the PEDV strain CV777, sequencing of the full-length genome (ca. 28kb) has been performed in 6 different laboratories, using different protocols. Not surprisingly, each of the different full genome sequences were distinct from each other and from the reference sequence...... the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies....

  16. XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

    Directory of Open Access Journals (Sweden)

    Giegerich Robert

    2005-09-01

    Full Text Available Abstract Background Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To provide full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. Description Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins and these have been matched to publicly available IMAGE clones when available. Each sequence has been compared to the KOG database and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. Conclusion The results of the analysis have been stored in a publicly available database XenDB http://bibiserv.techfak.uni-bielefeld.de/xendb/. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at http://bibiserv.techfak.uni-bielefeld.de/xendb/.

  17. Identification of the full-length β-actin sequence and expression profiles in the tree shrew (Tupaia belangeri).

    Science.gov (United States)

    Zheng, Yu; Yun, Chenxia; Wang, Qihui; Smith, Wanli W; Leng, Jing

    2015-02-01

    The tree shrew (Tupaia belangeri) diverges from the primate order (Primates) and is classified as a separate taxonomic group of mammals - Scandentia. It has been suggested that the tree shrew can be used as an animal model for studying human diseases; however, the genomic sequence of the tree shrew is largely unidentified. In the present study, we reported the full-length cDNA sequence of the housekeeping gene, β-actin, in the tree shrew. The amino acid sequence of β-actin in the tree shrew was compared to that of humans and other species; a simple phylogenetic relationship was discovered. Quantitative polymerase chain reaction (qPCR) and western blot analysis further demonstrated that the expression profiles of β-actin, as a general conservative housekeeping gene, in the tree shrew were similar to those in humans, although the expression levels varied among different types of tissue in the tree shrew. Our data provide evidence that the tree shrew has a close phylogenetic association with humans. These findings further enhance the potential that the tree shrew, as a species, may be used as an animal model for studying human disorders.

  18. Design LDPC Codes without Cycles of Length 4 and 6

    Directory of Open Access Journals (Sweden)

    Kiseon Kim

    2008-04-01

    Full Text Available We present an approach for constructing LDPC codes without cycles of length 4 and 6. Firstly, we design 3 submatrices with different shifting functions given by the proposed schemes, then combine them into the matrix specified by the proposed approach, and, finally, expand the matrix into a desired parity-check matrix using identity matrices and cyclic shift matrices of the identity matrices. The simulation result in AWGN channel verifies that the BER of the proposed code is close to those of MacKay's random codes and Tanner's QC codes, and the good BER performance of the proposed code can remain at high code rates.
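
    The expansion step and the length-4 cycle condition mentioned above can be sketched as follows. This is a generic illustration, not the shifting functions proposed in the paper: the shift values are made-up examples, all names are hypothetical, and only cycles of length 4 are checked (detecting length-6 cycles needs a further test). It relies on the standard fact that a parity-check matrix has a 4-cycle exactly when two rows share ones in at least two columns.

```python
from itertools import combinations


def circulant(size, shift):
    """Identity matrix of the given size with its ones cyclically shifted by `shift` columns."""
    return [[1 if (r + shift) % size == c else 0 for c in range(size)]
            for r in range(size)]


def expand(shifts, size):
    """Expand a matrix of shift values into a binary parity-check matrix
    whose blocks are cyclic shifts of the identity matrix."""
    H = []
    for block_row in shifts:
        blocks = [circulant(size, s) for s in block_row]
        for r in range(size):
            # Concatenate row r of every block in this block row.
            H.append([bit for block in blocks for bit in block[r]])
    return H


def has_four_cycle(H):
    """True if any two rows of H have ones in two or more common columns."""
    supports = [{c for c, bit in enumerate(row) if bit} for row in H]
    return any(len(a & b) >= 2 for a, b in combinations(supports, 2))


if __name__ == "__main__":
    print(has_four_cycle(expand([[0, 0], [0, 0]], 3)))  # identical shifts
    print(has_four_cycle(expand([[0, 0], [0, 1]], 3)))  # one shift changed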

  19. Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

    Directory of Open Access Journals (Sweden)

    Lee Hae-Young

    2006-02-01

    Full Text Available Abstract Background Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. Results We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singleton (65.54%) ESTs. From a total of 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (Sus scrofa. Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). Conclusion The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences. The isolation of genes expressed in backfat tissue is the

  20. Construction of a full-length infectious bacterial artificial chromosome clone of duck enteritis virus vaccine strain

    Science.gov (United States)

    2013-01-01

    Background Duck enteritis virus (DEV) is the causative agent of duck viral enteritis, which causes an acute, contagious and lethal disease of many species of waterfowl within the order Anseriformes. In recent years, two laboratories have reported on the successful construction of DEV infectious clones in viral vectors to express exogenous genes. The clones obtained were either created with deletion of viral genes and based on highly virulent strains or were constructed using a traditional overlapping fosmid DNA system. Here, we report the construction of a full-length infectious clone of DEV vaccine strain that was cloned into a bacterial artificial chromosome (BAC). Methods A mini-F vector as a BAC that allows the maintenance of large circular DNA in E. coli was introduced into the intergenic region between UL15B and UL18 of a DEV vaccine strain by homologous recombination in chicken embryoblasts (CEFs). Then, the full-length DEV clone pDEV-vac was obtained by electroporating circular viral replication intermediates containing the mini-F sequence into E. coli DH10B and identified by enzyme digestion and sequencing. The infectivity of the pDEV-vac was validated by DEV reconstitution from CEFs transfected with pDEV-vac. The reconstructed virus without mini-F vector sequence was also rescued by co-transfecting the Cre recombinase expression plasmid pCAGGS-NLS/Cre and pDEV-vac into CEF cultures. Finally, the in vitro growth properties and immunoprotection capacity in ducks of the reconstructed viruses were also determined and compared with the parental virus. Results The full genome of the DEV vaccine strain was successfully cloned into the BAC, and this BAC clone was infectious. The in vitro growth properties of these reconstructions were very similar to parental DEV, and ducks immunized with these viruses acquired protection against virulent DEV challenge. 
    Conclusions DEV vaccine virus was cloned as an infectious bacterial artificial chromosome maintaining full-length

  1. On the normalization of the minimum free energy of RNAs by sequence length.

    Science.gov (United States)

    Trotta, Edoardo

    2014-01-01

    The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.

  2. Performance Analysis for Cooperative Communication System with QC-LDPC Codes Constructed with Integer Sequences

    Directory of Open Access Journals (Sweden)

    Yan Zhang

    2015-01-01

    Full Text Available This paper presents four different integer sequences for constructing quasi-cyclic low-density parity-check (QC-LDPC) codes with mathematical theory. The paper introduces the coding principle and the coding procedure. The QC-LDPC codes constructed from the four integer sequences are compared with LDPC codes constructed by the PEG algorithm, with array codes, and with MacKay codes, respectively. Then, the integer sequence QC-LDPC codes are used in coded cooperative communication. Simulation results show that the integer sequence constructed QC-LDPC codes are effective, and their overall performance is better than that of other types of LDPC codes in the coded cooperative communication. The QC-LDPC code constructed with the Dayan integer sequence performs best.

  3. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  4. Construction of Short-Length High-Rates LDPC Codes Using Difference Families

    Directory of Open Access Journals (Sweden)

    Deny Hamdani

    2010-10-01

    Full Text Available Low-density parity-check (LDPC) code is a linear-block error-correcting code defined by a sparse parity-check matrix. It is decoded using the message-passing algorithm and, in many cases, is capable of outperforming turbo code. This paper presents a class of LDPC codes showing good performance with low encoding complexity. The code is constructed using difference families from combinatorial design. The resulting code, which is designed to have short code length and high code rate, can be encoded with low complexity due to its quasi-cyclic structure, and performs well when it is iteratively decoded with the sum-product algorithm. These properties of LDPC code are quite suitable for applications in future wireless local area networks.

  5. Full-Length Sequence of Mouse Acupuncture-Induced 1-L (Aig1l) Gene Including Its Transcriptional Start Site

    Directory of Open Access Journals (Sweden)

    Mika Ohta

    2011-01-01

    Full Text Available We have been investigating the molecular efficacy of electroacupuncture (EA), which is one type of acupuncture therapy. In our previous molecular biological study of acupuncture, we found an EA-induced gene, named acupuncture-induced 1-L (Aig1l), in mouse skeletal muscle. The aims of this study consisted of identification of the full-length cDNA sequence of Aig1l including the transcriptional start site, determination of the tissue distribution of Aig1l and analysis of the effect of EA on Aig1l gene expression. We determined the complete cDNA sequence including the transcriptional start site via cDNA cloning with the cap site hunting method. We then analyzed the tissue distribution of Aig1l by means of northern blot analysis and real-time quantitative polymerase chain reaction. We used the semiquantitative reverse transcriptase-polymerase chain reaction to examine the effect of EA on Aig1l gene expression. Our results showed that the complete cDNA sequence of Aig1l was 6073 bp long, and the putative protein consisted of 962 amino acids. All seven tissues that we analyzed expressed the Aig1l gene. In skeletal muscle, EA induced expression of the Aig1l gene, with high expression observed after 3 hours of EA. Our findings thus suggest that the Aig1l gene may play a key role in the molecular mechanisms of EA efficacy.

  6. Efficient generation of recombinant RNA viruses using targeted recombination-mediated mutagenesis of bacterial artificial chromosomes containing full-length cDNA

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Risager, Peter Christian; Fahnøe, Ulrik

    2013-01-01

    Background Infectious cDNA clones are a prerequisite for directed genetic manipulation of RNA viruses. Here, a strategy to facilitate manipulation and rescue of classical swine fever viruses (CSFVs) from full-length cDNAs present within bacterial artificial chromosomes (BACs) is described....... This strategy allows manipulation of viral cDNA by targeted recombination-mediated mutagenesis within bacteria. Results A new CSFV-BAC (pBeloR26) derived from the Riems vaccine strain has been constructed and subsequently modified in the E2 coding sequence, using the targeted recombination strategy to enable...

  7. Full-length genome sequences of five hepatitis C virus isolates representing subtypes 3g, 3h, 3i and 3k, and a unique genotype 3 variant.

    Science.gov (United States)

    Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G

    2013-03-01

    We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned. Our findings should add insights to HCV evolutionary studies and clinical applications.

  8. Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes

    Directory of Open Access Journals (Sweden)

    Maggi Giorgio P

    2008-06-01

    Full Text Available Abstract Background The accurate detection of genes and the identification of functional regions is still an open issue in the annotation of genomic sequences. This problem affects new genomes but also those of very well studied organisms such as human and mouse where, despite the great efforts, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to address this problem. Unfortunately it is limited by the computational requirements needed to perform genome-wide comparisons and by the problem of discriminating between conserved coding and non-coding sequences. This discrimination is often based (thus dependent) on the availability of annotated proteins. Results In this paper we present the results of a comprehensive comparison of human and mouse genomes performed with a new high throughput grid-based system which allows the rapid detection of conserved sequences and accurate assessment of their coding potential. By detecting clusters of coding conserved sequences the system is also suitable to accurately identify potential gene loci. Following this analysis we created a collection of human-mouse conserved sequence tags and carefully compared our results to reliable annotations in order to benchmark the reliability of our classifications. Strikingly we were able to detect several potential gene loci supported by EST sequences but not corresponding to as yet annotated genes. Conclusion Here we present a new system which allows comprehensive comparison of genomes to detect conserved coding and non-coding sequences and the identification of potential gene loci. Our system does not require the availability of any annotated sequence thus is suitable for the analysis of new or poorly annotated genomes.

  9. Context quantization by minimum adaptive code length

    DEFF Research Database (Denmark)

    Forchhammer, Søren; Wu, Xiaolin

    2007-01-01

    Context quantization is a technique to deal with the issue of context dilution in high-order conditional entropy coding. We investigate the problem of context quantizer design under the criterion of minimum adaptive code length. A property of such context quantizers is derived for binary symbols....

  10. Employment of Near Full-Length Ribosome Gene TA-Cloning and Primer-Blast to Detect Multiple Species in a Natural Complex Microbial Community Using Species-Specific Primers Designed with Their Genome Sequences.

    Science.gov (United States)

    Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou

    2016-11-01

    It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.

  11. Highly conserved non-coding sequences are associated with vertebrate development.

    Directory of Open Access Journals (Sweden)

    Adam Woolfe

    2005-01-01

    Full Text Available In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development

  12. Study of canine parvovirus evolution: comparative analysis of full-length VP2 gene sequences from Argentina and international field strains.

    Science.gov (United States)

    Gallo Calderón, Marina; Wilda, Maximiliano; Boado, Lorena; Keller, Leticia; Malirat, Viviana; Iglesias, Marcela; Mattion, Nora; La Torre, Jose

    2012-02-01

    The continuous emergence of new strains of canine parvovirus (CPV), poorly protected by current vaccination, is a concern among breeders, veterinarians, and dog owners around the world. Therefore, the understanding of the genetic variation in emerging CPV strains is crucial for the design of disease control strategies, including vaccines. In this paper, we obtained the sequences of the full-length gene encoding for the main capsid protein (VP2) of 11 canine parvovirus type 2 (CPV-2) Argentine representative field strains, selected from a total of 75 positive samples studied in our laboratory in the last 9 years. A comparative sequence analysis was performed on 9 CPV-2c, one CPV-2a, and one CPV-2b Argentine strains with respect to international strains reported in the GenBank database. In agreement with previous reports, a high degree of identity was found among CPV-2c Argentine strains (99.6-100% and 99.7-100% at nucleotide and amino acid levels, respectively). However, the appearance of a new substitution in the 440 position (T440A) in four CPV-2c Argentine strains obtained after the year 2009 gives support to the variability observed for this position located within the VP2, three-fold spike. This is the first report on the genetic characterization of the full-length VP2 gene of emerging CPV strains in South America and shows that all the Argentine CPV-2c isolates cluster together with European and North American CPV-2c strains.

  13. Isolation of full-length putative rat lysophospholipase cDNA using improved methods for mRNA isolation and cDNA cloning

    International Nuclear Information System (INIS)

    Han, J.H.; Stratowa, C.; Rutter, W.J.

    1987-01-01

    The authors have cloned a full-length putative rat pancreatic lysophospholipase cDNA by an improved mRNA isolation method and cDNA cloning strategy using [32P]-labelled nucleotides. These new methods allow the construction of a cDNA library from the adult rat pancreas in which the majority of recombinant clones contained complete sequences for the corresponding mRNAs. A previously recognized but unidentified long and relatively rare cDNA clone containing the entire sequence from the cap site at the 5' end to the poly(A) tail at the 3' end of the mRNA was isolated by single-step screening of the library. The size, amino acid composition, and the activity of the protein expressed in heterologous cells strongly suggest this mRNA codes for lysophospholipase.

  14. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides

  15. An efficient chaotic source coding scheme with variable-length blocks

    International Nuclear Information System (INIS)

    Lin Qiu-Zhen; Wong Kwok-Wo; Chen Jian-Yong

    2011-01-01

    An efficient chaotic source coding scheme operating on variable-length blocks is proposed. With the source message represented by a trajectory in the state space of a chaotic system, data compression is achieved when the dynamical system is adapted to the probability distribution of the source symbols. For infinite-precision computation, the theoretical compression performance of this chaotic coding approach attains that of optimal entropy coding. In finite-precision implementation, it can be realized by encoding variable-length blocks using a piecewise linear chaotic map within the precision of register length. In the decoding process, the bit shift in the register can track the synchronization of the initial value and the corresponding block. Therefore, all the variable-length blocks are decoded correctly. Simulation results show that the proposed scheme performs well with high efficiency and minor compression loss when compared with traditional entropy coding. (general)

  16. LZW-Kernel: fast kernel utilizing variable length code blocks from LZW compressors for protein sequence classification.

    Science.gov (United States)

    Filatov, Gleb; Bauwens, Bruno; Kertész-Farkas, Attila

    2018-05-07

    Bioinformatics studies often rely on similarity measures between sequence pairs, which often pose a bottleneck in large-scale sequence analysis. Here, we present a new convolutional kernel function for protein sequences called the LZW-Kernel. It is based on code words identified with the Lempel-Ziv-Welch (LZW) universal text compressor. The LZW-Kernel is an alignment-free method, it is always symmetric, is positive, always provides 1.0 for self-similarity and it can directly be used with Support Vector Machines (SVMs) in classification problems, contrary to normalized compression distance (NCD), which often violates the distance metric properties in practice and requires further techniques to be used with SVMs. The LZW-Kernel is a one-pass algorithm, which makes it particularly plausible for big data applications. Our experimental studies on remote protein homology detection and protein classification tasks reveal that the LZW-Kernel closely approaches the performance of the Local Alignment Kernel (LAK) and the SVM-pairwise method combined with Smith-Waterman (SW) scoring at a fraction of the time. Moreover, the LZW-Kernel outperforms the SVM-pairwise method when combined with BLAST scores, which indicates that the LZW code words might be a better basis for similarity measures than local alignment approximations found with BLAST. In addition, the LZW-Kernel outperforms n-gram based mismatch kernels, hidden Markov model based SAM and Fisher kernel, and protein family based PSI-BLAST, among others. Further advantages include the LZW-Kernel's reliance on a simple idea, its ease of implementation, and its high speed, three times faster than BLAST and several magnitudes faster than SW or LAK in our tests. LZW-Kernel is implemented as a standalone C code and is a free open-source program distributed under GPLv3 license and can be downloaded from https://github.com/kfattila/LZW-Kernel. akerteszfarkas@hse.ru. Supplementary data are available at Bioinformatics Online.
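
    To make the notion of LZW code words concrete, here is a minimal sketch of an LZW parse followed by a simple set-overlap similarity. The abstract does not state the kernel's formula, so the Jaccard overlap below is only a stand-in chosen because it is symmetric and gives 1.0 for self-similarity; it is not the published LZW-Kernel, and the function names are hypothetical.

```python
def lzw_code_words(seq):
    """Return the set of multi-symbol dictionary entries created while
    LZW-parsing `seq` in a single left-to-right pass."""
    dictionary = set(seq)      # the dictionary starts with single symbols
    words = set()
    current = ""
    for symbol in seq:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate        # keep extending the current phrase
        else:
            dictionary.add(candidate)  # new code word found
            words.add(candidate)
            current = symbol
    return words

def overlap_similarity(seq_a, seq_b):
    """Jaccard overlap of the two code-word sets (an assumed stand-in,
    not the kernel defined in the paper)."""
    a, b = lzw_code_words(seq_a), lzw_code_words(seq_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

    For example, `lzw_code_words("ABABAB")` yields the code words AB, BA and ABA. No alignment is computed at any point, which is what makes this family of methods fast.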

  17. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  18. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  19. DNA watermarks in non-coding regulatory sequences

    Directory of Open Access Journals (Sweden)

    Pyka Martin

    2009-07-01

    Full Text Available Abstract Background DNA watermarks can be applied to identify the unauthorized use of genetically modified organisms. It has been shown that coding regions can be used to encrypt information into living organisms by using the DNA-Crypt algorithm. Yet, if the sequence of interest presents a non-coding DNA sequence, either the function of a resulting functional RNA molecule or a regulatory sequence, such as a promoter, could be affected. For our studies we used the small cytoplasmic RNA 1 in yeast and the lac promoter region of Escherichia coli. Findings The lac promoter was deactivated by the integrated watermark. In addition, the RNA molecules displayed altered configurations after introducing a watermark, but surprisingly were functionally intact, which has been verified by analyzing the growth characteristics of both wild type and watermarked scR1 transformed yeast cells. In a third approach we introduced a second overlapping watermark into the lac promoter, which did not affect the promoter activity. Conclusion Even though the watermarked RNA and one of the watermarked promoters did not show any significant differences compared to the wild type RNA and wild type promoter region, respectively, it cannot be generalized that other RNA molecules or regulatory sequences behave accordingly. Therefore, we do not recommend integrating watermark sequences into regulatory regions.

  20. Gray Code for Cayley Permutations

    Directory of Open Access Journals (Sweden)

    J.-L. Baril

    2003-10-01

    Full Text Available A length-n Cayley permutation p of a totally ordered set S is a length-n sequence of elements from S, subject to the condition that if an element x appears in p then all elements y < x also appear in p. In this paper, we give a Gray code list for the set of length-n Cayley permutations. Two successive permutations in this list differ in at most two positions.
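
    The definition can be checked by brute force. The sketch below assumes S = {1, ..., n} and lists Cayley permutations in plain lexicographic order; it illustrates the definition only and does not reproduce the Gray code ordering from the paper. The function names are hypothetical.

```python
from itertools import product

def is_cayley(p):
    """True if the values used in p are exactly {1, ..., max(p)}, i.e. no
    element appears unless every smaller element also appears."""
    return set(p) == set(range(1, max(p) + 1))

def cayley_permutations(n):
    """All length-n Cayley permutations over S = {1, ..., n} (n >= 1),
    in lexicographic order."""
    return [p for p in product(range(1, n + 1), repeat=n) if is_cayley(p)]
```

    For n = 2 this gives (1, 1), (1, 2) and (2, 1); the sequence (2, 2) is excluded because 1 is missing. The counts for n = 1, 2, 3, 4 are 1, 3, 13 and 75.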

  1. E2FM: an encrypted and compressed full-text index for collections of genomic sequences.

    Science.gov (United States)

    Montecuollo, Ferdinando; Schmid, Giovannni; Tagliaferri, Roberto

    2017-09-15

    Next Generation Sequencing (NGS) platforms and, more generally, high-throughput technologies are giving rise to an exponential growth in the size of nucleotide sequence databases. Moreover, many emerging applications of nucleotide datasets, such as those related to personalized medicine, require compliance with regulations about the storage and processing of sensitive data. We have designed and carefully engineered E2FM-index, a new full-text index in minute space which was optimized for compressing and encrypting nucleotide sequence collections in FASTA format and for performing fast pattern-search queries. E2FM-index builds self-indexes that occupy as little as 1/20 of the storage required by the input FASTA file, thus saving about 95% of storage when indexing collections of highly similar sequences; moreover, it can exactly search the built indexes for patterns in times ranging from a few milliseconds to a few hundred milliseconds, depending on pattern length. Source code is available at https://github.com/montecuollo/E2FM . ferdinando.montecuollo@unicampania.it. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
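
    For readers unfamiliar with FM-indexes, the following is a minimal sketch of the underlying pattern-count query: a Burrows-Wheeler transform plus backward search. It is a plain, uncompressed and unencrypted textbook version written for illustration; it is not the E2FM code, and the function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations. '$' is appended as
    a terminator and is assumed not to occur in `text`."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def count_occurrences(bwt_text, pattern):
    """FM-index backward search: count occurrences of `pattern` in the
    original text, using only its BWT."""
    # c_table[c] = number of characters in the text smaller than c
    c_table, total = {}, 0
    for s in sorted(set(bwt_text)):
        c_table[s] = total
        total += bwt_text.count(s)

    def occ(c, i):
        # Occurrences of c in bwt_text[:i]; a real index precomputes these.
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)
    for c in reversed(pattern):
        if c not in c_table:
            return 0
        lo = c_table[c] + occ(c, lo)
        hi = c_table[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

    The search loop runs once per pattern character, which fits the abstract's remark that query time depends on pattern length.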

  2. Novel full-length major histocompatibility complex class I allele discovery and haplotype definition in pig-tailed macaques.

    Science.gov (United States)

    Semler, Matthew R; Wiseman, Roger W; Karl, Julie A; Graham, Michael E; Gieger, Samantha M; O'Connor, David H

    2017-11-13

    Pig-tailed macaques (Macaca nemestrina, Mane) are important models for human immunodeficiency virus (HIV) studies. Their infectability with minimally modified HIV makes them a uniquely valuable animal model to mimic human infection with HIV and progression to acquired immunodeficiency syndrome (AIDS). However, variation in the pig-tailed macaque major histocompatibility complex (MHC) and the impact of individual transcripts on the pathogenesis of HIV and other infectious diseases is understudied compared to that of rhesus and cynomolgus macaques. In this study, we used Pacific Biosciences single-molecule real-time circular consensus sequencing to describe full-length MHC class I (MHC-I) transcripts for 194 pig-tailed macaques from three breeding centers. We then used the full-length sequences to infer Mane-A and Mane-B haplotypes containing groups of MHC-I transcripts that co-segregate due to physical linkage. In total, we characterized full-length open reading frames (ORFs) for 313 Mane-A, Mane-B, and Mane-I sequences that defined 86 Mane-A and 106 Mane-B MHC-I haplotypes. Pacific Biosciences technology allows us to resolve these Mane-A and Mane-B haplotypes to the level of synonymous allelic variants. The newly defined haplotypes and transcript sequences containing full-length ORFs provide an important resource for infectious disease researchers as certain MHC haplotypes have been shown to provide exceptional control of simian immunodeficiency virus (SIV) replication and prevention of AIDS-like disease in nonhuman primates. The increased allelic resolution provided by Pacific Biosciences sequencing also benefits transplant research by allowing researchers to more specifically match haplotypes between donors and recipients to the level of nonsynonymous allelic variation, thus reducing the risk of graft-versus-host disease.

  3. Purifying selection acts on coding and non-coding sequences of paralogous genes in Arabidopsis thaliana.

    Science.gov (United States)

    Hoffmann, Robert D; Palmgren, Michael

    2016-06-13

    Whole-genome duplications in the ancestors of many diverse species provided the genetic material for evolutionary novelty. Several models explain the retention of paralogous genes. However, how these models are reflected in the evolution of coding and non-coding sequences of paralogous genes is unknown. Here, we analyzed the coding and non-coding sequences of paralogous genes in Arabidopsis thaliana and compared these sequences with those of orthologous genes in Arabidopsis lyrata. Paralogs with lower expression than their duplicate had more nonsynonymous substitutions, were more likely to fractionate, and exhibited less similar expression patterns with their orthologs in the other species. Also, lower-expressed genes had greater tissue specificity. Orthologous conserved non-coding sequences in the promoters, introns, and 3' untranslated regions were less abundant at lower-expressed genes compared to their higher-expressed paralogs. A gene ontology (GO) term enrichment analysis showed that paralogs with similar expression levels were enriched in GO terms related to ribosomes, whereas paralogs with different expression levels were enriched in terms associated with stress responses. Loss of conserved non-coding sequences in one gene of a paralogous gene pair correlates with reduced expression levels that are more tissue specific. Together with increased mutation rates in the coding sequences, this suggests that similar forces of purifying selection act on coding and non-coding sequences. We propose that coding and non-coding sequences evolve concurrently following gene duplication.

  4. Full-length genomic characterization and molecular evolution of canine parvovirus in China.

    Science.gov (United States)

    Zhou, Ling; Tang, Qinghai; Shi, Lijun; Kong, Miaomiao; Liang, Lin; Mao, Qianqian; Bu, Bin; Yao, Lunguang; Zhao, Kai; Cui, Shangjin; Leal, Élcio

    2016-06-01

    Canine parvovirus type 2 (CPV-2) can cause acute haemorrhagic enteritis in dogs and myocarditis in puppies. This disease has become one of the most serious infectious diseases of dogs. During 2014 in China, there were many cases of acute infectious diarrhoea in dogs. Some faecal samples were negative for the CPV-2 antigen based on a colloidal gold test strip but were positive based on PCR, and a viral strain was isolated from one such sample. The cytopathic effect on susceptible cells and the results of the immunoperoxidase monolayer assay, PCR, and sequencing indicated that the pathogen was CPV-2. The strain was named CPV-NY-14, and the full-length genome was sequenced and analysed. A maximum likelihood tree was constructed using the full-length genome and all available CPV-2 genomes. New strains have replaced the original strain in Taiwan and Italy, although the CPV-2a strain is still predominant there. However, CPV-2a still causes many cases of acute infectious diarrhoea in dogs in China.

  5. Virtually full-length subtype F and F/D recombinant HIV-1 from Africa and South America

    NARCIS (Netherlands)

    Laukkanen, T.; Carr, J. K.; Janssens, W.; Liitsola, K.; Gotte, D.; McCutchan, F. E.; Op de Coul, E.; Cornelissen, M.; Heyndrickx, L.; van der Groen, G.; Salminen, M. O.

    2000-01-01

    For reliable classification of HIV-1 strains appropriate reference sequences are needed. The HIV-1 genetic subtype F has a wide geographic spread, causing significant epidemics in South America, Africa, and some regions of Europe. Previously only two full-length sequences of each of the HIV-1

  6. Some Algebraic Aspects of Morse Code Sequences

    OpenAIRE

    Johann Cigler

    2003-01-01

    Morse code sequences are very useful to give combinatorial interpretations of various properties of Fibonacci numbers. In this note we study some algebraic and combinatorial aspects of Morse code sequences and obtain several q-analogues of Fibonacci numbers and Fibonacci polynomials and their generalizations.
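
    The combinatorial interpretation mentioned above can be illustrated with a minimal sketch (not taken from the paper). It assumes the usual convention that a dot has length 1 and a dash has length 2; under that assumption, the number of Morse code sequences of total length n equals the Fibonacci number F(n+1). The function names are hypothetical.

```python
def morse_sequences(n):
    """List all dot/dash strings of total length n.

    Assumption (not stated in the abstract): dot = length 1, dash = length 2.
    """
    if n < 0:
        return []
    if n == 0:
        return [""]
    # A sequence of length n ends either in a dot (prefix of length n - 1)
    # or in a dash (prefix of length n - 2).
    with_dot = [s + "." for s in morse_sequences(n - 1)]
    with_dash = [s + "-" for s in morse_sequences(n - 2)]
    return with_dot + with_dash


def fibonacci(n):
    """Return F(n) with F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for n in range(8):
        print(n, len(morse_sequences(n)), fibonacci(n + 1))
```

    The recursion mirrors the Fibonacci recurrence directly, which is why the two printed counts agree for every n.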

  7. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that, for the three chromosomes we studied, the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
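
    The n-gram entropy measurement described above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes overlapping n-grams, Shannon entropy in bits, and a redundancy defined relative to the maximum entropy of a four-letter alphabet. The function names are hypothetical.

```python
import math
from collections import Counter


def ngram_entropy(sequence, n):
    """Shannon entropy (bits) of the overlapping n-gram distribution."""
    grams = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(sequence, n, alphabet_size=4):
    """Redundancy 1 - H(n) / H_max, with H_max = n * log2(alphabet_size).

    Assumption: this normalisation is one common definition and may differ
    in detail from the one used in the paper.
    """
    max_entropy = n * math.log2(alphabet_size)
    return 1.0 - ngram_entropy(sequence, n) / max_entropy


if __name__ == "__main__":
    for seq in ("ACGTACGTACGT", "AAAAAAAAAAAT"):
        print(seq, ngram_entropy(seq, 2), ngram_redundancy(seq, 2))
```

    A sequence dominated by one repeated n-gram has low entropy and therefore high redundancy, which is the direction of the effect reported for noncoding regions.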

  8. TypeLoader: A fast and efficient automated workflow for the annotation and submission of novel full-length HLA alleles.

    Science.gov (United States)

    Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V

    2017-07-01

    Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which deliver sequence data often covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of specific formats required by the sequence repositories is prone to error and time-consuming. We have developed TypeLoader to address both these facets. With only the full-length sequence as a starting point, TypeLoader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. The algorithm of random length sequences synthesis for frame synchronization of digital television systems

    Directory of Open Access Journals (Sweden)

    Аndriy V. Sadchenko

    2015-12-01

    Digital television systems need to ensure that all digital signal processing operations are performed simultaneously and consistently. Frame synchronization is dictated by the need to match the phases of the transmitter and receiver so that the start of a frame can be identified. Long binary sequences with a good aperiodic autocorrelation function are often used as frame synchronization signals. Aim: This work is dedicated to the development of an algorithm for the synthesis of random length sequences. Materials and Methods: The paper provides a comparative analysis of the known sequences that can currently be used for synchronization and identifies their advantages and disadvantages. It proposes an algorithm for the synthesis of binary synchronization sequences of random length with good autocorrelation properties, based on a noise generator with a uniform probability distribution. A semiconductor "white noise" generator is proposed as the source of the initial material for synthesizing binary sequences with the desired properties. Results: A statistical analysis of the initial "white noise" realizations and of the synthesized frame synchronization sequences for digital television was conducted. The synthesized sequences were compared with known ones, and the results show the benefits of the obtained sequences. The performed simulations confirm the obtained results. Conclusions: A search algorithm for binary synchronization sequences with desired autocorrelation properties was obtained. With this algorithm, the sequences can be made longer, without length limitations. The resulting sync sequences can be used for frame synchronization in modern digital communication systems, increasing their efficiency and noise immunity.
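
    The quality criterion named in this abstract, the aperiodic autocorrelation of a binary sequence, can be sketched as below. The random search is only a stand-in for the paper's synthesis algorithm, whose details the abstract does not give: Python's seeded pseudo-random generator replaces the hardware "white noise" source, and all function names are hypothetical.

```python
import random


def aperiodic_autocorrelation(bits):
    """Aperiodic autocorrelation of a 0/1 sequence mapped to -1/+1.

    Returns r, where r[k] is the correlation at shift k (k = 0 .. N-1).
    """
    s = [1 if b else -1 for b in bits]
    n = len(s)
    return [sum(s[i] * s[i + k] for i in range(n - k)) for k in range(n)]


def peak_sidelobe(bits):
    """Largest absolute off-peak value (requires at least two bits)."""
    return max(abs(v) for v in aperiodic_autocorrelation(bits)[1:])


def random_search(length, trials, seed=0):
    """Keep the random candidate with the smallest peak sidelobe.

    Assumption: a plain best-of-N search, not the authors' algorithm.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        candidate = [rng.randint(0, 1) for _ in range(length)]
        if best is None or peak_sidelobe(candidate) < peak_sidelobe(best):
            best = candidate
    return best


# Length-13 Barker sequence, a classic synchronization word.
BARKER_13 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]

if __name__ == "__main__":
    print(aperiodic_autocorrelation(BARKER_13))
    print(peak_sidelobe(random_search(32, 200)))
```

    A low peak sidelobe relative to the main peak r[0] = N is what makes the start of a frame easy to detect with a correlator.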

  10. Characterization of a Full-Length Endogenous Beta-Retrovirus, EqERV-Beta1, in the Genome of the Horse (Equus caballus)

    Directory of Open Access Journals (Sweden)

    Antoinette C. van der Kuyl

    2011-06-01

    Information on endogenous retroviruses fixed in the horse (Equus caballus) genome is scarce. The recent availability of a draft sequence of the horse genome enables the detection of such integrated viruses by similarity search. Using translated nucleotide fragments from gamma-, beta-, and delta-retroviral genera for initial searches, a full-length beta-retrovirus genome was retrieved from a horse chromosome 5 contig. The provirus, tentatively named EqERV-beta1 (for the first equine endogenous beta-retrovirus), was 10434 nucleotides (nt) in length with the usual retroviral genome structure of 5’LTR-gag-pro-pol-env-3’LTR. The LTRs were 1361 nt long, and differed approximately 1% from each other, suggestive of a relatively recent integration. Coding sequences for gag, pro and pol were present in three different reading frames, as is common for beta-retroviruses, and the reading frames were completely open, except that the env gene was interrupted by a single stop codon. No reading frame was apparent downstream of the env gene, suggesting that EqERV-beta1 does not encode a superantigen like mouse mammary tumor virus (MMTV). A second proviral genome of EqERV-beta1, with no stop codon in env, is additionally integrated on chromosome 5 downstream of the first virus. Single EqERV-beta1 LTRs were abundantly present on all chromosomes except chromosome 24. Phylogenetically, EqERV-beta1 most closely resembles an unclassified retroviral sequence from cattle (Bos taurus) and the murine beta-retrovirus MMTV.

  11. Near Full-Length Identification of a Novel HIV-1 CRF01_AE/B/C Recombinant in Northern Myanmar.

    Science.gov (United States)

    Zhou, Yan-Heng; Chen, Xin; Liang, Yue-Bo; Pang, Wei; Qin, Wei-Hong; Zhang, Chiyu; Zheng, Yong-Tang

    2015-08-01

    The Myanmar-China border appears to be the "hot spot" region for the occurrence of HIV-1 recombination. The majority of the previous analyses of HIV-1 recombination were based on partial genomic sequences, which obviously cannot reflect the reality of the genetic diversity of HIV-1 in this area well. Here, we present a near full-length characterization of a novel HIV-1 CRF01_AE/B/C recombinant isolated from a long-distance truck driver in Northern Myanmar. It is the first description of a near full-length genomic sequence in Myanmar since 2003, and might be one of the most complicated HIV-1 chimeras ever detected in Myanmar, containing four CRF01_AE, six B segments, and five C segments separated by 14 breakpoints throughout its genome. The discovery and characterization of this new CRF01_AE/B/C recombinant indicate that intersubtype recombination is ongoing in Myanmar, continuously generating new forms of HIV-1. More work based on near full-length sequence analyses is urgently needed to better understand the genetic diversity of HIV-1 in these regions.

  12. 3G vector-primer plasmid for constructing full-length-enriched cDNA libraries.

    Science.gov (United States)

    Zheng, Dong; Zhou, Yanna; Zhang, Zidong; Li, Zaiyu; Liu, Xuedong

    2008-09-01

    We designed a 3G vector-primer plasmid for the generation of full-length-enriched complementary DNA (cDNA) libraries. By employing the terminal transferase activity of reverse transcriptase and the modified strand replacement method, this plasmid (assembled with a polydT end and a deoxyguanosine [dG] end) combines priming full-length cDNA strand synthesis and directional cDNA cloning. As a result, the number of steps involved in cDNA library preparation is decreased while simplifying downstream gene manipulation, sequencing, and subcloning. The 3G vector-primer plasmid method yields fully represented plasmid primed libraries that are equivalent to those made by the SMART (switching mechanism at 5' end of RNA transcript) approach.

  13. New extremal binary self-dual codes of lengths 64 and 66 from bicubic planar graphs

    OpenAIRE

    Kaya, Abidin

    2016-01-01

    In this work, connected cubic planar bipartite graphs and related binary self-dual codes are studied. Binary self-dual codes of length 16 are obtained by face-vertex incidence matrices of these graphs. By considering their lifts to the ring R_2, new extremal binary self-dual codes of length 64 are constructed as Gray images. More precisely, we construct 15 new codes of length 64. Moreover, 10 new codes of length 66 were obtained by applying a building-up construction to the binary codes. Code...

  14. Broadcasting a Common Message with Variable-Length Stop-Feedback codes

    DEFF Research Database (Denmark)

    Trillingsgaard, Kasper Fløe; Yang, Wei; Durisi, Giuseppe

    2015-01-01

    We investigate the maximum coding rate achievable over a two-user broadcast channel for the scenario where a common message is transmitted using variable-length stop-feedback codes. Specifically, upon decoding the common message, each decoder sends a stop signal to the encoder, which transmits continuously until it receives both stop signals. For the point-to-point case, Polyanskiy, Poor, and Verdú (2011) recently demonstrated that variable-length coding combined with stop feedback significantly increases the speed at which the maximum coding rate converges to capacity. This speed-up manifests itself in the absence of a square-root penalty in the asymptotic expansion of the maximum coding rate for large blocklengths, a result also known as zero dispersion. In this paper, we show that this speed-up does not necessarily occur for the broadcast channel with common message. Specifically...

  15. Construction and performance research on variable-length codes for multirate OCDMA multimedia networks

    Science.gov (United States)

    Li, Chuan-qi; Yang, Meng-jie; Luo, De-jun; Lu, Ye; Kong, Yi-pu; Zhang, Dong-chuang

    2014-09-01

    A new kind of variable-length code with good correlation properties, called the non-repetition interval (NRI) code, is proposed for multirate asynchronous optical code division multiple access (OCDMA) multimedia networks. The NRI codes can be constructed by structuring the interval-sets with no repetition, and the code length depends on the number of users and the code weight. According to the structural characteristics of NRI codes, the formula of bit error rate (BER) is derived. Compared with other variable-length codes, the NRI codes have lower BER. A multirate OCDMA multimedia simulation system is designed and built in which the longer codes are assigned to the users who need low speed, while the shorter codes are assigned to the users who need high speed. Analysis of the eye diagram shows that the user with slower speed has lower BER, which is consistent with the actual demand in multimedia data transport.

  16. Two-dimensional full-wave code for reflectometry simulations in TJ-II

    International Nuclear Information System (INIS)

    Blanco, E.; Heuraux, S.; Estrada, T.; Sanchez, J.; Cupido, L.

    2004-01-01

    A two-dimensional full-wave code in the extraordinary mode has been developed to simulate reflectometry in TJ-II. The code allows us to study the measurement capabilities of the future correlation reflectometer that is being installed in TJ-II. The code uses the finite-difference-time-domain technique to solve Maxwell's equations in the presence of density fluctuations. Boundary conditions are implemented by a perfectly matched layer to simulate free propagation. To assure the stability of the code, the current equations are solved by a fourth-order Runge-Kutta method. Density fluctuation parameters such as fluctuation level, wave numbers, and correlation lengths are extrapolated from those measured at the plasma edge using Langmuir probes. In addition, realistic plasma shape, density profile, magnetic configuration, and experimental setup of TJ-II are included to determine the plasma regimes in which accurate information may be obtained

  17. Construction and Analysis of a Novel 2-D Optical Orthogonal Codes Based on Modified One-coincidence Sequence

    Science.gov (United States)

    Ji, Jianhua; Wang, Yanfen; Wang, Ke; Xu, Ming; Zhang, Zhipeng; Yang, Shuwen

    2013-09-01

    A new two-dimensional OOC (optical orthogonal codes) named PC/MOCS is constructed, using PC (prime code) for time spreading and MOCS (modified one-coincidence sequence) for wavelength hopping. Compared with PC/PC, the number of wavelengths for PC/MOCS is not limited to a prime number. Compared with PC/OCS, the length of MOCS need not be expanded to the same length of PC. PC/MOCS can be constructed flexibly, and also can use available wavelengths effectively. Theoretical analysis shows that PC/MOCS can reduce the bit error rate (BER) of OCDMA system, and can support more users than PC/PC and PC/OCS.

  18. Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

    Science.gov (United States)

    Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

    2018-01-01

    The genetic code is degenerate, and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still a subject of current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source. Homepage: http://www.gcat.bio/
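
    One of the properties listed above, comma-freeness, has a short textbook definition that can be sketched directly. This is not GCAT's implementation (GCAT is Java-based); it is a minimal Python illustration assuming the standard definition: a set of trinucleotides is comma-free if, for any two codons x and y in the set, neither of the two shifted-frame trinucleotides inside the concatenation xy belongs to the set. The function name is hypothetical.

```python
from itertools import product


def is_comma_free(codons):
    """Return True if the set of trinucleotides is a comma-free code."""
    code = set(codons)
    for x, y in product(code, repeat=2):
        pair = x + y
        # Frames shifted by one and by two positions inside the pair x + y.
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True


if __name__ == "__main__":
    print(is_comma_free({"ACG", "TCG"}))  # no shifted frame reads a codon
    print(is_comma_free({"ACG", "CGA"}))  # ACG + ACG contains CGA
```

    With a comma-free set, the correct reading frame of a concatenation of codons can be recovered without any external marker, which is why the property is of interest for error detection.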

  19. The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

    Science.gov (United States)

    Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

    2016-01-01

    Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA, pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.

  20. Construction of Short-length High-rates LDPC Codes Using Difference Families

    OpenAIRE

    Deny Hamdani; Ery Safrianti

    2007-01-01

    A low-density parity-check (LDPC) code is a linear block error-correcting code defined by a sparse parity-check matrix. It is decoded using the message-passing algorithm and is, in many cases, capable of outperforming turbo codes. This paper presents a class of low-density parity-check (LDPC) codes showing good performance with low encoding complexity. The code is constructed using difference families from combinatorial design. The resulting code, which is designed to have short code length and high code r...

  1. Analysis of the Length of Braille Texts in English Braille American Edition, the Nemeth Code, and Computer Braille Code versus the Unified English Braille Code

    Science.gov (United States)

    Knowlton, Marie; Wetzel, Robin

    2006-01-01

    This study compared the length of text in English Braille American Edition, the Nemeth code, and the computer braille code with the Unified English Braille Code (UEBC)--also known as Unified English Braille (UEB). The findings indicate that differences in the length of text are dependent on the type of material that is transcribed and the grade…

  2. Full-length genomic sequence of hepatitis B virus genotype C2 isolated from a native Brazilian patient

    Directory of Open Access Journals (Sweden)

    Mónica Viviana Alvarado-Mora

    2011-06-01

    The hepatitis B virus (HBV) is among the leading causes of chronic hepatitis, cirrhosis and hepatocellular carcinoma. In Brazil, genotype A is the most frequent, followed by genotypes D and F. Genotypes B and C are found in Brazil exclusively among Asian patients and their descendants. The aim of this study was to sequence the entire HBV genome of a Caucasian patient infected with HBV/C2 and to infer the origin of the virus based on sequencing analysis. The sequence of this Brazilian isolate was grouped with four other sequences described in China. The sequence of this patient is the first complete genome of HBV/C2 reported in Brazil.

  3. Functional characterization of a full length pregnane X receptor, expression in vivo, and identification of PXR alleles, in Zebrafish (Danio rerio)

    Energy Technology Data Exchange (ETDEWEB)

    Bainy, Afonso C.D. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Departamento de Bioquímica, CCB, Universidade Federal de Santa Catarina, Florianópolis, SC 88040-900 (Brazil); Kubota, Akira; Goldstone, Jared V. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Lille-Langøy, Roger [Department of Biology, University of Bergen, N-5020 Bergen (Norway); Karchner, Sibel I. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Celander, Malin C. [Department of Biological and Environmental Sciences, University of Gothenburg, SE 405 30 Göteborg (Sweden); Hahn, Mark E. [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States); Goksøyr, Anders [Department of Biology, University of Bergen, N-5020 Bergen (Norway); Stegeman, John J., E-mail: jstegeman@whoi.edu [Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543 (United States)

    2013-10-15

    Highlights: •Full-length pxr has been cloned from zebrafish. •Alleles of pxr were identified in zebrafish. •Full length Pxr was activated less strongly than ligand binding domain in cell-based reporter assays. •High levels of pxr expression were found in eye and brain as well as in liver. •TCPOBOP and PB did not significantly alter expression of pxr in liver. -- Abstract: The pregnane X receptor (PXR) (nuclear receptor NR1I2) is a ligand activated transcription factor, mediating responses to diverse xenobiotic and endogenous chemicals. The properties of PXR in fish are not fully understood. Here we report on cloning and characterization of full-length PXR of zebrafish, Danio rerio, and pxr expression in vivo. Initial efforts gave a cDNA encoding a 430 amino acid protein identified as zebrafish pxr by phylogenetic and synteny analysis. The sequence of the cloned Pxr DNA binding domain (DBD) was highly conserved, with 74% identity to human PXR-DBD, while the ligand-binding domain (LBD) of the cloned sequence was only 44% identical to human PXR-LBD. Sequence variation among clones in the initial effort prompted sequencing of multiple clones from a single fish. There were two prominent variants, one sequence with S183, Y218 and H383 and the other with I183, C218 and N383, which we designate as alleles pxr*1 (nr1i2*1) and pxr*2 (nr1i2*2), respectively. In COS-7 cells co-transfected with a PXR-responsive reporter gene, the full-length Pxr*1 (the more common variant) was activated by known PXR agonists clotrimazole and pregnenolone 16α-carbonitrile but to a lesser extent than the full-length human PXR. Activation of full-length Pxr*1 was only 10% of that with the Pxr*1 LBD. Quantitative real time PCR analysis showed prominent expression of pxr in liver and eye, as well as brain and intestine of adult zebrafish. The pxr was expressed in heart and kidney at levels similar to that in intestine. The expression of pxr in liver was weakly induced by ligands for

  4. On Sequence Lengths of Some Special External Exclusive OR Type LFSR Structures – Study and Analysis

    Directory of Open Access Journals (Sweden)

    A Ahmad

    2014-12-01

    The study of the length of pseudo-random binary sequences generated by Linear-Feedback Shift Registers (LFSRs) plays an important role in the design approaches of built-in self-test, cryptosystems, and other applications. However, certain LFSR structures might not be appropriate in some situations. Because determining the length of a generated pseudo-random binary sequence is a complex task, it is essential to investigate the length and the properties of the sequence before using an LFSR structure. This paper investigates some conditions and LFSR structures that restrict the generation of pseudo-random binary sequences to a certain fixed length. The outcomes of this paper are presented in the form of theorems, simulations, and analyses. We believe that these outcomes are of great importance to the designers of built-in self-test equipment, cryptosystems, and other applications such as radar, CDMA, error correction, and Monte Carlo simulation.
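
    How the sequence length depends on the feedback structure can be explored by direct simulation. The sketch below is a generic external-XOR (Fibonacci) LFSR, not one of the specific structures analysed in the paper. The tap numbering (1-based stage indices, shifting toward the last stage) and the function name are assumptions made for illustration.

```python
def lfsr_period(taps, nbits, seed=None):
    """Number of steps until an external-XOR LFSR returns to its seed state.

    taps  : 1-based stage numbers whose bits are XORed to form the feedback
    nbits : register length
    seed  : initial state as a list of bits (default: 1 followed by zeros)

    Returns None if the seed state does not recur within 2**nbits steps,
    which can happen when the last stage is not tapped.
    """
    state = list(seed) if seed is not None else [1] + [0] * (nbits - 1)
    start = list(state)
    for steps in range(1, 2 ** nbits + 1):
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        # Shift by one stage and feed the XOR result into stage 1.
        state = [feedback] + state[:-1]
        if state == start:
            return steps
    return None


if __name__ == "__main__":
    print(lfsr_period((4, 3), 4))  # maximal length for 4 stages
    print(lfsr_period((4, 2), 4))  # shorter cycle for a non-primitive choice
```

    With taps (4, 3) the register cycles through all 15 non-zero states, whereas taps (4, 2) return to the default seed after only 6 steps, which illustrates why the feedback structure should be checked before use.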

  5. Identification, Characterization and Full-Length Sequence Analysis of a Novel Polerovirus Associated with Wheat Leaf Yellowing Disease

    Directory of Open Access Journals (Sweden)

    Peipei Zhang

    2017-09-01

    To identify the pathogens responsible for leaf yellowing symptoms on wheat samples collected from Jinan, China, we tested for the presence of three known barley/wheat yellow dwarf viruses (BYDV-GAV, -PAV, WYDV-GPV) (most likely pathogens) using RT-PCR. A sample that tested negative for the three viruses was selected for small RNA sequencing. Twenty-five million sequences were generated, among which 5% were of viral origin. A novel polerovirus was discovered and temporarily named wheat leaf yellowing-associated virus (WLYaV). The full genome of WLYaV corresponds to 5,772 nucleotides (nt), with six AUG-initiated open reading frames, one non-AUG-initiated open reading frame, and three untranslated regions, showing typical features of the family Luteoviridae. Sequence comparison and phylogenetic analyses suggested that WLYaV had the closest relationship with sugarcane yellow leaf virus (ScYLV), but the identities of full genomic nucleotides and deduced amino acid sequence of coat protein (CP) were 64.9 and 86.2%, respectively, below the species demarcation thresholds (90%) in the family Luteoviridae. Furthermore, agroinoculation of Nicotiana benthamiana leaves with a cDNA clone of WLYaV caused yellowing symptoms on the plant. Our study adds a new polerovirus that is associated with wheat leaf yellowing disease, which would help to identify and control pathogens of wheat.

  6. An Auto sequence Code to Integrate a Neutron Unfolding Code with the PC-MCA Accuspec

    International Nuclear Information System (INIS)

    Darsono

    2000-01-01

    In neutron spectrometry using the proton recoil method, a neutron unfolding code is needed to unfold the measured proton spectrum into the neutron spectrum. In the existing neutron spectrometry system, which was successfully installed last year, the unfolding was done separately. This manuscript reports that an auto sequence code integrating the neutron unfolding code UNFSPEC.EXE with the software facility of the PC-MCA Accuspec has been written and run successfully, so that the new neutron spectrometry system becomes compact. The auto sequence code was written according to the rules of the application program facility of the PC-MCA Accuspec and then compiled using AC-EXE. Tests of the auto sequence code showed that binning widths of 20, 30, and 40 give slightly different spectrum shapes. A binning width of around 30 gives a better spectrum, in the sense of a smaller error than the others. (author)

  7. Analysis of a cDNA clone expressing a human autoimmune antigen: full-length sequence of the U2 small nuclear RNA-associated B antigen

    International Nuclear Information System (INIS)

    Habets, W.J.; Sillekens, P.T.G.; Hoet, M.H.; Schalken, J.A.; Roebroek, A.J.M.; Leunissen, J.A.M.; Van de Ven, W.J.M.; Van Venrooij, W.J.

    1987-01-01

    A U2 small nuclear RNA-associated protein, designated B'', was recently identified as the target antigen for autoimmune sera from certain patients with systemic lupus erythematosus and other rheumatic diseases. Such antibodies enabled them to isolate cDNA clone λHB''-1 from a phage λgt11 expression library. This clone appeared to code for the B'' protein as established by in vitro translation of hybrid-selected mRNA. The identity of clone λHB''-1 was further confirmed by partial peptide mapping and analysis of the reactivity of the recombinant antigen with monospecific and monoclonal antibodies. Analysis of the nucleotide sequence of the 1015-base-pair cDNA insert of clone λHB''-1 revealed a large open reading frame of 800 nucleotides containing the coding sequence for a polypeptide of 25,457 daltons. In vitro transcription of the λHB''-1 cDNA insert and subsequent translation resulted in a protein product with the molecular size of the B'' protein. These data demonstrate that clone λHB''-1 contains the complete coding sequence of this antigen. The deduced polypeptide sequence contains three very hydrophilic regions that might constitute RNA binding sites and/or antigenic determinants. These findings might have implications both for the understanding of the pathogenesis of rheumatic diseases as well as for the elucidation of the biological function of autoimmune antigens

  8. The Complete Sequence of a Human Parainfluenzavirus 4 Genome

    Directory of Open Access Journals (Sweden)

    Carmen Yea

    2009-06-01

    Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized.

  9. Variable-length code construction for incoherent optical CDMA systems

    Science.gov (United States)

    Lin, Jen-Yung; Jhou, Jhih-Syue; Wen, Jyh-Horng

    2007-04-01

    The purpose of this study is to investigate the multirate transmission in fiber-optic code-division multiple-access (CDMA) networks. In this article, we propose a variable-length code construction for any existing optical orthogonal code to implement a multirate optical CDMA system (called the multirate code system). For comparison, a multirate system where the lower-rate user sends each symbol twice is implemented and is called the repeat code system. The use of repetition as an error-detection code in an ARQ scheme in the repeat code system is also investigated. Moreover, a parallel approach for optical CDMA systems, proposed by Marić et al., is also compared with the other systems proposed in this study. Theoretical analysis shows that the bit error probability of the proposed multirate code system is smaller than that of the other systems, especially when the number of lower-rate users is large. Moreover, if there is at least one lower-rate user in the system, the multirate code system accommodates more users than the other systems when the error probability of the system is set below 10^-9.

  10. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150 nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

  11. Full-length enriched multistage cDNA library construction covering ...

    African Journals Online (AJOL)

    DR TONUKARI NYEROVWO

    2012-04-10

    Apr 10, 2012 ... Full Length Research Paper. Full-length enriched ... complementary DNA; pfu, plaque-forming unit. ... Chinese-native tree species in Populus section Leuce ... the infected bacteria, 2 ml melted top agar was added, and the.

  12. Noise Attenuation Estimation for Maximum Length Sequences in Deconvolution Process of Auditory Evoked Potentials

    Directory of Open Access Journals (Sweden)

    Xian Peng

    2017-01-01

    The use of maximum length sequences (m-sequences) has been found beneficial for recovering both linear and nonlinear components at rapid stimulation. Since an m-sequence is fully characterized by a primitive polynomial of a given order, the selection of the polynomial order can be problematic in practice. Usually, the m-sequence is repetitively delivered in a looped fashion. Ensemble averaging is carried out as the first step, followed by cross-correlation analysis to deconvolve linear/nonlinear responses. According to the classical noise reduction property based on an additive noise model, theoretical equations have been derived in the present study for measuring noise attenuation ratios (NARs) after the averaging and correlation processes. A computer simulation experiment was conducted to test the derived equations, and a nonlinear deconvolution experiment was also conducted using order 7 and 9 m-sequences to address this issue with real data. Both theoretical and experimental results show that the NAR is essentially independent of the m-sequence order and is decided by the total length of valid data, as well as the stimulation rate. The present study offers a guideline for m-sequence selection, which can be used to estimate required recording time and signal-to-noise ratio in designing m-sequence experiments.
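
    As an illustration of what an m-sequence of a given order looks like, the following is a minimal sketch, not the authors' stimulation or deconvolution code. It assumes a simple Fibonacci shift register whose feedback delays (7, 1) and (9, 4) correspond to well-known primitive polynomials of orders 7 and 9; the function names are hypothetical. It checks the two textbook properties that make cross-correlation deconvolution work: near-balance of ones and zeros and a two-valued periodic autocorrelation.

```python
def m_sequence(order, delays):
    """Return one period (2**order - 1 bits) of a binary m-sequence.

    The register implements the recurrence a[k] = XOR of a[k - d] for d in
    delays. The delays are assumed to come from a primitive polynomial,
    e.g. (7, 1) for order 7 or (9, 4) for order 9.
    """
    state = [1] * order  # any non-zero seed gives the same sequence up to a shift
    out = []
    for _ in range(2 ** order - 1):
        out.append(state[-1])
        feedback = 0
        for d in delays:
            feedback ^= state[d - 1]
        state = [feedback] + state[:-1]
    return out


def periodic_autocorrelation(bits, lag):
    """Periodic autocorrelation of the bipolar (+1/-1) version of the sequence."""
    n = len(bits)
    bipolar = [1 - 2 * b for b in bits]
    return sum(bipolar[i] * bipolar[(i + lag) % n] for i in range(n))


seq7 = m_sequence(7, (7, 1))
seq9 = m_sequence(9, (9, 4))
print(len(seq7), sum(seq7))   # period and number of ones for order 7
print(len(seq9), sum(seq9))   # period and number of ones for order 9
print(periodic_autocorrelation(seq7, 0), periodic_autocorrelation(seq7, 5))
```

    The sharp autocorrelation peak is the same for both orders, which is consistent with the abstract's point that the choice of order mainly changes the sequence length rather than the principle of the recovery.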

  13. Joint Source-Channel Decoding of Variable-Length Codes with Soft Information: A Survey

    Directory of Open Access Journals (Sweden)

    Pierre Siohan

    2005-05-01

    Multimedia transmission over time-varying wireless channels presents a number of challenges beyond existing capabilities conceived so far for third-generation networks. Efficient quality-of-service (QoS) provisioning for multimedia on these channels may in particular require a loosening and a rethinking of the layer separation principle. In that context, joint source-channel decoding (JSCD) strategies have gained attention as viable alternatives to separate decoding of source and channel codes. A statistical framework based on hidden Markov models (HMM) capturing dependencies between the source and channel coding components sets the foundation for optimal design of techniques for joint decoding of source and channel codes. The problem has been largely addressed in the research community, by considering both fixed-length codes (FLC) and variable-length source codes (VLC) widely used in compression standards. Joint source-channel decoding of VLC raises specific difficulties because the segmentation of the received bitstream into source symbols is random. This paper surveys recent theoretical and practical advances in the area of JSCD with soft information of VLC-encoded sources. It first describes the main paths followed for designing efficient estimators for VLC-encoded sources, the key component of the JSCD iterative structure. It then presents the main issues involved in the application of the turbo principle to JSCD of VLC-encoded sources as well as the main approaches to source-controlled channel decoding. The survey concludes with performance illustrations using real image and video decoding systems.

  14. The Classification of Complementary Information Set Codes of Lengths 14 and 16

    OpenAIRE

    Freibert, Finley

    2012-01-01

    In the paper "A new class of codes for Boolean masking of cryptographic computations," Carlet, Gaborit, Kim, and Solé defined a new class of rate one-half binary codes called complementary information set (or CIS) codes. The authors then classified all CIS codes of length less than or equal to 12. CIS codes have relations to classical Coding Theory as they are a generalization of self-dual codes. As stated in the paper, CIS codes also have important practical applications as they m...

  15. Complete coding sequence of Zika virus from Martinique outbreak in 2015

    Directory of Open Access Journals (Sweden)

    G. Piorkowski

    2016-05-01

    Zika virus is an Aedes-borne Flavivirus causing fever, arthralgia, myalgia and rash; it is associated with Guillain–Barré syndrome and suspected to induce microcephaly in the fetus. We report here the complete coding sequence of the first characterized Caribbean Zika virus strain, isolated from a patient from Martinique in December, 2015.

  16. Computationally Efficient Chaotic Spreading Sequence Selection for Asynchronous DS-CDMA

    Directory of Open Access Journals (Sweden)

    Litviņenko Anna

    2017-12-01

    The choice of the spreading sequence for asynchronous direct-sequence code-division multiple-access (DS-CDMA) systems plays a crucial role in the mitigation of multiple-access interference. Considering the rich dynamics of chaotic sequences, their use for spreading allows overcoming the limitations of the classical spreading sequences. However, to ensure low cross-correlation between the sequences, careful selection must be performed. This paper presents a novel exhaustive search algorithm, which allows finding sets of chaotic spreading sequences of required length with a particularly low mutual cross-correlation. The efficiency of the search is verified by simulations, which show a significant advantage compared to non-selected chaotic sequences. Moreover, the impact of sequence length on the efficiency of the selection is studied.
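
    The idea of selecting a low-cross-correlation set by exhaustive search can be sketched as follows. This is a generic illustration and not the algorithm of the paper: it assumes a logistic-map generator, a simple sign quantizer, periodic (rather than asynchronous even/odd) cross-correlation, and a brute-force scan over all subsets. All function names, seeds and lengths are hypothetical.

```python
from itertools import combinations


def logistic_sequence(x0, length, burn_in=100):
    """Bipolar chip sequence from the logistic map x -> 4x(1 - x) (an assumed generator)."""
    x = x0
    for _ in range(burn_in):          # discard the transient
        x = 4.0 * x * (1.0 - x)
    chips = []
    for _ in range(length):
        x = 4.0 * x * (1.0 - x)
        chips.append(1 if x >= 0.5 else -1)
    return chips


def peak_cross_correlation(a, b):
    """Largest absolute periodic cross-correlation over all cyclic shifts."""
    n = len(a)
    return max(abs(sum(a[i] * b[(i + lag) % n] for i in range(n)))
               for lag in range(n))


def best_subset(candidates, set_size):
    """Exhaustively pick the subset whose worst pairwise peak is smallest."""
    def worst_pair(subset):
        return max(peak_cross_correlation(a, b)
                   for a, b in combinations(subset, 2))
    return min(combinations(candidates, set_size), key=worst_pair)


seeds = [0.11, 0.23, 0.37, 0.41, 0.59]            # hypothetical initial conditions
candidates = [logistic_sequence(s, 31) for s in seeds]
best = best_subset(candidates, 2)
print(peak_cross_correlation(*best))
```

    The brute-force scan grows combinatorially with the number of candidates, which is presumably why the computational efficiency of the selection is a concern in the first place.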

  17. Coding visual features extracted from video sequences.

    Science.gov (United States)

    Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2014-05-01

    Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
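
    A coding mode decision based on rate-distortion optimization is commonly expressed as minimizing a Lagrangian cost J = D + λR over the candidate modes. The sketch below illustrates that generic rule only; the class, the function and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ModeResult:
    name: str            # e.g. "intra" or "inter"
    rate_bits: float     # bits needed to code the feature in this mode
    distortion: float    # distortion of the reconstructed feature


def choose_mode(candidates, lam):
    """Return the candidate with the smallest Lagrangian cost D + lam * R."""
    return min(candidates, key=lambda m: m.distortion + lam * m.rate_bits)


# Hypothetical measurements for one local feature.
intra = ModeResult("intra", rate_bits=120.0, distortion=2.0)
inter = ModeResult("inter", rate_bits=40.0, distortion=3.5)

print(choose_mode([intra, inter], lam=0.01).name)  # small lambda favours low distortion
print(choose_mode([intra, inter], lam=0.05).name)  # large lambda favours low rate
```

    The multiplier λ sets the trade-off: as it grows, the cheaper interframe mode wins even though its distortion is higher.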

  18. Identification, Characterization and Full-Length Sequence Analysis of a Novel Polerovirus Associated with Wheat Leaf Yellowing Disease.

    Science.gov (United States)

    Zhang, Peipei; Liu, Yan; Liu, Wenwen; Cao, Mengji; Massart, Sebastien; Wang, Xifeng

    2017-01-01

    To identify the pathogens responsible for leaf yellowing symptoms on wheat samples collected from Jinan, China, we tested for the presence of three known barley/wheat yellow dwarf viruses (BYDV-GAV, -PAV, WYDV-GPV) (most likely pathogens) using RT-PCR. A sample that tested negative for the three viruses was selected for small RNA sequencing. Twenty-five million sequences were generated, among which 5% were of viral origin. A novel polerovirus was discovered and temporarily named wheat leaf yellowing-associated virus (WLYaV). The full genome of WLYaV corresponds to 5,772 nucleotides (nt), with six AUG-initiated open reading frames, one non-AUG-initiated open reading frame, and three untranslated regions, showing typical features of the family Luteoviridae. Sequence comparison and phylogenetic analyses suggested that WLYaV had the closest relationship with sugarcane yellow leaf virus (ScYLV), but the identities of full genomic nucleotides and deduced amino acid sequence of coat protein (CP) were 64.9 and 86.2%, respectively, below the species demarcation thresholds (90%) in the family Luteoviridae. Furthermore, agroinoculation of Nicotiana benthamiana leaves with a cDNA clone of WLYaV caused yellowing symptoms on the plant. Our study adds a new polerovirus that is associated with wheat leaf yellowing disease, which would help to identify and control pathogens of wheat.

  19. Revised Mimivirus major capsid protein sequence reveals intron-containing gene structure and extra domain

    Directory of Open Access Journals (Sweden)

    Suzan-Monti Marie

    2009-05-01

    Background: Acanthamoebae polyphaga Mimivirus (APM) is the largest known dsDNA virus. The viral particle has a nearly icosahedral structure with an internal capsid shell surrounded with a dense layer of fibrils. A Capsid protein sequence, D13L, was deduced from the APM L425 coding gene and was shown to be the most abundant protein found within the viral particle. However, this protein remained poorly characterised until now. A revised protein sequence deposited in a database suggested an additional N-terminal stretch of 142 amino acids missing from the original deduced sequence. This result led us to investigate the L425 gene structure and the biochemical properties of the complete APM major Capsid protein. Results: This study describes the full length 3430 bp Capsid coding gene and characterises the corresponding 593 amino acid long Capsid protein 1. The recombinant full length protein allowed the production of a specific monoclonal antibody able to detect the Capsid protein 1 within the viral particle. This protein appeared to be post-translationally modified by glycosylation and phosphorylation. We proposed a secondary structure prediction of APM Capsid protein 1 compared to the Capsid protein structure of Paramecium Bursaria Chlorella Virus 1, another member of the Nucleo-Cytoplasmic Large DNA virus family. Conclusion: The characterisation of the full length L425 Capsid coding gene of Acanthamoebae polyphaga Mimivirus provides new insights into the structure of the main Capsid protein. The production of a full length recombinant protein will be useful for further structural studies.

  20. SRComp: short read sequence compression using burstsort and Elias omega coding.

    Directory of Open Access Journals (Sweden)

    Jeremy John Selva

    Next-generation sequencing (NGS) technologies permit the rapid production of vast amounts of data at low cost. Economical data storage and transmission hence becomes an increasingly important challenge for NGS experiments. In this paper, we introduce a new non-reference based read sequence compression tool called SRComp. It works by first employing a fast string-sorting algorithm called burstsort to sort read sequences in lexicographical order and then Elias omega-based integer coding to encode the sorted read sequences. SRComp has been benchmarked on four large NGS datasets, where experimental results show that it can run 5-35 times faster than current state-of-the-art read sequence compression tools such as BEETL and SCALCE, while retaining comparable compression efficiency for large collections of short read sequences. SRComp is a read sequence compression tool that is particularly valuable in certain applications where compression time is of major concern.
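
    Elias omega coding, the integer code named above, can be sketched in a few lines. This is the textbook code for positive integers, written here with bit strings for readability; how SRComp maps sorted reads to integers is not described in the abstract and is not modelled, and the function names are hypothetical.

```python
def elias_omega_encode(n):
    """Encode a positive integer as an Elias omega bit string."""
    if n < 1:
        raise ValueError("Elias omega codes are defined for integers >= 1")
    code = "0"                     # terminating bit
    while n > 1:
        group = bin(n)[2:]         # binary representation, always starts with "1"
        code = group + code        # prepend the group
        n = len(group) - 1         # next, encode the group length minus one
    return code


def elias_omega_decode(bits):
    """Decode a concatenation of Elias omega codewords into a list of integers."""
    values = []
    pos = 0
    while pos < len(bits):
        n = 1
        while bits[pos] == "1":    # a leading "1" announces a group of n + 1 bits
            group = bits[pos:pos + n + 1]
            pos += n + 1
            n = int(group, 2)
        pos += 1                   # consume the terminating "0"
        values.append(n)
    return values


stream = "".join(elias_omega_encode(v) for v in [1, 2, 16, 100])
print(elias_omega_encode(16))      # 10100100000
print(elias_omega_decode(stream))  # [1, 2, 16, 100]
```

    Small integers get short codewords, which suits data where sorting has made neighbouring items similar and the values to be stored are mostly small.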

  1. Machine-Checked Sequencer for Critical Embedded Code Generator

    Science.gov (United States)

    Izerrouken, Nassima; Pantel, Marc; Thirioux, Xavier

    This paper presents the development of a correct-by-construction block sequencer for GeneAuto, a qualifiable (according to DO178B/ED12B recommendation) automatic code generator. GeneAuto transforms Simulink models to MISRA C code for safety critical systems. Our approach, which combines a classical development process with formal specification and verification using proof assistants, led to preliminary fruitful exchanges with certification authorities. We present parts of the classical user and tools requirements and derived formal specifications, implementation and verification for the correctness and termination of the block sequencer. This sequencer has been successfully applied to real-size industrial use cases from various transportation domain partners and led to the detection of requirement errors and a correct-by-construction implementation.

  2. Characterization of near full-length genomes of HIV type 1 strains in Denmark: Basis for a universal therapeutic vaccine

    DEFF Research Database (Denmark)

    Andresen, Betina S.; Vinner, Lasse; Tang, Sheila Tuyet

    2007-01-01

    We report here the near full-length sequence characterization of 17 Danish clinical HIV-1 strains isolated from HLA-A02 patients not in need of ART, with relatively low viral loads and normal CD4 cell counts. Sequencing was performed directly on DNA extracted from short-term cocultures of PBMCs...... of a universal immunotherapeutic vaccine construct based on these epitopes....

  3. An upper bound on the number of errors corrected by a convolutional code

    DEFF Research Database (Denmark)

    Justesen, Jørn

    2000-01-01

    The number of errors that a convolutional code can correct in a segment of the encoded sequence is upper bounded by the number of distinct syndrome sequences of the relevant length.

  4. Detecting Scareware by Mining Variable Length Instruction Sequences

    OpenAIRE

    Shahzad, Raja Khurram; Lavesson, Niklas

    2011-01-01

    Scareware is a recent type of malicious software that may pose financial and privacy-related threats to novice users. Traditional countermeasures, such as anti-virus software, require regular updates and often lack the capability of detecting novel (unseen) instances. This paper presents a scareware detection method that is based on the application of machine learning algorithms to learn patterns in extracted variable length opcode sequences derived from instruction sequences of binary files....

  5. Codon size reduction as the origin of the triplet genetic code.

    Directory of Open Access Journals (Sweden)

    Pavel V Baranov

    The genetic code appears to be optimized in its robustness to missense errors and frameshift errors. In addition, the genetic code is near-optimal in terms of its ability to carry information in addition to the sequences of encoded proteins. As evolution has no foresight, optimality of the modern genetic code suggests that it evolved from less optimal code variants. The length of codons in the genetic code is also optimal, as three is the minimal nucleotide combination that can encode the twenty standard amino acids. The apparent impossibility of transitions between codon sizes in a discontinuous manner during evolution has resulted in an unbending view that the genetic code was always triplet. Yet, recent experimental evidence on quadruplet decoding, as well as the discovery of organisms with ambiguous and dual decoding, suggest that the possibility of the evolution of triplet decoding from living systems with non-triplet decoding merits reconsideration and further exploration. To explore this possibility we designed a mathematical model of the evolution of primitive digital coding systems which can decode nucleotide sequences into protein sequences. These coding systems can evolve their nucleotide sequences via genetic events of Darwinian evolution, such as point-mutations. The replication rates of such coding systems depend on the accuracy of the generated protein sequences. Computer simulations based on our model show that decoding systems with codons of length greater than three spontaneously evolve into predominantly triplet decoding systems. Our findings suggest a plausible scenario for the evolution of the triplet genetic code in a continuous manner. This scenario suggests an explanation of how protein synthesis could be accomplished by means of long RNA-RNA interactions prior to the emergence of the complex decoding machinery, such as the ribosome, that is required for stabilization and discrimination of otherwise weak triplet codon
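
    The statement that three is the minimal codon length for twenty amino acids is a counting argument: with four nucleotides, codons of length k can distinguish at most 4^k meanings. A minimal sketch of that arithmetic follows (the function name is hypothetical, and the evolutionary model of the paper is not reproduced):

```python
def minimal_codon_length(alphabet_size, n_meanings):
    """Smallest k such that alphabet_size ** k >= n_meanings."""
    length = 1
    while alphabet_size ** length < n_meanings:
        length += 1
    return length


print(4 ** 2, 4 ** 3)                   # 16 doublets are too few, 64 triplets suffice
print(minimal_codon_length(4, 20))      # twenty standard amino acids
print(minimal_codon_length(4, 21))      # twenty amino acids plus a stop signal
```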

  6. Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

    Science.gov (United States)

    Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

    2010-02-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.

  7. Molecular comparisons of full length metapneumovirus (MPV) genomes, including newly determined French AMPV-C and -D isolates, further supports possible subclassification within the MPV Genus.

    Directory of Open Access Journals (Sweden)

    Paul A Brown

    Four avian metapneumovirus (AMPV) subgroups (A-D) have been reported previously based on genetic and antigenic differences. However, until now full length sequences of the only known isolates of European subgroup C and subgroup D viruses (duck and turkey origin, respectively) have been unavailable. These full length sequences were determined and compared with other full length AMPV and human metapneumovirus (HMPV) sequences reported previously, using phylogenetics, comparisons of nucleic and amino acid sequences and study of codon usage bias. Results confirmed that subgroup C viruses were more closely related to HMPV than they were to the other AMPV subgroups in the study. This was consistent with previous findings using partial genome sequences. Closer relationships between AMPV-A, B and D were also evident throughout the majority of results. Three metapneumovirus "clusters", HMPV, AMPV-C and AMPV-A, B and D, were further supported by codon bias and phylogenetics. The data presented here, together with those of previous studies describing antigenic relationships between AMPV-A, B and D and between AMPV-C and HMPV, may call for a subclassification of metapneumoviruses similar to that used for avian paramyxoviruses, grouping AMPV-A, B and D as type I metapneumoviruses and AMPV-C and HMPV as type II.

  8. Codes on the Klein quartic, ideals, and decoding

    DEFF Research Database (Denmark)

    Hansen, Johan P.

    1987-01-01

    descriptions as left ideals in the group-algebra GF(2^{3})[G]. This description allows for easy decoding. For instance, in the case of the single error correcting code of length 21 and dimension 16 with minimal distance 3, decoding is obtained by multiplication with an idempotent in the group algebra....... A sequence of codes with particular symmetries and with large rates compared to their minimal distances is constructed over the field GF(2^{3}). In the sequence there is, for instance, a code of length 21 and dimension 10 with minimal distance 9, and a code of length 21 and dimension 16 with minimal...... distance 3. The codes are constructed from algebraic geometry using the dictionary between coding theory and algebraic curves over finite fields established by Goppa. The curve used in the present work is the Klein quartic. This curve has the maximal number of rational points over GF(2^{3}) allowed by Serre...

  9. Algebraic solution of the synthesis problem for coded sequences

    International Nuclear Information System (INIS)

    Leukhin, Anatolii N

    2005-01-01

    The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups. (fourth seminar to the memory of d.n. klyshko)

  10. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Directory of Open Access Journals (Sweden)

    Alamar Santiago

    2009-09-01

    Background: Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results: We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion: The new

  11. Self-complementary circular codes in coding theory.

    Science.gov (United States)

    Fimmel, Elena; Michel, Christian J; Starman, Martin; Strüngmann, Lutz

    2018-04-01

    Self-complementary circular codes are involved in pairing genetic processes. A maximal [Formula: see text] self-complementary circular code X of trinucleotides was identified in genes of bacteria, archaea, eukaryotes, plasmids and viruses (Michel in Life 7(20):1-16 2017, J Theor Biol 380:156-177, 2015; Arquès and Michel in J Theor Biol 182:45-58 1996). In this paper, self-complementary circular codes are investigated using the graph theory approach recently formulated in Fimmel et al. (Philos Trans R Soc A 374:20150058, 2016). A directed graph [Formula: see text] associated with any code X mirrors the properties of the code. In the present paper, we demonstrate a necessary condition for the self-complementarity of an arbitrary code X in terms of the graph theory. The same condition has been proven to be sufficient for codes which are circular and of large size [Formula: see text] trinucleotides, in particular for maximal circular codes ([Formula: see text] trinucleotides). For codes of small-size [Formula: see text] trinucleotides, some very rare counterexamples have been constructed. Furthermore, the length and the structure of the longest paths in the graphs associated with the self-complementary circular codes are investigated. It has been proven that the longest paths in such graphs determine the reading frame for the self-complementary circular codes. By applying this result, the reading frame in any arbitrary sequence of trinucleotides is retrieved after at most 15 nucleotides, i.e., 5 consecutive trinucleotides, from the circular code X identified in genes. Thus, an X motif of a length of at least 15 nucleotides in an arbitrary sequence of trinucleotides (not necessarily all of them belonging to X) uniquely defines the reading (correct) frame, an important criterion for analyzing the X motifs in genes in the future.
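
    The two properties discussed above can be checked mechanically on small examples. The sketch below is an illustration under stated assumptions, not the authors' software: it assumes the graph construction of the cited graph-theory approach, in which every trinucleotide N1N2N3 contributes the arcs N1 -> N2N3 and N1N2 -> N3 and a code is circular exactly when its graph has no directed cycle. The function names and the toy codes are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(trinucleotide):
    return "".join(COMPLEMENT[base] for base in reversed(trinucleotide))


def is_self_complementary(code):
    """True if the reverse complement of every trinucleotide is also in the code."""
    return all(reverse_complement(t) in code for t in code)


def code_graph(code):
    """Arcs N1 -> N2N3 and N1N2 -> N3 for each trinucleotide (assumed construction)."""
    edges = {}
    for t in code:
        edges.setdefault(t[0], set()).add(t[1:])
        edges.setdefault(t[:2], set()).add(t[2])
    return edges


def is_acyclic(edges):
    """Depth-first search for a directed cycle; True if none exists."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(v):
        colour[v] = GREY
        for w in edges.get(v, ()):
            state = colour.get(w, WHITE)
            if state == GREY:                  # back arc: a directed cycle
                return False
            if state == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True

    return all(visit(v) for v in edges if colour.get(v, WHITE) == WHITE)


toy_code = {"AAC", "GTT"}          # a reverse-complementary pair
shifted_pair = {"ACG", "CGA"}      # CGA is a circular shift of ACG
print(is_self_complementary(toy_code), is_acyclic(code_graph(toy_code)))
print(is_self_complementary(shifted_pair), is_acyclic(code_graph(shifted_pair)))
```

    The second toy code contains a trinucleotide together with one of its circular shifts, so its graph has the cycle A -> CG -> A and the reading frame cannot be recovered unambiguously.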

  12. Hibiscus latent Fort Pierce virus in Brazil and synthesis of its biologically active full-length cDNA clone.

    Science.gov (United States)

    Gao, Ruimin; Niu, Shengniao; Dai, Weifang; Kitajima, Elliot; Wong, Sek-Man

    2016-10-01

    A Brazilian isolate of Hibiscus latent Fort Pierce virus (HLFPV-BR) was first found in a hibiscus plant in Limeira, SP, Brazil. RACE PCR was carried out to obtain the full-length sequence of HLFPV-BR, which is 6453 nucleotides long and has more than 99.15% complete genomic RNA nucleotide sequence identity with that of the HLFPV Japanese isolate. The genomic structure of HLFPV-BR is similar to other tobamoviruses. It includes a 5' untranslated region (UTR), followed by open reading frames encoding a 128-kDa protein and a 188-kDa readthrough protein, a 38-kDa movement protein, an 18-kDa coat protein, and a 3' UTR. Interestingly, the unique feature of a poly(A) tract is also found within its 3'-UTR. Furthermore, from the total RNA extracted from the local lesions of HLFPV-BR-infected Chenopodium quinoa leaves, a biologically active, full-length cDNA clone encompassing the genome of HLFPV-BR was amplified and placed adjacent to a T7 RNA polymerase promoter. The capped in vitro transcripts from the cloned cDNA were infectious when mechanically inoculated into C. quinoa and Nicotiana benthamiana plants. This is the first report of the presence of an isolate of HLFPV in Brazil and the successful synthesis of a biologically active HLFPV-BR full-length cDNA clone.

  13. Increased mRNA expression of a laminin-binding protein in human colon carcinoma: Complete sequence of a full-length cDNA encoding the protein

    International Nuclear Information System (INIS)

    Yow, Hsiukang; Wong, Jau Min; Chen, Hai Shiene; Lee, C.; Steele, G.D. Jr.; Chen, Lanbo

    1988-01-01

    Reliable markers to distinguish human colon carcinoma from normal colonic epithelium are needed particularly for poorly differentiated tumors where no useful marker is currently available. To search for markers the authors constructed cDNA libraries from human colon carcinoma cell lines and screened for clones that hybridize to a greater degree with mRNAs of colon carcinomas than with their normal counterparts. Here they report one such cDNA clone that hybridizes with a 1.2-kilobase (kb) mRNA, the level of which is ∼9-fold greater in colon carcinoma than in adjacent normal colonic epithelium. Blot hybridization of total RNA from a variety of human colon carcinoma cell lines shows that the level of this 1.2-kb mRNA in poorly differentiated colon carcinomas is as high as or higher than that in well-differentiated carcinomas. Molecular cloning and complete sequencing of cDNA corresponding to the full-length open reading frame of this 1.2-kb mRNA unexpectedly show it to contain all the partial cDNA sequence encoding 135 amino acid residues previously reported for a human laminin receptor. The deduced amino acid sequence suggests that this putative laminin-binding protein from human colon carcinomas consists of 295 amino acid residues with interesting features. There is an unusual C-terminal 70-amino acid segment, which is trypsin-resistant and highly negatively charged

  14. Assessing the genetic diversity of Cu resistance in mine tailings through high-throughput recovery of full-length copA genes

    Science.gov (United States)

    Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin

    2015-01-01

    Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs, inferred by evolutionary trace analysis of CopA sequences from known Cu resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, out of which 70 were found with high potential to be functioning in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that the development of copA diversity in the tailings might be mainly through vertical descent with few lateral gene transfer events. The method established here can be used to explore copA (and potentially other metal resistance genes) diversity in any metagenome and has the potential to exhaust the full-length gene sequences for downstream analyses. PMID:26286020

  15. Golay sequences coded coherent optical OFDM for long-haul transmission

    Science.gov (United States)

    Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian

    2017-09-01

    We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve the long-haul transmission performance. The Golay sequences are generated by binary Reed-Muller codes, which have a low peak-to-average power ratio and a certain error correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. Under the same spectral efficiency, QPSK-modulated OFDM with binary Golay sequence coding, with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM), is compared with normal BPSK-modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q^2 factor of 8.5 dB), it is shown that DFTS-QPSK-GOFDM outperforms DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18% in non-dispersion-managed and dispersion-managed links, respectively.
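
    The defining property of a Golay complementary pair can be checked in a few lines. The sketch below is illustrative only: it uses the textbook recursive concatenation construction rather than the Reed-Muller-based generation described in the abstract, and the function names are hypothetical. It verifies that the aperiodic autocorrelations of the two sequences cancel at every nonzero lag.

```python
# Illustrative sketch: recursive construction of a Golay complementary pair.
# Assumption: this is NOT the Reed-Muller generation used in the paper; it is
# the standard concatenation rule (a, b) -> (a|b, a|-b) over +1/-1 symbols.

def golay_pair(m):
    """Return a +1/-1 Golay complementary pair of length 2**m."""
    a, b = [1], [1]
    for _ in range(m):
        a, b = a + b, a + [-x for x in b]
    return a, b

def aperiodic_autocorr(seq, lag):
    """Aperiodic autocorrelation of seq at a non-negative lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))

def is_complementary(a, b):
    """True if the autocorrelations of a and b sum to zero at all nonzero lags."""
    return all(
        aperiodic_autocorr(a, k) + aperiodic_autocorr(b, k) == 0
        for k in range(1, len(a))
    )

a, b = golay_pair(3)
print(a, b, is_complementary(a, b))
```

    The cancelling sidelobes are what bound the peak power of an OFDM symbol built from such a sequence, which is the low peak-to-average power ratio the abstract refers to.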

  16. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Science.gov (United States)

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-01-01

    Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an

  17. Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale

    Directory of Open Access Journals (Sweden)

    Liu He

    2017-10-01

    Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.

  18. Full length channel Pressure Tube sagging under completely voided full length pressure tube of an Indian PHWR

    Energy Technology Data Exchange (ETDEWEB)

    Negi, Sujay, E-mail: negi.sujay@gmail.com [Indian Institute of Technology, Roorkee 247667 (India); Kumar, Ravi, E-mail: ravikfme@gmail.com [Indian Institute of Technology, Roorkee 247667 (India); Majumdar, P., E-mail: pmajum@barc.gov.in [Bhabha Atomic Research Centre, Mumbai 400085 (India); Mukopadhyay, D., E-mail: dmukho@barc.gov.in [Bhabha Atomic Research Centre, Mumbai 400085 (India)

    2017-03-15

    Highlights: • At 16 kW/m input, thermal stability was attained at 595 °C, without PT-CT contact. • At 20 kW/m step input, PT-CT contact occurred at 637 °C near bottom-center of the tube. • PT integrity was maintained throughout the experiment. - Abstract: An experimental investigation was conducted to simulate the sagging behavior of a full length Pressure Tube of a channel of 220 MWe Indian PHWR. The investigation aimed to recreate a condition resembling Loss of Coolant Accident (LOCA) with Emergency Core Cooling System (ECCS) failure in a nuclear power plant. A full length channel assembly immersed in moderator was subjected to electrical resistance heating of Pressure Tube (PT) to simulate the residual heat after shutting down of reactor. The temperature of PT started rising and the contact between PT and CT was established at the center of the tube where average bottom temperature was 637 °C. The integrity of PT was maintained throughout the experiment and the PT heat up was arrested on contact with the CT due to transfer of heat to the moderator.

  19. Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences.

    LENUS (Irish Health Repository)

    Ivanov, Ivaylo P

    2011-05-01

    In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5' cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized, both for increased coding capacity and potentially also for novel regulatory mechanisms, remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5' untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.
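
    As a rough illustration of where such N-terminal extensions could start, the sketch below walks upstream from an annotated AUG, in frame, and collects the six near-cognate codons named in the abstract until it meets an in-frame stop codon. It is only a candidate enumerator for a single sequence under stated assumptions; it does not implement the phylogenetic conservation analysis the authors describe, and the function name and toy sequence are hypothetical.

```python
# Illustrative sketch: list in-frame near-cognate codons upstream of an
# annotated AUG. Assumption: the mRNA is given as an RNA string (A, C, G, U)
# and aug_index is the 0-based position of the annotated start codon.

NEAR_COGNATE = {"CUG", "UUG", "GUG", "ACG", "AUA", "AUU"}  # from the abstract
STOP_CODONS = {"UAA", "UAG", "UGA"}

def upstream_near_cognate_starts(mrna, aug_index):
    """Return (position, codon) pairs, nearest to the AUG first."""
    hits = []
    pos = aug_index - 3
    while pos >= 0:
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an in-frame stop ends any possible extension
        if codon in NEAR_COGNATE:
            hits.append((pos, codon))
        pos -= 3
    return hits

# Toy 5' region: UAG CUG GCC | AUG GCU UAA
print(upstream_near_cognate_starts("UAGCUGGCCAUGGCUUAA", 9))
# -> [(3, 'CUG')]
```

    Candidates found this way would still need the evolutionary evidence described above before being treated as real initiation sites.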

  20. FREQUENCY ANALYSIS OF RLE-BLOCKS REPETITIONS IN THE SERIES OF BINARY CODES WITH OPTIMAL MINIMAX CRITERION OF AUTOCORRELATION FUNCTION

    Directory of Open Access Journals (Sweden)

    A. A. Kovylin

    2013-01-01

    The article describes the problem of searching for binary pseudo-random sequences with a quasi-ideal autocorrelation function, which are to be used in contemporary communication systems, including mobile and wireless data transfer interfaces. In the synthesis of sets of binary sequences, the goal is to rank them by the minimax criterion, by which a sequence is considered optimal for the intended application. In the course of the research, optimal sequences of order up to 52 were obtained, and their run-length encoding (RLE) was analyzed. The analysis showed regularities in the distribution of the number of runs of different lengths in the codes that are optimal under the chosen criterion, which would make it possible to optimize the search for such codes in the future.
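
    The two quantities discussed here can be computed directly. The sketch below assumes that the minimax criterion means minimizing the largest sidelobe magnitude of the aperiodic autocorrelation, which the abstract does not state explicitly, and it uses the well-known length-13 Barker code as a stand-in for the article's optimal sequences. Function names are hypothetical.

```python
# Illustrative sketch: run-length encoding and peak autocorrelation sidelobe
# of a binary code. Assumption: "minimax" is read as "smallest peak sidelobe".
from itertools import groupby

def run_lengths(bits):
    """Run-length encode a bit list as (symbol, run length) pairs."""
    return [(sym, len(list(group))) for sym, group in groupby(bits)]

def peak_sidelobe(bits):
    """Largest |aperiodic autocorrelation| over all nonzero lags."""
    s = [1 if b else -1 for b in bits]
    n = len(s)
    return max(
        abs(sum(s[i] * s[i + k] for i in range(n - k)))
        for k in range(1, n)
    )

BARKER_13 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]

print(run_lengths(BARKER_13))
print(peak_sidelobe(BARKER_13))
```

    Counting how often each run length occurs across many optimal codes is the kind of frequency analysis the title refers to.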

  1. Sequence Coding and Search System for licensee event reports: code listings. Volume 2

    International Nuclear Information System (INIS)

    Gallaher, R.B.; Guymon, R.H.; Mays, G.T.; Poore, W.P.; Cagle, R.J.; Harrington, K.H.; Johnson, M.P.

    1985-04-01

    Operating experience data from nuclear power plants are essential for safety and reliability analyses, especially analyses of trends and patterns. The licensee event reports (LERs) that are submitted to the Nuclear Regulatory Commission (NRC) by the nuclear power plant utilities contain much of this data. The NRC's Office for Analysis and Evaluation of Operational Data (AEOD) has developed, under contract with NSIC, a system for codifying the events reported in the LERs. The primary objective of the Sequence Coding and Search System (SCSS) is to reduce the descriptive text of the LERs to coded sequences that are both computer-readable and computer-searchable. This system provides a structured format for detailed coding of component, system, and unit effects as well as personnel errors. The database contains all current LERs submitted by nuclear power plant utilities for events occurring since 1981 and is updated on a continual basis. Volume 2 contains all valid and acceptable codes used for searching and encoding the LER data. This volume contains updated material through amendment 1 to revision 1 of the working version of ORNL/NSIC-223, Vol. 2

  2. Calculation of evolutionary correlation between individual genes and full-length genome: a method useful for choosing phylogenetic markers for molecular epidemiology.

    Directory of Open Access Journals (Sweden)

    Shuai Wang

    Individual genes or regions are still commonly used to estimate the phylogenetic relationships among viral isolates. The genomic regions that can faithfully provide assessments consistent with those predicted with full-length genome sequences would be preferable to serve as good candidates of the phylogenetic markers for molecular epidemiological studies of many viruses. Here we employed a statistical method to evaluate the evolutionary relationships between individual viral genes and full-length genomes without tree construction as a way to determine which gene can match the genome well in phylogenetic analyses. This method was performed by calculation of linear correlations between the genetic distance matrices of aligned individual gene sequences and aligned genome sequences. We applied this method to the phylogenetic analyses of porcine circovirus 2 (PCV2), measles virus (MV), hepatitis E virus (HEV) and Japanese encephalitis virus (JEV). Phylogenetic trees were constructed for comparisons and the possible factors affecting the method accuracy were also discussed in the calculations. The results revealed that this method could produce results consistent with those of previous studies about the proper consensus sequences that could be successfully used as phylogenetic markers. And our results also suggested that these evolutionary correlations could provide useful information for identifying genes that could be used effectively to infer the genetic relationships.
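
    The core calculation, a linear correlation between the pairwise distances of a gene region and those of the full genomes, can be sketched as follows. This is a minimal illustration under assumptions that are not in the abstract: distances are simple p-distances (fraction of differing aligned sites), the correlation is Pearson's r, and the gene is given as a column range of the genome alignment. All names and the toy alignment are hypothetical.

```python
# Illustrative sketch: correlate gene-level and genome-level pairwise distances.
# Assumptions: p-distance and Pearson correlation; the paper may use other
# distance models. Sequences must be aligned and of equal length.
from itertools import combinations
from math import sqrt

def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pairwise_distances(aligned):
    """Upper triangle of the distance matrix, in a fixed pair order."""
    return [p_distance(a, b) for a, b in combinations(aligned, 2)]

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def gene_genome_correlation(genomes, start, end):
    """Correlation between distances in columns [start, end) and the genome."""
    genes = [g[start:end] for g in genomes]
    return pearson(pairwise_distances(genes), pairwise_distances(genomes))

toy = ["AAAAAAAA", "AAAATAAA", "TTAATTAA"]  # toy alignment, not real data
print(gene_genome_correlation(toy, 0, 4))
```

    A region whose correlation is close to 1 reproduces the genome-wide distance pattern and is therefore a plausible marker candidate in the sense the abstract describes.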

  3. Production of a full-length infectious GFP-tagged cDNA clone of Beet mild yellowing virus for the study of plant-polerovirus interactions.

    Science.gov (United States)

    Stevens, Mark; Viganó, Felicita

    2007-04-01

    The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the other complete sequences of BMYV (BMYV-2ITB, France and BMYV-IPP, Germany respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts following electroporation or PEG inoculation when the protoplasts were subsequently analysed using serological and molecular methods. The BMYV sequence was modified by inserting DNA that encoded the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. A. thaliana protoplasts electroporated with these RNA transcripts were biologically active and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.

  4. First full length sequences of the S gene of European isolates reveal further diversity among turkey coronaviruses.

    OpenAIRE

    2011-01-01

    Abstract An increasing incidence of enteric disorders clinically evocative of the poult enteritis complex has been observed in turkeys in France since 2003. Using a newly designed real-time RT-PCR assay specific for the nucleocapsid (N) gene of infectious bronchitis virus (IBV) and turkey coronaviruses (TCoV), coronaviruses were identified in 37 % of the intestinal samples collected from diseased turkey flocks. The full length Spike (S) gene of these viruses was amplified, cloned a...

  5. Generation of pseudo-random sequences for spread spectrum systems

    Science.gov (United States)

    Moser, R.; Stover, J.

    1985-05-01

    The characteristics of pseudo-random radio signal sequences (PRS) are explored. The randomness of the PRS is a matter of artificially altering the sequence of binary digits broadcast. Autocorrelations of the two sequences shifted in time, if high, determine if the signals are the same and thus allow for position identification. Cross-correlation can also be calculated between sequences. Correlations closest to zero are obtained with a large volume of prime numbers in the sequences. Techniques for selecting optimal and maximal lengths for the sequences are reviewed. If the correlations are near zero in the sequences, then signal channels can accommodate multiple users. Finally, Gold codes are discussed as a technique for maximizing the code lengths.
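
    A maximal-length sequence and its near-zero off-peak autocorrelation can be demonstrated with a small linear feedback shift register. The sketch below is an assumption-laden illustration, not the report's method: it uses a Fibonacci-style register, and the tap sets shown correspond to the primitive polynomials x^3 + x + 1 and x^5 + x^2 + 1. Function names are hypothetical.

```python
# Illustrative sketch: generate a maximal-length (m-) sequence with an LFSR
# and measure its periodic autocorrelation. Assumption: taps are 1-based
# register positions whose bits are XORed to form the feedback bit.

def lfsr_sequence(taps, nbits, seed=1):
    """Return one full period (2**nbits - 1 bits) of the LFSR output."""
    state = seed
    out = []
    for _ in range((1 << nbits) - 1):
        out.append(state & 1)                 # output the lowest bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return out

def periodic_autocorr(bits, lag):
    """Periodic autocorrelation of the +1/-1 mapped sequence at a given lag."""
    s = [1 if b else -1 for b in bits]
    n = len(s)
    return sum(s[i] * s[(i + lag) % n] for i in range(n))

seq = lfsr_sequence([1, 2], 3)
print(seq)                                    # -> [1, 0, 0, 1, 0, 1, 1]
print([periodic_autocorr(seq, k) for k in range(7)])
```

    For an m-sequence of period N the autocorrelation is N at zero lag and -1 at every other lag, which is the near-zero correlation the abstract associates with multi-user operation. Gold codes are commonly formed by combining two such sequences; that step is not shown here.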

  6. Molecular Comparisons of Full Length Metapneumovirus (MPV) Genomes, Including Newly Determined French AMPV-C and –D Isolates, Further Supports Possible Subclassification within the MPV Genus

    Science.gov (United States)

    Brown, Paul A.; Lemaitre, Evelyne; Briand, François-Xavier; Courtillon, Céline; Guionie, Olivier; Allée, Chantal; Toquin, Didier; Bayon-Auboyer, Marie-Hélène; Jestin, Véronique; Eterradossi, Nicolas

    2014-01-01

    Four avian metapneumovirus (AMPV) subgroups (A–D) have been reported previously based on genetic and antigenic differences. However, until now full length sequences of the only known isolates of European subgroup C and subgroup D viruses (duck and turkey origin, respectively) have been unavailable. These full length sequences were determined and compared with other full length AMPV and human metapneumoviruses (HMPV) sequences reported previously, using phylogenetics, comparisons of nucleic and amino acid sequences and study of codon usage bias. Results confirmed that subgroup C viruses were more closely related to HMPV than they were to the other AMPV subgroups in the study. This was consistent with previous findings using partial genome sequences. Closer relationships between AMPV-A, B and D were also evident throughout the majority of results. Three metapneumovirus “clusters” HMPV, AMPV-C and AMPV-A, B and D were further supported by codon bias and phylogenetics. The data presented here together with those of previous studies describing antigenic relationships also between AMPV-A, B and D and between AMPV-C and HMPV may call for a subclassification of metapneumoviruses similar to that used for avian paramyxoviruses, grouping AMPV-A, B and D as type I metapneumoviruses and AMPV-C and HMPV as type II. PMID:25036224

  7. Use of Dried Blood Spots to Elucidate Full-Length Transmitted/Founder HIV-1 Genomes

    Directory of Open Access Journals (Sweden)

    Jesus F. Salazar-Gonzalez

    2016-07-01

    Background: Identification of HIV-1 genomes responsible for establishing clinical infection in newly infected individuals is fundamental to prevention and pathogenesis research. Processing, storage, and transportation of the clinical samples required to perform these virologic assays in resource-limited settings requires challenging venipuncture and cold chain logistics. Here, we validate the use of dried-blood spots (DBS) as a simple and convenient alternative to collecting and storing frozen plasma. Methods: We performed parallel nucleic acid extraction, single genome amplification (SGA), next generation sequencing (NGS), and phylogenetic analyses on plasma and DBS. Results: We demonstrated the capacity to extract viral RNA from DBS and perform SGA to infer the complete nucleotide sequence of the transmitted/founder (TF) HIV-1 envelope gene and full-length genome in two acutely infected individuals. Using both SGA and NGS methodologies, we showed that sequences generated from DBS and plasma display comparable phylogenetic patterns in both acute and chronic infection. SGA was successful on samples with a range of plasma viremia, including samples as low as 1,700 copies/ml and an estimated ~50 viral copies per blood spot. Further, we demonstrated reproducible efficiency in gp160 env sequencing in DBS stored at ambient temperature for up to three weeks or at -20 °C for up to five months. Conclusions: These findings support the use of DBS as a practical and cost-effective alternative to frozen plasma for clinical trials and translational research conducted in resource-limited settings.

  8. Ultrafast all-optical code-division multiple-access networks

    Science.gov (United States)

    Kwong, Wing C.; Prucnal, Paul R.; Liu, Yanming

    1992-12-01

    In optical code-division multiple access (CDMA), the architecture of optical encoders/decoders is another important factor that needs to be considered, besides the correlation properties of the already extensively studied optical codes. The architecture of optical encoders/decoders affects, for example, the amount of power loss and the length of optical delays associated with code sequence generation and correlation, which, in turn, affect the power budget, size, and cost of an optical CDMA system. Various CDMA coding architectures are studied in the paper. In contrast to the encoders/decoders used in prime networks (i.e., prime encoders/decoders), which generate, select, and correlate code sequences by a parallel combination of fiber-optic delay lines, and in 2^n networks (i.e., 2^n encoders/decoders), which generate and correlate code sequences by a serial combination of 2 × 2 passive couplers and fiber delays with sequence selection performed in a parallel fashion, the modified 2^n encoders/decoders generate, select, and correlate code sequences by a serial combination of directional couplers and delays. The power and delay-length requirements of the modified 2^n encoders/decoders are compared to those of the prime and 2^n encoders/decoders. A 100 Mbit/s optical CDMA experiment in free space demonstrating the feasibility of the all-serial coding architecture using a serial combination of 50/50 beam splitters and retroreflectors at 10 Tchip/s (i.e., 100,000 chip/bit) with 100 fs laser pulses is reported.

  9. Transduplication resulted in the incorporation of two protein-coding sequences into the Turmoil-1 transposable element of C. elegans

    Directory of Open Access Journals (Sweden)

    Pupko Tal

    2008-10-01

    Abstract Transposable elements may acquire unrelated gene fragments into their sequences in a process called transduplication. Transduplication of protein-coding genes is common in plants, but is unknown in animals. Here, we report that the Turmoil-1 transposable element in C. elegans has incorporated two protein-coding sequences into its inverted terminal repeat (ITR) sequences. The ITRs of Turmoil-1 contain a conserved RNA recognition motif (RRM) that originated from the rsp-2 gene and a fragment from the protein-coding region of the cpg-3 gene. We further report that an open reading frame specific to C. elegans may have been created as a result of a Turmoil-1 insertion. Mutations at the 5' splice site of this open reading frame may have reactivated the transduplicated RRM motif. Reviewers This article was reviewed by Dan Graur and William Martin. For the full reviews, please go to the Reviewers' Reports section.

  10. Full-length fuel rod behavior under severe accident conditions

    International Nuclear Information System (INIS)

    Lombardo, N.J.; Lanning, D.D.; Panisko, F.E.

    1992-12-01

    This document presents an assessment of the severe accident phenomena observed from four Full-Length High-Temperature (FLHT) tests that were performed by the Pacific Northwest Laboratory (PNL) in the National Research Universal (NRU) reactor at Chalk River, Ontario, Canada. These tests were conducted for the US Nuclear Regulatory Commission (NRC) as part of the Severe Accident Research Program. The objectives of the test were to simulate conditions and provide information on the behavior of full-length fuel rods during hypothetical, small-break, loss-of-coolant severe accidents, in commercial light water reactors

  11. Adaptive variable-length coding for efficient compression of spacecraft television data.

    Science.gov (United States)

    Rice, R. F.; Plaunt, J. R.

    1971-01-01

    An adaptive variable length coding system is presented. Although developed primarily for the proposed Grand Tour missions, many features of this system clearly indicate a much wider applicability. Using sample to sample prediction, the coding system produces output rates within 0.25 bit/picture element (pixel) of the one-dimensional difference entropy for entropy values ranging from 0 to 8 bit/pixel. This is accomplished without the necessity of storing any code words. Performance improvements of 0.5 bit/pixel can be simply achieved by utilizing previous line correlation. A Basic Compressor, using concatenated codes, adapts to rapid changes in source statistics by automatically selecting one of three codes to use for each block of 21 pixels. The system adapts to less frequent, but more dramatic, changes in source statistics by adjusting the mode in which the Basic Compressor operates on a line-to-line basis. Furthermore, the compression system is independent of the quantization requirements of the pulse-code modulation system.
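
    The idea of predicting each sample from its predecessor and then choosing, per block, whichever of a few variable-length codes is shortest can be sketched as below. This is not the coder of the report: as a stand-in it uses Golomb-Rice codes with three assumed parameters k = 0, 1, 2, a zigzag mapping of signed differences, and hypothetical function names. A real system would also have to transmit the chosen code identifier with each block.

```python
# Illustrative sketch: block-adaptive variable-length coding of prediction
# differences. Assumption: Golomb-Rice codes stand in for the report's three
# codes; the candidate set ks=(0, 1, 2) is chosen only for illustration.

def zigzag(d):
    """Map signed differences 0, -1, 1, -2, ... to 0, 1, 2, 3, ..."""
    return 2 * d if d >= 0 else -2 * d - 1

def rice_encode(value, k):
    """Unary quotient (ones, then a zero) followed by k remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    bits = "1" * q + "0"
    if k:
        bits += format(r, "0{}b".format(k))
    return bits

def encode_block(samples, prev, ks=(0, 1, 2)):
    """Return (chosen k, bit string) for one block, predicting from prev."""
    mapped = []
    for s in samples:
        mapped.append(zigzag(s - prev))  # sample-to-sample prediction
        prev = s
    # Code length of a value under parameter k is (value >> k) + 1 + k bits.
    best_k = min(ks, key=lambda k: sum((v >> k) + 1 + k for v in mapped))
    return best_k, "".join(rice_encode(v, best_k) for v in mapped)

print(encode_block([10, 10, 11, 10], prev=10))
# -> (0, '0011010')
```

    Smooth blocks favour small k and busy blocks favour larger k, which is one simple way a coder can follow rapid changes in source statistics without storing a code table.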

  12. Comparative analysis of protein coding sequences from human, mouse and the domesticated pig

    DEFF Research Database (Denmark)

    Jørgensen, Frank Grønlund; Hobolth, Asger; Hornshøj, Henrik

    2005-01-01

    Background: The availability of abundant sequence data from key model organisms has made large scale studies of molecular evolution an exciting possibility. Here we use full length cDNA alignments comprising more than 700,000 nucleotides from human, mouse, pig and the Japanese pufferfish Fugu rubripes in order to investigate 1) the relationships between three major lineages of mammals: rodents, artiodactyls and primates, and 2) the rate of evolution and the occurrence of positive Darwinian selection using codon based models of sequence evolution. Results: We provide evidence...

  13. Direct recovery of infectious Pestivirus from a full-length RT-PCR amplicon

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Reimann, Ilona; Hoffmann, Bernd

    2008-01-01

    This study describes the use of a novel and rapid long reverse transcription (RT)-PCR for the generation of infectious full-length cDNA of pestiviruses. To produce rescued viruses, full-length RT-PCR amplicons of 12.3 kb, including a T7 promoter, were transcribed directly in vitro, and the resulting RNA transcripts were electroporated into ovine cells. Infectious virus was obtained after one cell culture passage. The rescued viruses had a phenotype similar to the parental Border Disease virus strain. Therefore, direct generation of infectious pestiviruses from full-length RT-PCR cDNA products...

  14. Increased length of inpatient stay and poor clinical coding: audit of patients with diabetes.

    Science.gov (United States)

    Daultrey, Harriet; Gooday, Catherine; Dhatariya, Ketan

    2011-11-01

    People with diabetes stay in hospital for longer than those without diabetes for similar conditions. Clinical coding is poor across all specialties. Inpatients with diabetes often have unrecognized foot problems. We wanted to look at the relationships between these factors. A single day audit, looking at the prevalence of diabetes in all adult inpatients. Also looking at their feet to find out how many were high-risk or had existing problems. A 998-bed university teaching hospital. All adult inpatients. (a) To see if patients with diabetes and foot problems were in hospital for longer than the national average length of stay compared with national data; (b) to see if there were people in hospital with acute foot problems who were not known to the specialist diabetic foot team; and (c) to assess the accuracy of clinical coding. We identified 110 people with diabetes. However, discharge coding data for inpatients on that day showed 119 people with diabetes. Length of stay (LOS) was substantially higher for those with diabetes compared to those without (± SD) at 22.39 (22.26) days, vs. 11.68 (6.46) (P coding was poor with some people who had been identified as having diabetes on the audit, who were not coded as such on discharge. Clinical coding - which is dependent on discharge summaries - poorly reflects diagnoses. Additionally, length of stay is significantly longer than previous estimates. The discrepancy between coding and diagnosis needs addressing by increasing the levels of awareness and education of coders and physicians. We suggest that our data be used by healthcare planners when deciding on future tariffs.

  15. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, M; Jensen, L J; Brunak, S

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.

  16. Securing optical code-division multiple-access networks with a postswitching coding scheme of signature reconfiguration

    Science.gov (United States)

    Huang, Jen-Fa; Meng, Sheng-Hui; Lin, Ying-Chen

    2014-11-01

    The optical code-division multiple-access (OCDMA) technique is considered a good candidate for providing optical layer security. An enhanced OCDMA network security mechanism with a pseudonoise (PN) random digital signals type of maximal-length sequence (M-sequence) code switching to protect against eavesdropping is presented. Signature codes unique to individual OCDMA-network users are reconfigured according to the register state of the controlling electrical shift registers. Examples of signature reconfiguration following state switching of the controlling shift register for both the network user and the eavesdropper are numerically illustrated. Dynamically changing the PN state of the shift register to reconfigure the user signature sequence is shown; this hinders eavesdroppers' efforts to decode correct data sequences. The proposed scheme increases the probability of eavesdroppers committing errors in decoding and thereby substantially enhances the degree of an OCDMA network's confidentiality.

  17. Length and sequence dependence in the association of Huntingtin protein with lipid membranes

    Science.gov (United States)

    Jawahery, Sudi; Nagarajan, Anu; Matysiak, Silvina

    2013-03-01

    There is a fundamental gap in our understanding of how aggregates of mutant Huntingtin protein (htt) with overextended polyglutamine (polyQ) sequences gain the toxic properties that cause Huntington's disease (HD). Experimental studies have shown that the most important step associated with toxicity is the binding of mutant htt aggregates to lipid membranes. Studies have also shown that flanking amino acid sequences around the polyQ sequence directly affect interactions with the lipid bilayer, and that polyQ sequences of greater than 35 glutamine repeats in htt are a characteristic of HD. The key steps that determine how flanking sequences and polyQ length affect the structure of lipid bilayers remain unknown. In this study, we use atomistic molecular dynamics simulations to study the interactions between lipid membranes of varying compositions and polyQ peptides of varying lengths and flanking sequences. We find that overextended polyQ interactions do cause deformation in model membranes, and that the flanking sequences do play a role in intensifying this deformation by altering the shape of the affected regions.

  18. Construction and characterization of a full-length cDNA library for the wheat stripe rust pathogen (Puccinia striiformis f. sp. tritici)

    Directory of Open Access Journals (Sweden)

    Chen Xianming

    2007-06-01

    Full Text Available Abstract Background: Puccinia striiformis is a plant pathogenic fungus causing stripe rust, one of the most important diseases on cereal crops and grasses worldwide. However, little is known about its genome and the genes involved in the biology and pathogenicity of the pathogen. We initiated functional genomic research on the fungus by constructing a full-length cDNA library and determined the functions of the first group of genes by sequence comparison of cDNA clones to genes reported in other fungi. Results: A full-length cDNA library, consisting of 42,240 clones with an average cDNA insert of 1.9 kb, was constructed using urediniospores of race PST-78 of P. striiformis f. sp. tritici. From 196 sequenced cDNA clones, we determined functions of 73 clones (37.2%). In addition, 36 clones (18.4%) had significant homology to hypothetical proteins, 37 clones (18.9%) had some homology to genes in other fungi, and the remaining 50 clones (25.5%) did not produce any hits. From the 73 clones with functions, we identified 51 different genes encoding protein products that are involved in amino acid metabolism, cell defense, cell cycle, cell signaling, cell structure and growth, energy cycle, lipid and nucleotide metabolism, protein modification, ribosomal protein complex, sugar metabolism, transcription factor, transport metabolism, and virulence/infection. Conclusion: The full-length cDNA library is useful in identifying functional genes of P. striiformis.

  19. Full Mitochondrial Genome Sequence of the Sugar Beet Wireworm Limonius californicus (Coleoptera: Elateridae), a Common Agricultural Pest.

    Science.gov (United States)

    Gerritsen, Alida T; New, Daniel D; Robison, Barrie D; Rashed, Arash; Hohenlohe, Paul; Forney, Larry; Rashidi, Mahnaz; Wilson, Cathy M; Settles, Matthew L

    2016-01-21

    We report here the full mitochondrial genome sequence of Limonius californicus, a species of click beetle that is an agricultural pest in its larval form. The circular genome is 16.5 kb and contains 13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes. Copyright © 2016 Gerritsen et al.

  20. Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

    Science.gov (United States)

    Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

    2010-11-01

    Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.

  1. Sub-grouping of Plasmodium falciparum 3D7 var genes based on sequence analysis of coding and non-coding regions

    DEFF Research Database (Denmark)

    Lavstsen, Thomas; Salanti, Ali; Jensen, Anja T R

    2003-01-01

    and organization of the 3D7 PfEMP1 repertoire was investigated on the basis of the complete genome sequence. METHODS: Using two tree-building methods we analysed the coding and non-coding sequences of 3D7 var and rif genes as well as var genes of other parasite strains. RESULTS: var genes can be sub...

  2. Syndrome-source-coding and its universal generalization. [error correcting codes for data compression

    Science.gov (United States)

    Ancheta, T. C., Jr.

    1976-01-01

    A method of using error-correcting codes to obtain data compression, called syndrome-source-coding, is described in which the source sequence is treated as an error pattern whose syndrome forms the compressed data. It is shown that syndrome-source-coding can achieve arbitrarily small distortion with the number of compressed digits per source digit arbitrarily close to the entropy of a binary memoryless source. A 'universal' generalization of syndrome-source-coding is formulated which provides robustly effective distortionless coding of source ensembles. Two examples are given, comparing the performance of noiseless universal syndrome-source-coding to (1) run-length coding and (2) Lynch-Davisson-Schalkwijk-Cover universal coding for an ensemble of binary memoryless sources.
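
    To make the idea in this abstract concrete, here is a small sketch of syndrome-source-coding with the binary Hamming (7,4) code. The choice of code, the block length, and the names `syndrome` and `reconstruct` are assumptions for illustration; the paper's own constructions are not reproduced here. A sparse 7-bit source block is treated as an error pattern, its 3-bit syndrome is the compressed data, and the decoder outputs the minimum-weight pattern with that syndrome.

```python
def syndrome(block):
    """Compress a 7-bit block to its 3-bit syndrome (returned as an int 0..7).

    The parity-check matrix is taken to have the binary expansion of j
    as its j-th column, so the syndrome is the XOR of the 1-based
    positions that hold a 1.
    """
    s = 0
    for pos, bit in enumerate(block, start=1):
        if bit:
            s ^= pos
    return s


def reconstruct(s):
    """Return the minimum-weight 7-bit pattern whose syndrome is s."""
    block = [0] * 7
    if s:
        block[s - 1] = 1            # a single 1 at position s has syndrome s
    return block


# Blocks with at most one 1 are recovered exactly (7 bits -> 3 bits).
sparse = [0, 0, 0, 0, 1, 0, 0]
print(syndrome(sparse), reconstruct(syndrome(sparse)) == sparse)

# A denser block is reproduced with distortion.
dense = [1, 1, 0, 0, 0, 0, 0]
print(syndrome(dense), reconstruct(syndrome(dense)))
```

    The scheme is distortionless only while the source blocks stay within the set of correctable patterns, which is why it suits sources that emit mostly zeros.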

  3. Diversity, distribution and dynamics of full-length Copia and Gypsy LTR retroelements in Solanum lycopersicum.

    Science.gov (United States)

    Paz, Rosalía Cristina; Kozaczek, Melisa Eliana; Rosli, Hernán Guillermo; Andino, Natalia Pilar; Sanchez-Puerta, Maria Virginia

    2017-10-01

    Transposable elements are the most abundant components of plant genomes and can dramatically induce genetic changes and impact genome evolution. In the recently sequenced genome of tomato (Solanum lycopersicum), the estimated fraction of elements corresponding to retrotransposons is nearly 62%. Given that tomato is one of the most important vegetable crops cultivated and consumed worldwide, understanding retrotransposon dynamics can provide insight into its evolution and domestication processes. In this study, we performed a genome-wide in silico search of full-length LTR retroelements in the tomato nuclear genome and annotated 736 full-length Gypsy and Copia retroelements. The dispersion level across the 12 chromosomes and the diversity and tissue-specific expression of those elements were estimated. Phylogenetic analysis based on the retrotranscriptase region revealed the presence of 12 major lineages of LTR retroelements in the tomato genome. We identified 97 families, of which 77 and 20 belong to the superfamilies Copia and Gypsy, respectively. Each retroelement family was characterized according to its element size, relative frequency and insertion time. These analyses represent a valuable resource for comparative genomics within the Solanaceae, for transposon-tagging and for the design of cultivar-specific molecular markers in tomato.

  4. Blackout sequence modeling for Atucha-I with MARCH3 code

    International Nuclear Information System (INIS)

    Baron, J.; Bastianelli, B.

    1997-01-01

    The modeling of a blackout sequence in the Atucha I nuclear power plant is presented in this paper, as a preliminary phase for a level II probabilistic safety assessment. The sequence is analyzed with the code MARCH3 from STCP (Source Term Code Package), based on a specific model developed for Atucha that takes into account its peculiarities. The analysis includes all the severe accident phases: the initial transient (loss of heat sink), loss of coolant through the safety valves, core uncovery, heatup, metal-water reaction, melting and relocation, heatup and failure of the pressure vessel, core-concrete interaction in the reactor cavity, and heatup and failure of the (multi-compartmented) containment building due to quasi-static overpressurization. The results obtained make it possible to visualize the time sequence of these events and provide the basis for source term studies. (author)

  5. Full-length genomic analysis of korean porcine sapelovirus strains

    DEFF Research Database (Denmark)

    Son, Kyu-Yeol; Kim, Deok-Song; Kwon, Joseph

    2014-01-01

    the typical picornavirus genome organization; 5' untranslated region (UTR)-L-VP4-VP2-VP3-VP1-2A-2B-2C-3A-3B-3C-3D-3'UTR. Three distinct cis-active RNA elements, the internal ribosome entry site (IRES) in the 5'UTR, a cis-replication element (CRE) in the 2C coding region and 3'UTR were identified and their structures were predicted. Interestingly, the structural features of the CRE and 3'UTR were different between PSV strains. The availability of these first complete genome sequences for PSV strains will facilitate future investigations of the molecular pathogenesis and evolutionary characteristics of PSV.

  6. Essential idempotents and simplex codes

    Directory of Open Access Journals (Sweden)

    Gladys Chalom

    2017-01-01

    Full Text Available We define essential idempotents in group algebras and use them to prove that every minimal abelian non-cyclic code is a repetition code. We also use them to prove that every minimal abelian code is equivalent to a minimal cyclic code of the same length. Finally, we show that a binary cyclic code is simplex if and only if it is of length of the form $n=2^k-1$ and is generated by an essential idempotent.
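
    As a small numerical illustration of the last statement, the sketch below builds the binary simplex code of length $n=2^3-1=7$ as the zero word plus all cyclic shifts of a length-7 maximal-length sequence, and then searches it for an idempotent under multiplication modulo $x^7-1$. The hard-coded sequence and the helper names are assumptions for illustration, and the check does not test the paper's notion of "essential"; it only shows that the code is cyclic, has constant nonzero weight $2^{k-1}$, and contains an idempotent that acts as its identity.

```python
M_SEQ = [0, 0, 1, 0, 1, 1, 1]       # assumed length-7 maximal-length sequence
N = len(M_SEQ)                      # n = 2**3 - 1


def rotate(word, i):
    """Cyclic shift right by i places (multiplication by x**i mod x**n - 1)."""
    return tuple(word[-i:] + word[:-i]) if i else tuple(word)


def poly_mul_mod(a, b):
    """Multiply two binary polynomials modulo x**n - 1 (index i holds x**i)."""
    out = [0] * len(a)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj:
                    out[(i + j) % len(a)] ^= 1
    return tuple(out)


# The zero word together with the 7 cyclic shifts gives 8 = 2**3 codewords.
code = {tuple([0] * N)} | {rotate(M_SEQ, i) for i in range(N)}

weights = {sum(c) for c in code if any(c)}
closed = all(tuple(x ^ y for x, y in zip(u, v)) in code for u in code for v in code)
idempotents = [c for c in code if any(c) and poly_mul_mod(c, c) == c]

print(len(code), weights, closed, idempotents)
```

    For this example the only nonzero idempotent in the code is $1 + x^3 + x^5 + x^6$, and multiplying any codeword by it returns the same codeword.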

  7. IdentiCS – Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence

    Directory of Open Access Journals (Sweden)

    Zeng An-Ping

    2004-08-01

    Full Text Available Abstract Background: A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on the annotation of genome sequences, which includes two successive steps: the prediction of coding sequences (CDS) and their function assignment. The annotation process takes time. The available methods often encounter difficulties when dealing with unfinished, error-containing genomic sequence. Results: In this work a fast method is proposed to use unannotated genome sequence for predicting CDSs and for an in silico reconstruction of metabolic networks. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database of the unannotated genome sequence to predict CDSs. Functions are assigned to the predicted CDSs simultaneously. The well-annotated genome of Salmonella typhimurium LT2 is used as an example to demonstrate the applicability of the method. 97.7% of the CDSs in the original annotation are correctly identified. The use of SWISS-PROT-TrEMBL databases resulted in an identification of 98.9% of CDSs that have EC-numbers in the published annotation. Furthermore, two versions of sequences of the bacterium Klebsiella pneumoniae with different genome coverage (3.9- and 7.9-fold, respectively) are examined. The results suggest that a 3.9-fold coverage of the bacterial genome could be sufficient for the in silico reconstruction of the metabolic network. Compared to other gene finding methods such as CRITICA, our method is more suitable for exploiting sequences of low genome coverage. Based on the new method, a program called IdentiCS (Identification of Coding Sequences from Unfinished Genome Sequences) is delivered that combines the identification of CDSs with the reconstruction, comparison and visualization of metabolic networks (free to download

  8. Interactive QR code beautification with full background image embedding

    Science.gov (United States)

    Lin, Lijian; Wu, Song; Liu, Sijiang; Jiang, Bo

    2017-06-01

    QR (Quick Response) code is a kind of two-dimensional barcode that was first developed in the automotive industry. Nowadays, QR codes are widely used in commercial applications such as product promotion, mobile payment, and product information management. Traditional QR codes that follow the international standard are reliable and fast to decode, but they lack the aesthetic appearance needed to convey visual information to customers. In this work, we present a novel interactive method to generate aesthetic QR codes. Given the information to be encoded and an image to be used as the full QR code background, our method accepts the user's interactive strokes as hints for removing undesired parts of QR code modules, relying on the QR code error correction mechanism and background color thresholds. Compared to previous approaches, our method follows the intention of the QR code designer and can thus achieve more user-pleasing results while keeping high machine readability.

  9. Drug resistance is conferred on the model yeast Saccharomyces cerevisiae by expression of full-length melanoma-associated human ATP-binding cassette transporter ABCB5.

    Science.gov (United States)

    Keniya, Mikhail V; Holmes, Ann R; Niimi, Masakazu; Lamping, Erwin; Gillet, Jean-Pierre; Gottesman, Michael M; Cannon, Richard D

    2014-10-06

    ABCB5, an ATP-binding cassette (ABC) transporter, is highly expressed in melanoma cells, and may contribute to the extreme resistance of melanomas to chemotherapy by efflux of anti-cancer drugs. Our goal was to determine whether we could functionally express human ABCB5 in the model yeast Saccharomyces cerevisiae, in order to demonstrate an efflux function for ABCB5 in the absence of background pump activity from other human transporters. Heterologous expression would also facilitate drug discovery for this important target. DNAs encoding ABCB5 sequences were cloned into the chromosomal PDR5 locus of a S. cerevisiae strain in which seven endogenous ABC transporters have been deleted. Protein expression in the yeast cells was monitored by immunodetection using both a specific anti-ABCB5 antibody and a cross-reactive anti-ABCB1 antibody. ABCB5 function in recombinant yeast cells was measured by determining whether the cells possessed increased resistance to known pump substrates, compared to the host yeast strain, in assays of yeast growth. Three ABCB5 constructs were made in yeast. One was derived from the ABCB5-β mRNA, which is highly expressed in human tissues but is a truncation of a canonical full-size ABC transporter. Two constructs contained full-length ABCB5 sequences: either a native sequence from cDNA or a synthetic sequence codon-harmonized for S. cerevisiae. Expression of all three constructs in yeast was confirmed by immunodetection. Expression of the codon-harmonized full-length ABCB5 DNA conferred increased resistance, relative to the host yeast strain, to the putative substrates rhodamine 123, daunorubicin, tetramethylrhodamine, FK506, or clorgyline. We conclude that full-length ABCB5 can be functionally expressed in S. cerevisiae and confers drug resistance.

  10. Characterisation of Toxoplasma gondii isolates using polymerase chain reaction (PCR) and restriction fragment length polymorphism (RFLP) of the non-coding Toxoplasma gondii (TGR)-gene sequences

    DEFF Research Database (Denmark)

    Høgdall, Estrid; Vuust, Jens; Lind, Peter

    2000-01-01

    of using TGR gene variants as markers to distinguish among T. gondii isolates from different animals and different geographical sources. Based on the band patterns obtained by restriction fragment length polymorphism (RFLP) analysis of the polymerase chain reaction (PCR) amplified TGR sequences, the T...

  11. Variable-Length Coding with Stop-Feedback for the Common-Message Broadcast Channel

    DEFF Research Database (Denmark)

    Trillingsgaard, Kasper Fløe; Yang, Wei; Durisi, Giuseppe

    2016-01-01

    This paper investigates the maximum coding rate over a K-user discrete memoryless broadcast channel for the scenario where a common message is transmitted using variable-length stop-feedback codes. Specifically, upon decoding the common message, each decoder sends a stop signal to the encoder, which transmits continuously until it receives all K stop signals. We present nonasymptotic achievability and converse bounds for the maximum coding rate, which strengthen and generalize the bounds previously reported in Trillingsgaard et al. (2015) for the two-user case. An asymptotic analysis of these bounds reveals that, contrary to the point-to-point case, the second-order term in the asymptotic expansion of the maximum coding rate decays inversely proportional to the square root of the average blocklength. This holds for certain nontrivial common-message broadcast channels, such as the binary...

  12. IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction

    Directory of Open Access Journals (Sweden)

    Kiran Sree Pokkuluri

    2014-01-01

    Full Text Available Protein coding and promoter region predictions are very important challenges of bioinformatics (Attwood and Teresa, 2000). The identification of these regions plays a crucial role in understanding the genes. Many novel computational and mathematical methods have been introduced, and existing methods are being refined, for predicting each of the regions separately; still, there is scope for improvement. We propose a classifier that is built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) to predict both regions with a single classifier. The proposed classifier is trained and tested with Fickett and Tung (1992) datasets for protein coding region prediction for DNA sequences of lengths 54, 108, and 162. This classifier is trained and tested with MMCRI datasets for protein coding region prediction for DNA sequences of lengths 252 and 354. The proposed classifier is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and nonpromoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model can predict both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively.

  13. Subtype-independent near full-length HIV-1 genome sequencing and assembly to be used in large molecular epidemiological studies and clinical management.

    Science.gov (United States)

    Grossmann, Sebastian; Nowak, Piotr; Neogi, Ujjwal

    2015-01-01

    HIV-1 near full-length genome (HIV-NFLG) sequencing from plasma is an attractive multidimensional tool to apply in large-scale population-based molecular epidemiological studies. It also enables genotypic resistance testing (GRT) for all drug target sites allowing effective intervention strategies for control and prevention in high-risk population groups. Thus, the main objective of this study was to develop a simplified subtype-independent, cost- and labour-efficient HIV-NFLG protocol that can be used in clinical management as well as in molecular epidemiological studies. Plasma samples (n=30) were obtained from HIV-1B (n=10), HIV-1C (n=10), CRF01_AE (n=5) and CRF01_AG (n=5) infected individuals with minimum viral load >1120 copies/ml. The amplification was performed with two large amplicons of 5.5 kb and 3.7 kb, sequenced with 17 primers to obtain HIV-NFLG. GRT was validated against ViroSeq™ HIV-1 Genotyping System. After excluding four plasma samples with low-quality RNA, a total of 26 samples were attempted. Among them, NFLG was obtained from 24 (92%) samples with the lowest viral load being 3000 copies/ml. High (>99%) concordance was observed between HIV-NFLG and ViroSeq™ when determining the drug resistance mutations (DRMs). The N384I connection mutation was additionally detected by NFLG in two samples. Our high efficiency subtype-independent HIV-NFLG is a simple and promising approach to be used in large-scale molecular epidemiological studies. It will facilitate the understanding of the HIV-1 pandemic population dynamics and outline effective intervention strategies. Furthermore, it can potentially be applicable in clinical management of drug resistance by evaluating DRMs against all available antiretrovirals in a single assay.

  14. Full-length Ebola glycoprotein accumulates in the endoplasmic reticulum

    Directory of Open Access Journals (Sweden)

    Bhattacharyya Suchita

    2011-01-01

    Full Text Available Abstract The Filoviridae family comprises Ebola and Marburg viruses, which are known to cause lethal hemorrhagic fever. However, there is no effective anti-viral therapy or licensed vaccine currently available for these human pathogens. The envelope glycoprotein (GP) of Ebola virus, which mediates entry into target cells, is cytotoxic and this effect maps to a highly glycosylated mucin-like region in the surface subunit of GP (GP1). However, the mechanism underlying this cytotoxic property of GP is unknown. To gain insight into the basis of this GP-induced cytotoxicity, HEK293T cells were transiently transfected with full-length and mucin-deleted (Δmucin) Ebola GP plasmids and GP localization was examined relative to the nucleus, endoplasmic reticulum (ER), Golgi, and early and late endosomes using deconvolution fluorescent microscopy. Full-length Ebola GP was observed to accumulate in the ER. In contrast, GPΔmucin was uniformly expressed throughout the cell and did not localize in the ER. The Ebola major matrix protein VP40 was also co-expressed with GP to investigate its influence on GP localization. GP and VP40 co-expression did not alter GP localization to the ER. Also, when VP40 was co-expressed with the nucleoprotein (NP), it localized to the plasma membrane while NP accumulated in distinct cytoplasmic structures lined with vimentin. These latter structures are consistent with aggresomes and may serve as assembly sites for filoviral nucleocapsids. Collectively, these data suggest that full-length GP, but not GPΔmucin, accumulates in the ER in close proximity to the nuclear membrane, which may underscore its cytotoxic property.

  15. Application and Analysis of Performance of DQPSK Advanced Modulation Format in Spectral Amplitude Coding OCDMA

    Directory of Open Access Journals (Sweden)

    Abdul Latif Memon

    2014-04-01

    Full Text Available SAC (Spectral Amplitude Coding) is an OCDMA (Optical Code Division Multiple Access) technique that encodes and decodes data bits by utilizing spectral components of the broadband source. Usually the OOK (On-Off Keying) modulation format is used in this encoding scheme. To make the SAC OCDMA network spectrally efficient, the advanced modulation format DQPSK (Differential Quaternary Phase Shift Keying) is applied, simulated and analyzed. An m-sequence code is encoded in the simulated setup. Performance for various lengths of m-sequence code is also analyzed and displayed in pictorial form. The results of the simulation are evaluated with the help of the electrical constellation diagram, eye diagram and bit error rate graph. All the graphs indicate better transmission quality when the advanced DQPSK modulation format is used in the SAC OCDMA network than when OOK is used.

  16. VP1u phospholipase activity is critical for infectivity of full-length parvovirus B19 genomic clones

    OpenAIRE

    Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Söderlund-Venermo, Maria; Young, Neal S.; Brown, Kevin E.

    2008-01-01

    Three full-length genomic clones (pB19-M20, pB19-FL and pB19-HG1) of parvovirus B19 were produced in different laboratories. pB19-M20 was shown to produce infectious virus. To determine the differences in infectivity, all three plasmids were tested by transfection and infection assays. All three clones were similar in viral DNA replication, RNA transcription, and viral capsid protein production. However, only pB19-M20 and pB19-HG1 produced infectious virus. Comparison of viral sequences showe...

  17. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  18. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.; Köser, Claudio U.; Ross, Nicholas E.; Archer, John A.C.

    2010-01-01

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  19. Joint Source-Channel Decoding of Variable-Length Codes with Soft Information: A Survey

    Science.gov (United States)

    Guillemot, Christine; Siohan, Pierre

    2005-12-01

    Multimedia transmission over time-varying wireless channels presents a number of challenges beyond existing capabilities conceived so far for third-generation networks. Efficient quality-of-service (QoS) provisioning for multimedia on these channels may in particular require a loosening and a rethinking of the layer separation principle. In that context, joint source-channel decoding (JSCD) strategies have gained attention as viable alternatives to separate decoding of source and channel codes. A statistical framework based on hidden Markov models (HMM) capturing dependencies between the source and channel coding components sets the foundation for optimal design of techniques of joint decoding of source and channel codes. The problem has been largely addressed in the research community, by considering both fixed-length codes (FLC) and variable-length source codes (VLC) widely used in compression standards. Joint source-channel decoding of VLC raises specific difficulties due to the fact that the segmentation of the received bitstream into source symbols is random. This paper makes a survey of recent theoretical and practical advances in the area of JSCD with soft information of VLC-encoded sources. It first describes the main paths followed for designing efficient estimators for VLC-encoded sources, the key component of the JSCD iterative structure. It then presents the main issues involved in the application of the turbo principle to JSCD of VLC-encoded sources as well as the main approaches to source-controlled channel decoding. This survey terminates by performance illustrations with real image and video decoding systems.

  20. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.)

    Directory of Open Access Journals (Sweden)

    Carole F S Koning-Boucoiran

    2015-04-01

    Full Text Available In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this, RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these, 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.

  1. A comparative phylogenetic analysis of full-length mariner elements

    Indian Academy of Sciences (India)

    Mariner like elements (MLEs) are widely distributed type II transposons with an open reading frame (ORF) for transposase. We studied comparative phylogenetic evolution and inverted terminal repeat (ITR) conservation of MLEs from Indian saturniid silkmoth, Antheraea mylitta with other full length MLEs submitted in the ...

  2. Sequencing and characterization of the complete mitochondrial genome of Japanese Swellshark (Cephalloscyllium umbratile)

    OpenAIRE

    Zhu, Ke-Cheng; Liang, Yin-Yin; Wu, Na; Guo, Hua-Yang; Zhang, Nan; Jiang, Shi-Gui; Zhang, Dian-Chang

    2017-01-01

    To further comprehend the genome features of Cephalloscyllium umbratile (Carcharhiniformes), an endangered species, the complete mitochondrial DNA (mtDNA) was firstly sequenced and annotated. The full-length mtDNA of C. umbratile was 16,697 bp and contained ribosomal RNA (rRNA) genes, 13 protein-coding genes (PCGs), 23 transfer RNA (tRNA) genes, and a major non-coding control region. Each PCG was initiated by an authoritative ATN codon, except for COX1 initiated by a GTG codon. Seven of 13 PC...

  3. Coding patient emotional cues and concerns in medical consultations: the Verona coding definitions of emotional sequences (VR-CoDES).

    NARCIS (Netherlands)

    Zimmermann, C.; Piccolo, L. del; Bensing, J.; Bergvik, S.; Haes, H. de; Eide, H.; Fletcher, I.; Goss, C.; Heaven, C.; Humphris, G.; Young-Mi, K.; Langewitz, W.; Meeuwesen, L.; Nuebling, M.; Rimondini, M.; Salmon, P.; Dulmen, S. van; Wissow, L.; Zandbelt, L.; Finset, A.

    2011-01-01

    Objective: To present the Verona Coding Definitions of Emotional Sequences (VR-CoDES CC), a consensus based system for coding patient expressions of emotional distress in medical consultations, defined as Cues or Concerns. Methods: The system was developed by an international group of communication

  4. Prevalence of transcription promoters within archaeal operons and coding sequences.

    Science.gov (United States)

    Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

    2009-01-01

    Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.

  5. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Directory of Open Access Journals (Sweden)

    Rodrigo Pessôa

    BACKGROUND: Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. METHODS: Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. RESULTS: A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. CONCLUSIONS: This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data

  6. Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.

    Science.gov (United States)

    Ragan, Chikako; Mowry, Bryan J; Bauer, Denis C

    2012-09-01

    Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNAs (piRNAs). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.

  7. Irradiation performance of full-length metallic IFR fuels

    International Nuclear Information System (INIS)

    Tsai, H.; Neimark, L.A.

    1992-07-01

    An assembly irradiation of 169 full-length U-Pu-Zr metallic fuel pins was successfully completed in FFTF to a goal burnup of 10 at.%. All test fuel pins maintained their cladding integrity during the irradiation. Postirradiation examination showed minimal fuel/cladding mechanical interaction and excellent stability of the fuel column. Fission-gas release was normal and consistent with the existing data base from irradiation testing of shorter metallic fuel pins in EBR-II

  8. The Complete Sequence of a Human Parainfluenzavirus 4 Genome

    Science.gov (United States)

    Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond

    2009-01-01

    Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536

  9. A leaf sequencing algorithm to enlarge treatment field length in IMRT

    International Nuclear Information System (INIS)

    Xia Ping; Hwang, Andrew B.; Verhey, Lynn J.

    2002-01-01

    With MLC-based IMRT, the maximum usable field size is often smaller than the maximum field size for conventional treatments. This is due to the constraints of the overtravel distances of MLC leaves and/or jaws. Using a new leaf sequencing algorithm, the usable IMRT field length (perpendicular to the MLC motion) can be mostly made equal to the full length of the MLC field without violating the upper jaw overtravel limit. For any given intensity pattern, a criterion was proposed to assess whether an intensity pattern can be delivered without violation of the jaw position constraints. If the criterion is met, the new algorithm will consider the jaw position constraints during the segmentation for the step and shoot delivery method. The strategy employed by the algorithm is to connect the intensity elements outside the jaw overtravel limits with those inside the jaw overtravel limits. Several methods were used to establish these connections during segmentation by modifying a previously published algorithm (areal algorithm), including changing the intensity level, alternating the leaf-sequencing direction, or limiting the segment field size. The algorithm was tested with 1000 random intensity patterns with dimensions of 21 × 27 cm², 800 intensity patterns with higher intensity outside the jaw overtravel limit, and three different types of clinical treatment plans that were undeliverable using a segmentation method from a commercial treatment planning system. The new algorithm achieved a success rate of 100% with these test patterns. For the 1000 random patterns, the new algorithm yields a similar average number of segments of 36.9 ± 2.9 in comparison to 36.6 ± 1.3 when using the areal algorithm. For the 800 patterns with higher intensities outside the jaw overtravel limits, the new algorithm results in an increase of 25% in the average number of segments compared to the areal algorithm. However, the areal algorithm fails to create deliverable segments for 90% of these

  10. Seismic inference of 57 stars using full-length Kepler data sets

    Directory of Open Access Journals (Sweden)

    Creevey Orlagh

    2017-01-01

    We present stellar properties of 57 stars from a seismic inference using full-length data sets from Kepler (mass, age, radius, distances). These stars comprise active stars, planet-hosts, solar-analogs, and binary systems. We validate the distances derived from the astrometric Gaia-Tycho solution. Ensemble analysis of the stellar properties reveals a trend of mixing-length parameter with the surface gravity and effective temperature. We derive a linear relationship with the seismic quantity ⟨r02⟩ to estimate the stellar age. Finally, we define the stellar regimes where the Kjeldsen et al. (2008) empirical surface correction for 1D model frequencies is valid.

  11. Comparative analysis of full genomic sequences among different genotypes of dengue virus type 3

    Directory of Open Access Journals (Sweden)

    Lin Ting-Hsiang

    2008-05-01

    Background: Although a previous study demonstrated that the envelope protein of dengue viruses is under purifying selection pressure, little is known about the genetic differences among full-length viral genomes of DENV-3. In our study, complete genomic sequences of DENV-3 strains collected from different geographical locations and isolation years were determined, and the sequence diversity as well as selection pressure sites in the DENV genome other than within the E gene were also analyzed. Results: Using maximum likelihood and Bayesian approaches, our phylogenetic analysis revealed that Taiwan's indigenous DENV-3 isolated from the 1994 and 1998 dengue/DHF epidemics and one 1999 sporadic case were of three different genotypes (I, II, and III), associated with DENV-3 circulating in Indonesia, Thailand and Sri Lanka, respectively. Sequence diversity and selection pressure of different genomic regions among the DENV-3 genotypes were further examined to understand global DENV-3 evolution. The highest nucleotide sequence diversity among the fully sequenced DENV-3 strains was found in the nonstructural protein 2A (mean ± SD: 5.84 ± 0.54) and envelope protein gene regions (mean ± SD: 5.04 ± 0.32). Further analysis found that positive selection pressure of DENV-3 may occur in the non-structural protein 1 gene region, and the positive selection site was detected at position 178 of the NS1 gene. Conclusion: Our study confirmed that the envelope protein is under purifying selection pressure although it presented higher sequence diversity. The detection of positive selection pressure in the non-structural protein along genotype II indicated that DENV-3 originating from Southeast Asia needs to be monitored for the emergence of strains with epidemic potential, for better epidemic prevention and vaccine development.

  12. Full length prototype SSC dipole test results

    International Nuclear Information System (INIS)

    Strait, J.; Brown, B.C.; Carson, J.

    1987-01-01

    Results are presented from tests of the first full length prototype SSC dipole magnet. The cryogenic behavior of the magnet during a slow cooldown to 4.5 K and a slow warmup to room temperature has been measured. Magnetic field quality was measured at currents up to 2000 A. Averaged over the body field, all harmonics with the exception of b2 and b8 are at or within the tolerances specified by the SSC Central Design Group. (The values of b2 and b8 result from known design and construction defects which will be corrected in later magnets.) Using an NMR probe, the average body field strength is measured to be 10.283 G/A with point to point variations on the order of one part in 1000. Data are presented on quench behavior of the magnet up to 3500 A (approximately 55% of full field) including longitudinal and transverse velocities for the first 250 msec of the quench
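
    The transfer function quoted in this abstract can be turned into field values with simple arithmetic. The sketch below is only a rough illustration: it assumes the field scales strictly linearly with current (the abstract does not say so, and effects such as iron saturation are ignored), and the function and variable names are hypothetical.

```python
GAUSS_PER_AMP = 10.283    # average body-field strength per ampere, from the abstract
GAUSS_PER_TESLA = 1.0e4   # unit conversion

def body_field_tesla(current_amps: float) -> float:
    """Body field in tesla, ASSUMING a strictly linear transfer function."""
    return GAUSS_PER_AMP * current_amps / GAUSS_PER_TESLA

field_2000 = body_field_tesla(2000.0)    # upper current of the field-quality measurements
field_3500 = body_field_tesla(3500.0)    # highest quench current reported
implied_full_field = field_3500 / 0.55   # "approximately 55% of full field"

print(f"{field_2000:.2f} T, {field_3500:.2f} T, full field ~{implied_full_field:.1f} T")
```

    Under the linearity assumption this gives roughly 2.06 T at 2000 A and 3.60 T at 3500 A, which puts the implied full field near 6.5 T.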

  13. BEAVRS full core burnup calculation in hot full power condition by RMC code

    International Nuclear Information System (INIS)

    Liu, Shichang; Liang, Jingang; Wu, Qu; Guo, JuanJuan; Huang, Shanfang; Tang, Xiao; Li, Zeguang; Wang, Kan

    2017-01-01

    Highlights: • TMS and thermal scattering interpolation were developed to treat cross sections OTF. • Hybrid coupling system was developed for HFP burnup calculation of BEAVRS benchmark. • Domain decomposition was applied to handle memory problem of full core burnup. • Critical boron concentration with burnup by RMC agrees with the benchmark results. • RMC is capable of multi-physics coupling for simulations of nuclear reactors in HFP. - Abstract: Monte Carlo method can provide high fidelity neutronics analysis of different types of nuclear reactors, owing to its advantages of the flexible geometry modeling and the use of continuous-energy nuclear cross sections. However, nuclear reactors are complex systems with multi-physics interacting and coupling. MC codes can couple with depletion solver and thermal-hydraulics (T/H) codes simultaneously for the “transport-burnup-thermal-hydraulics” coupling calculations. MIT BEAVRS is a typical “transport-burnup-thermal-hydraulics” coupling benchmark. In this paper, RMC was coupled with sub-channel code COBRA, equipped with on-the-fly temperature-dependent cross section treatment and large-scale detailed burnup calculation based on domain decomposition. Then RMC was applied to the full core burnup calculations of BEAVRS benchmark in hot full power (HFP) condition. The numerical tests show that domain decomposition method can achieve the consistent results compared with original version of RMC while enlarging the computational burnup regions. The results of HFP by RMC agree well with the reference values of BEAVRS benchmark and also agree well with those of MC21. This work proves the feasibility and accuracy of RMC in multi-physics coupling and lifecycle simulations of nuclear reactors.

  14. Optimal choice of word length when comparing two Markov sequences using a χ²-statistic.

    Science.gov (United States)

    Bai, Xin; Tang, Kujin; Ren, Jie; Waterman, Michael; Sun, Fengzhu

    2017-10-03

    Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate χ²-statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length k in such studies. We develop an optimal strategy to choose k by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be r1 and r2, respectively. We show through both simulations and theoretical studies that the optimal k = max(r1, r2) + 1 for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains. Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.
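
    The word-length rule stated in this abstract is simple enough to sketch in code. The following is a minimal illustration, assuming the Markov orders of the two sequences are already known. The function names are hypothetical, and the statistic shown is a plain Pearson chi-square on word counts that ignores the dependence between overlapping words, so it is a stand-in and not the authors' statistic.

```python
from collections import Counter

def optimal_word_length(r1: int, r2: int) -> int:
    """Word length reported as optimal in the abstract: k = max(r1, r2) + 1."""
    return max(r1, r2) + 1

def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping words (k-tuples) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def pearson_word_statistic(seq_a: str, seq_b: str, k: int) -> float:
    """Pearson chi-square on the 2 x (number of words) table of k-tuple counts.

    Illustrative stand-in only: it treats word occurrences as independent.
    """
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    stat = 0.0
    for word in set(ca) | set(cb):
        total = ca[word] + cb[word]
        expected_a = total * na / (na + nb)
        expected_b = total * nb / (na + nb)
        stat += (ca[word] - expected_a) ** 2 / expected_a
        stat += (cb[word] - expected_b) ** 2 / expected_b
    return stat

k = optimal_word_length(1, 2)   # orders r1 = 1 and r2 = 2 chosen for the example
print(k, pearson_word_statistic("ACGTACGTAC", "ACGTTTGTAC", k))
```

    When the orders are not known they would have to be estimated first, which is the case the abstract examines when it discusses power loss.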

  15. HLA-E regulatory and coding region variability and haplotypes in a Brazilian population sample.

    Science.gov (United States)

    Ramalho, Jaqueline; Veiga-Castelli, Luciana C; Donadi, Eduardo A; Mendes-Junior, Celso T; Castelli, Erick C

    2017-11-01

    The HLA-E gene is characterized by low but wide expression on different tissues. HLA-E is considered a conserved gene, being one of the least polymorphic class I HLA genes. The HLA-E molecule interacts with Natural Killer cell receptors and T lymphocyte receptors, and might activate or inhibit immune responses depending on the peptide associated with HLA-E and on which receptors HLA-E interacts with. Variable sites within the HLA-E regulatory and coding segments may influence the gene function by modifying its expression pattern or encoded molecule, thus influencing its interaction with receptors and the peptide. Here we propose an approach to evaluate the gene structure, haplotype pattern and the complete HLA-E variability, including regulatory (promoter and 3'UTR) and coding segments (with introns), by using massively parallel sequencing. We investigated the variability of 420 samples from a very admixed population such as Brazilians by using this approach. Considering a segment of about 7 kb, 63 variable sites were detected, arranged into 75 extended haplotypes. We detected 37 different promoter sequences (but few frequent ones), 27 different coding sequences (15 representing new HLA-E alleles) and 12 haplotypes at the 3'UTR segment, two of them presenting a summed frequency of 90%. Despite the number of coding alleles, they encode mainly two different full-length molecules, known as E*01:01 and E*01:03, which correspond to about 90% of all. In addition, differently from what has been previously observed for other non-classical HLA genes, the relationship among the HLA-E promoter, coding and 3'UTR haplotypes is not straightforward because the same promoter and 3'UTR haplotypes were many times associated with different HLA-E coding haplotypes. This data reinforces the presence of only two main full-length HLA-E molecules encoded by the many HLA-E alleles detected in our population sample. In addition, this data does indicate that the distal HLA-E promoter is by

  16. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    Science.gov (United States)

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  17. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    Science.gov (United States)

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  18. [Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

    Science.gov (United States)

    Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

    2010-01-01

    Four full-length genomes of rabies viruses isolated from Chinese ferret-badger and dog were sequenced in order to analyze the genetic variation of rabies viruses at the molecular level, to obtain information on rabies virus prevalence and variation in Zhejiang, and to enrich the genome database of rabies virus street strains isolated in China. Rabies viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled. Nucleotide and deduced protein similarities were determined, and phylogenetic analyses were performed, against strains from Chinese ferret-badger, dog, sika deer and vole and against the vaccine strains in use. The four full-length genomes were sequenced completely and had the same genetic structure, with a length of 11,923 nt or 11,925 nt, including 58 nt of leader, 1353 nt of NP, 894 nt of PP, 609 nt of MP, 1575 nt of GP, 6386 nt of LP, intergenic regions (IGRs) of 2, 5 and 5 nt, a 423-nt pseudogene-like sequence (psi), and 70 nt of trailer. By BLAST and multiple sequence alignment, the four genomes were consistent with the properties of the genus Lyssavirus, family Rhabdoviridae. Nucleotide and amino acid sequences were most similar among the Chinese strains, especially among isolates from the same animal species. For the four genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, so most of the nucleotide mutations in these genomes were synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions were unchanged, with no recombination and only a few point mutations, so the five proteins appear to be stable. The variation sites and types of the four genomes were similar to those of the reference vaccine or street strains. According to the multiple sequence alignment and phylogenetic analyses, the four strains belong to genotype 1 and show distinct regional characteristics of China. Therefore, these four rabies viruses are likely to be street viruses

  19. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.).

    Science.gov (United States)

    Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M

    2015-01-01

    In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.

  20. ICRPfinder: a fast pattern design algorithm for coding sequences and its application in finding potential restriction enzyme recognition sites

    Directory of Open Access Journals (Sweden)

    Stafford Phillip

    2009-09-01

    Background: Restriction enzymes can produce easily definable segments from DNA sequences by using a variety of cut patterns. There are, however, no software tools that can aid in gene building, that is, modifying wild-type DNA sequences to express the same wild-type amino acid sequences but with enhanced codons, specific cut sites, unique post-translational modifications, and other engineered-in components for recombinant applications. A fast DNA pattern design algorithm, ICRPfinder, is provided in this paper and applied to find or create potential recognition sites in target coding sequences. Results: ICRPfinder is applied to find or create restriction enzyme recognition sites by introducing silent mutations. The algorithm is shown to be capable of mapping existing cut sites, but importantly it can also generate specified new unique cut sites within a specified region that are guaranteed not to be present elsewhere in the DNA sequence. Conclusion: ICRPfinder is a powerful tool for finding or creating specific DNA patterns in a given target coding sequence. ICRPfinder finds or creates patterns, which can include restriction enzyme recognition sites, without changing the translated protein sequence. ICRPfinder is a browser-based JavaScript application and it can run on any platform, in on-line or off-line mode.
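
    The idea of creating a recognition site through silent mutations can be illustrated with a small brute-force sketch. This is not ICRPfinder's algorithm (the abstract does not describe it); the function names `translate` and `silent_site_positions` are hypothetical, and the standard genetic code and an in-frame coding sequence are assumed.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def silent_site_positions(cds, site):
    """Return (position, mutated sequence) pairs where overwriting the window
    with `site` leaves the translated protein unchanged (hypothetical helper)."""
    protein = translate(cds)
    hits = []
    for i in range(len(cds) - len(site) + 1):
        candidate = cds[:i] + site + cds[i + len(site):]
        if candidate != cds and translate(candidate) == protein:
            hits.append((i, candidate))
    return hits


# Toy sequence encoding Met-Glu-Phe-stop; GAATTC is the EcoRI recognition site.
print(silent_site_positions("ATGGAGTTTTAA", "GAATTC"))
# -> [(3, 'ATGGAATTCTAA')]
```

    In this toy case the codons GAG TTT become GAA TTC, which still encode Glu-Phe. A real tool would also need to check that the new site is unique in the sequence, which this sketch omits.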

  1. Full-Length Characterization of Hepatitis C Virus Subtype 3a Reveals Novel Hypervariable Regions under Positive Selection during Acute Infection

    OpenAIRE

    Humphreys, Isla; Fleming, Vicki; Fabris, Paolo; Parker, Joe; Schulenberg, Bodo; Brown, Anthony; Demetriou, Charis; Gaudieri, Silvana; Pfafferott, Katja; Lucas, Michaela; Collier, Jane; Huang, Kuan-Hsiang Gary; Pybus, Oliver G.; Klenerman, Paul; Barnes, Eleanor

    2009-01-01

    Hepatitis C virus subtype 3a is a highly prevalent and globally distributed strain that is often associated with infection via injection drug use. This subtype exhibits particular phenotypic characteristics. In spite of this, detailed genetic analysis of this subtype has rarely been performed. We performed full-length viral sequence analysis in 18 patients with chronic HCV subtype 3a infection and assessed genomic viral variability in comparison to other HCV subtypes. Two novel regions of int...

  2. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

    KAUST Repository

    Zhang, Zhang

    2010-11-08

    Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge. Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. 2010 Zhang and Yu; licensee BioMed Central Ltd.
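
    The two compositional parameters named above, GC content and purine content at each of the three codon positions, can be measured from a coding sequence as follows. This is only a sketch of the observed quantities, not of the two models proposed in the paper; the function name `positional_content` is hypothetical.

```python
def positional_content(cds):
    """Return [(gc, purine), ...] for codon positions 1, 2, 3 of an in-frame
    coding sequence. Purines are A and G; trailing partial codons are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this codon position
        gc = sum(b in "GC" for b in bases) / len(bases)
        purine = sum(b in "AG" for b in bases) / len(bases)
        result.append((gc, purine))
    return result


# Toy sequence with codons ATG GCC AAG TAA.
print(positional_content("ATGGCCAAGTAA"))
# -> [(0.25, 0.75), (0.25, 0.5), (0.75, 0.75)]
```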

  3. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

    KAUST Repository

    Zhang, Zhang; Yu, Jun

    2010-01-01

    Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge. Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. 2010 Zhang and Yu; licensee BioMed Central Ltd.

  4. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22, with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full-length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus, despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  5. Optical orthogonal code-division multiple-access system - Part 2: Multibits/sequence-period OOCDMA

    Science.gov (United States)

    Kwon, Hyuck M.

    1994-08-01

    In a recently proposed optical orthogonal code-division multiple-access (OOCDMA) system, one bit of user data is transmitted per sequence period, and a threshold is employed for the final bit decision. In this paper, a system that can transmit multiple bits per sequence period is introduced, and avalanche photodiode (APD) noise, thermal noise, and interference are included. This system, derived by exploiting orthogonal properties of the OOCDMA code sequence and using a maximum search (instead of a threshold) in the final decision, is log_2 F times higher in throughput, where F is the sequence period. For example, bit error probability is four orders of magnitude better at -56 dBW received laser power, with F = 1000 chips, 10 'marks' in a sequence, and 10 users, at a 30 Mb/s data rate for the one-bit/sequence-period system and a 270 Mb/s data rate for the multibits/sequence-period system. Furthermore, an exact analysis is performed for the log_2 F bits/sequence-period system with a hard-limiter placed before the receiver, and its performance is compared to the performance without a hard-limiter, for the chip-synchronous case. The improvement from using a hard-limiter is significant in the log_2 F bits/sequence-period OOCDMA system.
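
    The quoted data rates can be checked with a short calculation. The abstract states a throughput gain of log2 F; the quoted 270 Mb/s equals 9 × 30 Mb/s, which matches if only whole bits per sequence period are counted. That rounding down is an assumption of this sketch, not something the abstract states, and the function name `multibit_rate` is hypothetical.

```python
import math


def multibit_rate(single_bit_rate_mbps, sequence_period_chips):
    """Return (bits per sequence period, resulting data rate in Mb/s).

    Assumes whole bits only, i.e. floor(log2 F) bits per period.
    """
    bits_per_period = math.floor(math.log2(sequence_period_chips))
    return bits_per_period, bits_per_period * single_bit_rate_mbps


# F = 1000 chips and a 30 Mb/s one-bit/sequence-period rate, as in the abstract.
print(multibit_rate(30, 1000))
# -> (9, 270)
```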

  6. Coding chaotic billiards. Pt. 3

    International Nuclear Information System (INIS)

    Ullmo, D.; Giannoni, M.J.

    1993-01-01

    A non-tiling compact billiard defined on the pseudosphere is studied 'a la Morse coding'. As for most bounded systems, the coding is not exact. However, two sets of approximate grammar rules can be obtained, one specifying forbidden codes, and the other allowed ones. In between, some sequences remain in the 'unknown' zone, but their relative amount can be reduced to zero if one lets the length of the approximate grammar rules go to infinity. The relationship between these approximate grammar rules and the 'pruning front' introduced by Cvitanovic et al. is discussed. (authors). 13 refs., 10 figs., 1 tab

  7. CONSTRUCTION OF REGULAR LDPC LIKE CODES BASED ON FULL RANK CODES AND THEIR ITERATIVE DECODING USING A PARITY CHECK TREE

    Directory of Open Access Journals (Sweden)

    H. Prashantha Kumar

    2011-09-01

    Low-density parity-check (LDPC) codes are capacity-approaching codes, which means that practical constructions exist that allow the noise threshold to be set very close to the theoretical Shannon limit for a memoryless channel. LDPC codes are finding increasing use in applications such as LTE networks, digital television, high-density data storage systems, and deep-space communication systems. Several algebraic and combinatorial methods are available for constructing LDPC codes. In this paper we discuss a novel low-complexity algebraic method for constructing regular LDPC-like codes derived from full-rank codes. We demonstrate that by employing these codes over AWGN channels, coding gains in excess of 2 dB over uncoded systems can be realized when soft iterative decoding using a parity check tree is employed.

  8. Evaluating Open-Source Full-Text Search Engines for Matching ICD-10 Codes.

    Science.gov (United States)

    Jurcău, Daniel-Alexandru; Stoicu-Tivadar, Vasile

    2016-01-01

    This research presents the results of evaluating multiple free, open-source engines on matching ICD-10 diagnostic codes via full-text searches. The study investigates what it takes to get an accurate match when searching for a specific diagnostic code. For each code, the evaluation starts by extracting the words that make up its text and continues with building full-text search queries from the combinations of these words. The queries are then run against all the ICD-10 codes until the code in question is returned as the match with the highest relative score. This method identifies the minimum number of words that must be provided for the search engines to choose the desired entry. The engines analyzed include a popular Java-based full-text search engine, a lightweight engine written in JavaScript which can even execute in the user's browser, and two popular open-source relational database management systems.
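
    The evaluation loop described above can be sketched with a toy scorer. Everything here is an assumption for illustration: the catalog entries and their labels are invented (they are not real ICD-10 codes), the scoring is a plain count of matching words rather than any search engine's relevance score, and the names `score` and `min_words_to_match` are hypothetical.

```python
from itertools import combinations

# Invented catalog; labels and descriptions are placeholders, not ICD-10 entries.
TOY_CATALOG = {
    "X1": "acute viral hepatitis",
    "X2": "chronic viral hepatitis",
    "X3": "acute renal failure",
    "X4": "chronic renal failure",
}


def score(query, description):
    """Toy relevance score: number of query words present in the description."""
    words = set(description.lower().split())
    return sum(w in words for w in query)


def min_words_to_match(target, catalog):
    """Smallest word combination from the target's text that ranks the target
    strictly first. Returns (word count, query) or None if no query succeeds."""
    words = catalog[target].lower().split()
    for k in range(1, len(words) + 1):
        for query in combinations(words, k):
            scores = {code: score(query, text) for code, text in catalog.items()}
            best = max(scores.values())
            winners = [code for code, s in scores.items() if s == best]
            if winners == [target]:
                return k, query
    return None


print(min_words_to_match("X1", TOY_CATALOG))
# -> (2, ('acute', 'viral'))
```

    In this toy catalog every single word of the target is shared with another entry, so one word always produces a tie and two words are needed.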

  9. Spike Code Flow in Cultured Neuronal Networks

    Directory of Open Access Journals (Sweden)

    Shinichi Tamura

    2016-01-01

    We observed spike trains produced by one-shot electrical stimulation with 8 × 8 multielectrodes in cultured neuronal networks. Each electrode accepted spikes from several neurons. We extracted the short codes from spike trains and obtained a code spectrum with a nominal time accuracy of 1%. We then constructed code flow maps as movies of the electrode array to observe the code flow of “1101” and “1011,” which are typical pseudorandom sequences of the kind often encountered in the literature and in our experiments. They seemed to flow from one electrode to the neighboring one and maintained their shape to some extent. To quantify the flow, we calculated the “maximum cross-correlations” among neighboring electrodes, to find the direction of maximum flow of the codes with lengths less than 8. Normalized maximum cross-correlations were almost constant irrespective of code. Furthermore, if the spike trains were shuffled in interval orders or in electrodes, the cross-correlations became significantly smaller. Thus, the analysis suggested that local codes of approximately constant shape propagated and conveyed information across the network. Hence, the codes can serve as visible and trackable marks of propagating spike waves as well as a means of evaluating information flow in the neuronal network.

  10. Kangaroo – A pattern-matching program for biological sequences

    Directory of Open Access Journals (Sweden)

    Betel Doron

    2002-07-01

    Background: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion: A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats.
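
    The kind of query described above, finding mononucleotide repeats in a coding sequence with a regular expression, can be sketched with Python's standard `re` module. This is not Kangaroo's code or query syntax; the function name `mononucleotide_repeats` and the default minimum run length of 8 are assumptions chosen for illustration.

```python
import re


def mononucleotide_repeats(sequence, min_run=8):
    """Return (start, base, length) for each run of one base repeated at
    least `min_run` times. Uses a backreference to match the same base."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(sequence.upper())]


# Toy coding region: ATG, eight A's, GG, nine C's, TAA.
toy = "ATG" + "A" * 8 + "GG" + "C" * 9 + "TAA"
print(mononucleotide_repeats(toy))
# -> [(3, 'A', 8), (13, 'C', 9)]
```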

  11. [Complete genome sequencing and analyses of rabies viruses isolated from wild animals (Chinese Ferret-Badger) in Zhejiang province].

    Science.gov (United States)

    Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing

    2009-08-01

    By sequencing the full-length genomes of rabies viruses from two Chinese ferret-badgers, we analyzed the genetic variation of rabies viruses at the molecular level to obtain information on the prevalence and variation of rabies viruses in Zhejiang, and to enrich the genome database of rabies virus street strains isolated from Chinese wildlife. Overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled to analyze nucleotide and deduced protein similarities; phylogenetic analyses of the N genes from Chinese ferret-badger, sika deer, vole, dog, and vaccine strains were then performed. The two full-length genomes were completely sequenced and found to have the same genetic structure of 11,923 nt, including a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP, and 6386-nt LP, intergenic regions (IGRs) of 2, 5, and 5 nt, a 423-nt pseudogene-like sequence (Psi), and a 70-nt trailer. By BLAST and multiple sequence alignment, the two full-length genomes were consistent with the properties of the genus Lyssavirus of the family Rhabdoviridae. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. For the two full-length genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, so the nucleotide mutations in these two genomes were most probably synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions showed no changes or recombination, only a few point mutations. The five proteins therefore appeared to be stable. The variation sites and types of the two ferret-badger genomes were similar to those of the reference vaccine or street strains. According to the multiple sequence and phylogenetic analyses, the two strains were genotype 1 and possessed the distinct geographic characteristics of China. All the evidence suggested a cue that these two ferret badgers

  12. On the total number of genes and their length distribution in complete microbial genomes

    DEFF Research Database (Denmark)

    Skovgaard, Marie; Jensen, L.J.; Brunak, Søren

    2001-01-01

    In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only ~3800 genes, and that a similar discrepancy exists for almost all published genomes.

  13. Computer simulation of replacement sequences in copper

    International Nuclear Information System (INIS)

    Schiffgens, J.O.; Schwartz, D.W.; Ariyasu, R.G.; Cascadden, S.E.

    1978-01-01

    Results of computer simulations of , , and replacement sequences in copper are presented, including displacement thresholds, focusing energies, energy losses per replacement, and replacement sequence lengths. These parameters are tabulated for six interatomic potentials and shown to vary in a systematic way with potential stiffness and range. Comparisons of results from calculations made with ADDES, a quasi-dynamical code, and COMENT, a dynamical code, show excellent agreement, demonstrating that the former can be calibrated and used satisfactorily in the analysis of low energy displacement cascades. Upper limits on , , and replacement sequences were found to be approximately 10, approximately 30, and approximately 14 replacements, respectively. (author)

  14. RESEARCH ARTICLE Full length sequencing and novel ...

    Indian Academy of Sciences (India)

    Navya

    2016-12-16

    Dec 16, 2016 ... TOLONE, ANNA MARIA SUTERA, MARIA TERESA SARDINA, BALDASSARE ... finding of novel SNPs that might be important in future studies and laid the .... power, precision and quality to assess the relationship between ...

  15. Full-length cDNA sequence cloning and analysis of Ghrelin in Cervus nippon

    Institute of Scientific and Technical Information of China (English)

    张曼; 金鑫; 田巧珍; 刘骄; 王云鹤; 杨银凤

    2017-01-01

    To obtain the full-length cDNA of Ghrelin in Cervus nippon, RT-PCR and RACE were performed using total RNA from abomasum mucosal epithelial tissue of C. nippon as template. Sequence analysis revealed a 539 bp cDNA containing a 46 bp 5'-untranslated region (5'UTR), a 128 bp 3'-untranslated region (3'UTR), and a 351 bp open reading frame (ORF) encoding 116 amino acids. Alignment of the C. nippon Ghrelin cDNA with those of human and other animals showed sequence homology of 90.4%-99.1% with reindeer, goat, sheep, and cattle, 66.9%-76.6% with rhesus monkey, human, pig, and dog, and only 36.4% and 35.4% with chicken and C. livia, respectively. These results indicate that the structure of Ghrelin displays obvious species specificity, suggesting that Ghrelin might play an important physiological role in ruminants.

  16. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  17. SEAPATH: A microcomputer code for evaluating physical security effectiveness using adversary sequence diagrams

    International Nuclear Information System (INIS)

    Darby, J.L.

    1986-01-01

    The Adversary Sequence Diagram (ASD) concept was developed by Sandia National Laboratories (SNL) to examine physical security system effectiveness. Sandia also developed a mainframe computer code, PANL, to analyze the ASD. The authors have developed a microcomputer code, SEAPATH, which also analyzes ASDs. The authors are supporting SNL in software development of the SAVI code; SAVI utilizes the SEAPATH algorithm to identify and quantify paths

  18. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    Science.gov (United States)

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-06

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for

  19. Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva

    Directory of Open Access Journals (Sweden)

    Guo Xiang

    2008-12-01

    Background: Parasites in the genus Theileria cause lymphoproliferative diseases in cattle, resulting in enormous socio-economic losses. The availability of the genome sequences and annotation for T. parva and T. annulata has facilitated the study of parasite biology and their relationship with host cell transformation and tropism. However, the mechanism of transcriptional regulation in this genus, which may be key to understanding fundamental aspects of its parasitology, remains poorly understood. In this study, we analyze the evolution of non-coding sequences in the Theileria genome and identify conserved sequence elements that may be involved in gene regulation of these parasitic species. Results: Intergenic regions and introns in Theileria are short, and their length distributions are considerably right-skewed. Intergenic regions flanked by genes in 5'-5' orientation tend to be longer and slightly more AT-rich than those flanked by two stop codons; intergenic regions flanked by genes in 3'-5' orientation have intermediate values of length and AT composition. Intron position is negatively correlated with intron length, and positively correlated with GC content. Using stringent criteria, we identified a set of high-quality orthologous non-coding sequences between T. parva and T. annulata, and determined the distribution of selective constraints across regions, which are shown to be higher close to translation start sites. A positive correlation between constraint and length in both intergenic regions and introns suggests a tight control over length expansion of non-coding regions. Genome-wide searches for functional elements revealed several conserved motifs in intergenic regions of Theileria genomes. Two such motifs are preferentially located within the first 60 base pairs upstream of transcription start sites in T. parva, are preferentially associated with specific protein functional categories, and have significant similarity to know

  20. Some new ternary linear codes

    Directory of Open Access Journals (Sweden)

    Rumen Daskalov

    2017-07-01

    Let an $[n,k,d]_q$ code be a linear code of length $n$, dimension $k$ and minimum Hamming distance $d$ over $GF(q)$. One of the most important problems in coding theory is to construct codes with optimal minimum distances. In this paper 22 new ternary linear codes are presented. Two of them are optimal. All new codes improve the respective lower bounds in [11].
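
    The parameters $[n,k,d]_q$ defined above can be computed for a small code by enumerating all codewords. The sketch below is a brute-force illustration only; it is unrelated to the constructions in the paper. It assumes $q$ is prime (so arithmetic modulo $q$ gives the field) and that the generator rows are linearly independent. The function name `code_parameters` is hypothetical, and the example generator matrix is that of the well-known ternary tetracode.

```python
from itertools import product


def code_parameters(generator, q):
    """Return (n, k, d) for the linear code spanned by the rows of `generator`
    over the integers modulo a prime q, by enumerating all nonzero codewords."""
    k = len(generator)
    n = len(generator[0])
    d = n
    for message in product(range(q), repeat=k):
        if not any(message):
            continue  # skip the all-zero message
        # Codeword = message * generator, computed column by column.
        codeword = [sum(m * g for m, g in zip(message, column)) % q
                    for column in zip(*generator)]
        weight = sum(symbol != 0 for symbol in codeword)
        d = min(d, weight)  # for linear codes, d equals the minimum nonzero weight
    return n, k, d


# Generator matrix of the ternary tetracode, a [4,2,3]_3 code.
TETRACODE = [[1, 0, 1, 1],
             [0, 1, 1, 2]]
print(code_parameters(TETRACODE, 3))
# -> (4, 2, 3)
```

    Enumeration visits $q^k - 1$ codewords, so it is practical only for small dimensions.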

  1. Symbolic complexity for nucleotide sequences: a sign of the genome structure

    International Nuclear Information System (INIS)

    Salgado-García, R; Ugalde, E

    2016-01-01

    We introduce a method for estimating the complexity function (which counts the number of observable words of a given length) of a finite symbolic sequence, which we use to estimate the complexity function of coding DNA sequences for several species of the Hominidae family. In all cases, the obtained symbolic complexities show the same characteristic behavior: exponential growth for small word lengths, followed by linear growth for larger word lengths. The symbolic complexities of the species we consider exhibit a systematic trend in correspondence with the phylogenetic tree. Using our method, we estimate the complexity function of sequences obtained by some known evolution models, and in some cases we observe the characteristic exponential-linear growth of the Hominidae coding DNA complexity. Analysis of the symbolic complexity of sequences obtained from a specific evolution model points to the following conclusion: linear growth arises from the random duplication of large segments during the evolution of the genome, while the decrease in the overall complexity from one species to another is due to a difference in the speed of accumulation of point mutations. (paper)
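
    The complexity function mentioned above counts the distinct words of each length that occur in a sequence. A naive direct count looks like the sketch below; this is not the estimation method introduced in the paper, which the abstract does not detail, and the function name `complexity` is hypothetical.

```python
def complexity(sequence, max_len):
    """Return [C(1), ..., C(max_len)], where C(n) is the number of distinct
    length-n words observed in `sequence` (naive count)."""
    return [len({sequence[i:i + n] for i in range(len(sequence) - n + 1)})
            for n in range(1, max_len + 1)]


# A periodic sequence has bounded complexity: only four words of each length.
print(complexity("ACGTACGTAC", 3))
# -> [4, 4, 4]
```

    For a finite sequence of length L, at most L - n + 1 words of length n can be observed, so the naive count saturates at large n regardless of the underlying source.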

  2. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Science.gov (United States)

    Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

    2014-01-01

    Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data will add to our current understanding of the

  3. Cloning, sequencing, and expression of cDNA for human β-glucuronidase

    International Nuclear Information System (INIS)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-01-01

    The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.

  4. Evolutionary dynamics of microsatellite distribution in plants: insight from the comparison of sequenced brassica, Arabidopsis and other angiosperm species.

    Directory of Open Access Journals (Sweden)

    Jiaqin Shi

    Full Text Available Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type), the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite

  5. Analysis of full coding sequence of the TP53 gene in invasive vulvar cancers: Implications for therapy.

    Science.gov (United States)

    Kashofer, Karl; Regauer, Sigrid

    2017-08-01

    This study evaluates the frequency and type of TP53 gene mutations and HPV status in 72 consecutively diagnosed primary invasive vulvar squamous cell carcinomas (SCC) during the past 5 years. DNA of formalin-fixed and paraffin embedded tumour tissue was analysed for 32 HPV subtypes and the full coding sequence of the TP53 gene, and correlated with results of p53 immunohistochemistry. 13/72 (18%) cancers were HPV-induced squamous cell carcinomas, of which 1/13 (8%) carcinoma harboured a somatic TP53 mutation. 59/72 (82%) SCC were HPV-negative, with wild-type gene in 14/59 (24%) SCC and somatic TP53 mutations in 45/59 (76%) SCC. 28/45 (62%) SCC carried one (n=20) or two (n=8) missense mutations. 11/45 (24%) carcinomas showed a single disruptive mutation (3× frame shift, 7× stop codon, 1× deletion), 3/45 SCC a splice site mutation. 3/45 (7%) carcinomas had 2 or 3 different mutations. 18 different "hot spot" mutations were observed in 22/45 cancers (49%; 5× R273, 3× R282; 2× each Y220, R278, R248). Immunohistochemical p53 overexpression was identified in most SCC with missense mutations, but not in SCC with disruptive TP53 mutations or TP53 wild-type. 14/45 (31%) patients with TP53 mutated SCC died of disease within 12 months (range 2-24 months) versus 0/13 patients with HPV-induced carcinomas and 0/14 patients with HPV-negative, TP53 wild-type carcinomas. 80% of primary invasive vulvar SCC were HPV-negative carcinomas with a high frequency of disruptive mutations and "hot spot" TP53 gene mutations, which have been linked to chemo- and radioresistance. The death rate of patients with p53 mutated vulvar cancers was 31%. Immunohistochemical p53 overexpression could not reliably identify SCC with TP53 gene mutation. Pharmacological therapies targeting mutant p53 will be promising strategies for personalized therapy in patients with TP53 mutated vulvar cancers. Copyright © 2017. Published by Elsevier Inc.

  6. Analysis and Construction of Full-Diversity Joint Network-LDPC Codes for Cooperative Communications

    Directory of Open Access Journals (Sweden)

    Capirone Daniele

    2010-01-01

    Full Text Available Transmit diversity is necessary in harsh environments to reduce the required transmit power for achieving a given error performance at a certain transmission rate. In networks, cooperative communication is a well-known technique to yield transmit diversity and network coding can increase the spectral efficiency. These two techniques can be combined to achieve a double diversity order for a maximum coding rate on the Multiple-Access Relay Channel (MARC), where two sources share a common relay in their transmission to the destination. However, codes have to be carefully designed to obtain the intrinsic diversity offered by the MARC. This paper presents the principles to design a family of full-diversity LDPC codes with maximum rate. Simulation of the word error rate performance of the new proposed family of LDPC codes for the MARC confirms the full diversity.

  7. Effect of temperature and cycle length on microbial competition in PHB-producing sequencing batch reactor

    NARCIS (Netherlands)

    Jiang, Y.; Marang, L.; Kleerebezem, R.; Muyzer, G.; van Loosdrecht, M.C.M.

    2011-01-01

    The impact of temperature and cycle length on microbial competition between polyhydroxybutyrate (PHB)-producing populations enriched in feast-famine sequencing batch reactors (SBRs) was investigated at temperatures of 20 °C and 30 °C, and in a cycle length range of 1-18 h. In this study, the

  8. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Background: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads of an Escherichia coli genome. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  9. Purification and Fibrillation of Full-Length Recombinant PrP

    OpenAIRE

    Makarava, Natallia; Baskakov, Ilia V.

    2012-01-01

    Misfolding and aggregation of prion protein (PrP) is related to several neurodegenerative diseases in humans such as Creutzfeldt–Jakob disease, fatal familial insomnia, and Gerstmann–Sträussler–Scheinker disease. Certain applications in the prion field require recombinant PrP of high purity and quality. Here, we report an experimental procedure for expression and purification of full-length mammalian PrP. This protocol has been proved to yield PrP of extremely high purity that lac...

  10. Increased circulating full-length betatrophin levels in drug-naïve metabolic syndrome.

    Science.gov (United States)

    Liu, Dan; Li, Sheyu; He, He; Yu, Chuan; Li, Xiaodan; Liang, Libo; Chen, Yi; Li, Jianwei; Li, Jianshu; Sun, Xin; Tian, Haoming; An, Zhenmei

    2017-03-14

    Betatrophin is a newly identified circulating adipokine that plays a role in the regulation of glucose homeostasis and lipid metabolism, but its role in metabolic syndrome (MetS) remains unknown. Therefore, we aimed to compare the circulating betatrophin concentrations between patients with MetS and healthy controls. We recruited 47 patients with MetS and 47 age- and sex-matched healthy controls. Anthropometric and biochemical measurements were performed, and serum betatrophin levels were detected by ELISA. Full-length betatrophin levels in patients with MetS were significantly higher than those in controls (694.84 ± 365.51 pg/ml versus 356.64 ± 287.92 pg/ml; P < 0.001), whereas no significant difference in total betatrophin levels was found between the two groups (1.20 ± 0.79 ng/ml versus 1.31 ± 1.08 ng/ml; P = 0.524). Full-length betatrophin level was positively correlated with fasting plasma glucose (FPG) (r = 0.357, P = 0.014) and 2-hour plasma glucose (2hPG) (r = 0.38, P < 0.01). Binary logistic regression models indicated that subjects in the tertile of the highest full-length betatrophin level experienced higher odds of having MetS (OR, 8.6; 95% CI 2.8-26.8; P < 0.001). Our study showed that full-length betatrophin concentrations were increased in drug-naïve MetS patients.

  11. Multiple Access Interference Reduction Using Received Response Code Sequence for DS-CDMA UWB System

    Science.gov (United States)

    Toh, Keat Beng; Tachikawa, Shin'ichi

    This paper proposes a combination of novel Received Response (RR) sequence at the transmitter and a Matched Filter-RAKE (MF-RAKE) combining scheme receiver system for the Direct Sequence-Code Division Multiple Access Ultra Wideband (DS-CDMA UWB) multipath channel model. This paper also demonstrates the effectiveness of the RR sequence in Multiple Access Interference (MAI) reduction for the DS-CDMA UWB system. It suggests that by using conventional binary code sequence such as the M sequence or the Gold sequence, there is a possibility of generating extra MAI in the UWB system. Therefore, it is quite difficult to collect the energy efficiently although the RAKE reception method is applied at the receiver. The main purpose of the proposed system is to overcome the performance degradation for UWB transmission due to the occurrence of MAI during multiple accessing in the DS-CDMA UWB system. The proposed system improves the system performance by improving the RAKE reception performance using the RR sequence which can reduce the MAI effect significantly. Simulation results verify that significant improvement can be obtained by the proposed system in the UWB multipath channel models.

  12. Particle infectivity of HIV-1 full-length genome infectious molecular clones in a subtype C heterosexual transmission pair following high fidelity amplification and unbiased cloning

    Energy Technology Data Exchange (ETDEWEB)

    Deymier, Martin J., E-mail: mdeymie@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Claiborne, Daniel T., E-mail: dclaibo@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Ende, Zachary, E-mail: zende@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Ratner, Hannah K., E-mail: hannah.ratner@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Kilembe, William, E-mail: wkilembe@rzhrg-mail.org [Zambia-Emory HIV Research Project (ZEHRP), B22/737 Mwembelelo, Emmasdale Post Net 412, P/BagE891, Lusaka (Zambia); Allen, Susan, E-mail: sallen5@emory.edu [Zambia-Emory HIV Research Project (ZEHRP), B22/737 Mwembelelo, Emmasdale Post Net 412, P/BagE891, Lusaka (Zambia); Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA (United States); Hunter, Eric, E-mail: eric.hunter2@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA (United States)

    2014-11-15

    The high genetic diversity of HIV-1 impedes high throughput, large-scale sequencing and full-length genome cloning by common restriction enzyme based methods. Applying novel methods that employ a high-fidelity polymerase for amplification and an unbiased fusion-based cloning strategy, we have generated several HIV-1 full-length genome infectious molecular clones from an epidemiologically linked transmission pair. These clones represent the transmitted/founder virus and phylogenetically diverse non-transmitted variants from the chronically infected individual's diverse quasispecies near the time of transmission. We demonstrate that, using this approach, PCR-induced mutations in full-length clones derived from their cognate single genome amplicons are rare. Furthermore, all eight non-transmitted genomes tested produced functional virus with a range of infectivities, belying the previous assumption that a majority of circulating viruses in chronic HIV-1 infection are defective. Thus, these methods provide important tools to update protocols in molecular biology that can be universally applied to the study of human viral pathogens. - Highlights: • Our novel methodology demonstrates accurate amplification and cloning of full-length HIV-1 genomes. • A majority of plasma derived HIV variants from a chronically infected individual are infectious. • The transmitted/founder was more infectious than the majority of the variants from the chronically infected donor.

  13. Particle infectivity of HIV-1 full-length genome infectious molecular clones in a subtype C heterosexual transmission pair following high fidelity amplification and unbiased cloning

    International Nuclear Information System (INIS)

    Deymier, Martin J.; Claiborne, Daniel T.; Ende, Zachary; Ratner, Hannah K.; Kilembe, William; Allen, Susan; Hunter, Eric

    2014-01-01

    The high genetic diversity of HIV-1 impedes high throughput, large-scale sequencing and full-length genome cloning by common restriction enzyme based methods. Applying novel methods that employ a high-fidelity polymerase for amplification and an unbiased fusion-based cloning strategy, we have generated several HIV-1 full-length genome infectious molecular clones from an epidemiologically linked transmission pair. These clones represent the transmitted/founder virus and phylogenetically diverse non-transmitted variants from the chronically infected individual's diverse quasispecies near the time of transmission. We demonstrate that, using this approach, PCR-induced mutations in full-length clones derived from their cognate single genome amplicons are rare. Furthermore, all eight non-transmitted genomes tested produced functional virus with a range of infectivities, belying the previous assumption that a majority of circulating viruses in chronic HIV-1 infection are defective. Thus, these methods provide important tools to update protocols in molecular biology that can be universally applied to the study of human viral pathogens. - Highlights: • Our novel methodology demonstrates accurate amplification and cloning of full-length HIV-1 genomes. • A majority of plasma derived HIV variants from a chronically infected individual are infectious. • The transmitted/founder was more infectious than the majority of the variants from the chronically infected donor

  14. Composite Binary Sequences with a Large Ensemble and Zero Correlation Zone

    Directory of Open Access Journals (Sweden)

    S. S. Yudachev

    2015-01-01

    Full Text Available The article considers a proposed class of derived signals, composite binary sequences, for application in advanced spread spectrum radio systems of various purposes that use signals based on spectrum spreading by the direct sequence method. The composite sequences considered, which have a representative set of lengths and unique correlation properties, compare favorably with the large ensembles formed on a single algorithmic basis that are widely used at present. To evaluate the properties of the composite sequences generated on the basis of two components, the Barker code and Kerdock sequences, expressions for the periodic and aperiodic correlation functions are given. An algorithm for generating practical ensembles of composite sequences is presented. On the basis of the algorithm and its software implementation in C#, samples of sequence ensembles of various lengths were obtained, and their periodic and aperiodic correlation functions and statistical characteristics were studied in detail. As an illustration, some of the most typical correlation functions are presented. The most remarkable characteristics, which allow assessing the feasibility of using this type of sequence in the design of specific types of radio systems, are considered. On the basis of the proposed program and the performed calculations, conclusions can be drawn about the possibility of using sequences of these classes to reduce intra-system disturbance in the projected spread spectrum CDMA.
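
    The periodic and aperiodic correlation functions mentioned above can be sketched for one of the two components, the Barker code. The article's implementation is in C# and is not reproduced here; the Python functions below, their names, and the restriction to the length-13 Barker code (without the Kerdock component or the composite construction) are assumptions made for illustration.

```python
def aperiodic_autocorrelation(seq):
    """Aperiodic autocorrelation: overlapping parts only, no wrap-around."""
    n = len(seq)
    return [
        sum(seq[i] * seq[i + shift] for i in range(n - shift))
        for shift in range(n)
    ]


def periodic_autocorrelation(seq):
    """Periodic autocorrelation: the sequence is shifted cyclically."""
    n = len(seq)
    return [
        sum(seq[i] * seq[(i + shift) % n] for i in range(n))
        for shift in range(n)
    ]


# Length-13 Barker code in +1/-1 form.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

aperiodic = aperiodic_autocorrelation(BARKER_13)
periodic = periodic_autocorrelation(BARKER_13)
print("aperiodic:", aperiodic)
print("periodic: ", periodic)
```

    The peak at zero shift equals the sequence length, and the off-peak values of the Barker code stay within magnitude 1, which is the property that makes it attractive as a building block for composite sequences.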

  15. Quantum mean-field decoding algorithm for error-correcting codes

    International Nuclear Information System (INIS)

    Inoue, Jun-ichi; Saika, Yohei; Okada, Masato

    2009-01-01

    We numerically examine a quantum version of TAP (Thouless-Anderson-Palmer)-like mean-field algorithm for the problem of error-correcting codes. For a class of the so-called Sourlas error-correcting codes, we check the usefulness to retrieve the original bit-sequence (message) with a finite length. The decoding dynamics is derived explicitly and we evaluate the average-case performance through the bit-error rate (BER).

  16. Genome survey sequencing and genetic background characterization of Gracilariopsis lemaneiformis (Rhodophyta) based on next-generation sequencing.

    Science.gov (United States)

    Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon.
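
    The N50 statistic quoted for the contigs can be computed as in the minimal sketch below: it is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: total length 20, so N50 is reached at the second-longest contig.
print(n50([2, 3, 4, 5, 6]))
```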

  17. Genome Survey Sequencing and Genetic Background Characterization of Gracilariopsis lemaneiformis (Rhodophyta) Based on Next-Generation Sequencing

    Science.gov (United States)

    Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin

    2013-01-01

    Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC% content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008

  18. Copper Coordination in the Full-Length, Recombinant Prion Protein

    Science.gov (United States)

    Burns, Colin S.; Aronoff-Spencer, Eliah; Legname, Giuseppe; Prusiner, Stanley B.; Antholine, William E.; Gerfen, Gary J.; Peisach, Jack; Millhauser, Glenn L.

    2010-01-01

    The prion protein (PrP) binds divalent copper at physiologically relevant conditions and is believed to participate in copper regulation or act as a copper-dependent enzyme. Ongoing studies aim at determining the molecular features of the copper binding sites. The emerging consensus is that most copper binds in the octarepeat domain, which is composed of four or more copies of the fundamental sequence PHGGGWGQ. Previous work from our laboratory using PrP-derived peptides, in conjunction with EPR and X-ray crystallography, demonstrated that the HGGGW segment constitutes the fundamental binding unit in the octarepeat domain [Burns et al. (2002) Biochemistry 41, 3991–4001; Aronoff-Spencer et al. (2000) Biochemistry 39, 13760–13771]. Copper coordination arises from the His imidazole and sequential deprotonated glycine amides. In this present work, recombinant, full-length Syrian hamster PrP is investigated using EPR methodologies. Four copper ions are taken up in the octarepeat domain, which supports previous findings. However, quantification studies reveal a fifth binding site in the flexible region between the octarepeats and the PrP globular C-terminal domain. A series of PrP peptide constructs show that this site involves His96 in the PrP(92–96) segment GGGTH. Further examination by X-band EPR, S-band EPR, and electron spin–echo envelope spectroscopy, demonstrates coordination by the His96 imidazole and the glycine preceding the threonine. The copper affinity for this type of binding site is highly pH dependent, and EPR studies here show that recombinant PrP loses its affinity for copper below pH 6.0. These studies seem to provide a complete profile of the copper binding sites in PrP and support the hypothesis that PrP function is related to its ability to bind copper in a pH-dependent fashion. PMID:12779334

  19. Optimal interference code based on machine learning

    Science.gov (United States)

    Qian, Ye; Chen, Qian; Hu, Xiaobo; Cao, Ercong; Qian, Weixian; Gu, Guohua

    2016-10-01

    In this paper, we analyze the characteristics of pseudo-random codes, taking the m-sequence as an example. Based on coding theory, we introduce the jamming methods and use MATLAB to simulate the interference effect and its probability model. According to the length of decoding time the adversary spends, we find the optimal formula and optimal coefficients using machine learning, and from these obtain a new optimal interference code. First, in the recognition phase, the study judges the effect of interference by simulating the length of the laser seeker's decoding period. Then, in the tracking phase, we use laser active deception jamming to simulate the interference process. To improve the performance of the interference, the model is simulated in MATLAB. We determine the least number of pulse intervals that must be received, and from this deduce the precise interval number of the laser pointer for m-sequence encoding. To find the shortest interval, we use the greatest common divisor method. Then, combining this with the coding regularity found earlier, we restore the pulse intervals of the pseudo-random code that has already been received. Finally, we can control the time period of laser interference, obtain the optimal interference code, and increase the probability of interference.
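
    The m-sequence that the abstract takes as its example of a pseudo-random code can be generated with a linear feedback shift register (LFSR). The sketch below is in Python rather than the MATLAB used by the authors, and the function name `m_sequence`, the register length of 3 and the tap positions are illustrative assumptions; the taps must correspond to a primitive polynomial for the output to have the maximal period $2^m - 1$.

```python
def m_sequence(taps, degree):
    """Generate one period of a binary sequence from a Fibonacci LFSR.

    taps are 1-based register positions XORed to form the feedback bit.
    The period is 2**degree - 1 only if the taps define a primitive polynomial.
    """
    state = [1] * degree  # any nonzero initial state works
    output = []
    for _ in range(2 ** degree - 1):
        output.append(state[-1])  # emit the last register stage
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]  # shift and insert feedback
    return output


# Degree-3 register with taps at stages 3 and 1 (assumed example).
bits = m_sequence(taps=(3, 1), degree=3)
print(bits)

# Map bits to +1/-1 and check the two-valued periodic autocorrelation.
signs = [1 - 2 * b for b in bits]
n = len(signs)
autocorrelation = [
    sum(signs[i] * signs[(i + shift) % n] for i in range(n))
    for shift in range(n)
]
print(autocorrelation)
```

    One period of an m-sequence contains one more 1 than 0, and its periodic autocorrelation takes only two values, the sequence length at zero shift and -1 elsewhere. This regular structure is what allows a received fragment of the code to be extended once enough intervals have been observed.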

  20. [Transposition errors during learning to reproduce a sequence by the right- and the left-hand movements: simulation of positional and movement coding].

    Science.gov (United States)

    Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N

    2012-01-01

    Transposition errors during the reproduction of a hand movement sequence make it possible to obtain important information on the internal representation of this sequence in the motor working memory. Analysis of such errors showed that learning to reproduce sequences of the left-hand movements improves the system of positional coding (coding of positions), while learning of the right-hand movements improves the system of vector coding (coding of movements). Learning of the right-hand movements after the left-hand performance involved the system of positional coding "imposed" by the left hand. Learning of the left-hand movements after the right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by a neural network using either vector coding or both vector and positional coding.

  1. Structure and function of the first full-length murein peptide ligase (Mpl) cell wall recycling protein.

    Directory of Open Access Journals (Sweden)

    Debanu Das

    2011-03-01

    Full Text Available Bacterial cell walls contain peptidoglycan, an essential polymer made by enzymes in the Mur pathway. These proteins are specific to bacteria, which make them targets for drug discovery. MurC, MurD, MurE and MurF catalyze the synthesis of the peptidoglycan precursor UDP-N-acetylmuramoyl-L-alanyl-γ-D-glutamyl-meso-diaminopimelyl-D-alanyl-D-alanine by the sequential addition of amino acids onto UDP-N-acetylmuramic acid (UDP-MurNAc). MurC-F enzymes have been extensively studied by biochemistry and X-ray crystallography. In gram-negative bacteria, ∼30-60% of the bacterial cell wall is recycled during each generation. Part of this recycling process involves the murein peptide ligase (Mpl), which attaches the breakdown product, the tripeptide L-alanyl-γ-D-glutamyl-meso-diaminopimelate, to UDP-MurNAc. We present the crystal structure at 1.65 Å resolution of a full-length Mpl from the permafrost bacterium Psychrobacter arcticus 273-4 (PaMpl). Although the Mpl structure has similarities to Mur enzymes, it has unique sequence and structure features that are likely related to its role in cell wall recycling, a function that differentiates it from the MurC-F enzymes. We have analyzed the sequence-structure relationships that are unique to Mpl proteins and compared them to MurC-F ligases. We have also characterized the biochemical properties of this enzyme (optimal temperature, pH and magnesium binding profiles and kinetic parameters). Although the structure does not contain any bound substrates, we have identified ∼30 residues that are likely to be important for recognition of the tripeptide and UDP-MurNAc substrates, as well as features that are unique to Psychrobacter Mpl proteins. These results provide the basis for future mutational studies for more extensive function characterization of the Mpl sequence-structure relationships.

  2. Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

    Directory of Open Access Journals (Sweden)

    Phillip R. Myer

    2016-09-01

    Amplicon sequencing utilizing next-generation platforms has significantly transformed how research is conducted, specifically in microbial ecology. However, primer and sequencing platform biases can confound or change the way scientists interpret these data. The Pacific Biosciences RSII instrument may also preferentially load smaller fragments, which may also be a function of PCR product exhaustion during sequencing. To further examine these biases, data are provided from 16S rRNA rumen community analyses. Specifically, data from the relative phylum-level abundances for the ruminal bacterial community are provided to determine between-sample variability. Direct sequencing of metagenomic DNA was conducted to circumvent primer-associated biases in 16S rRNA reads, and rarefaction curves were generated to demonstrate adequate coverage of each amplicon. PCR products were also subjected to reduced amplification and pooling to reduce the likelihood of PCR product exhaustion during sequencing on the Pacific Biosciences platform. The taxonomic profiles for the relative phylum-level and genus-level abundance of rumen microbiota as a function of PCR pooling for sequencing on the Pacific Biosciences RSII platform are provided. For more information, see “Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers” P.R. Myer, M. Kim, H.C. Freetly, T.P.L. Smith (2016) [1]. Keywords: 16S rRNA gene, MiSeq, Pacific Biosciences, Rumen microbiome

  3. Analysis of the AD sequence in Zion plant using the March 1.1 code

    International Nuclear Information System (INIS)

    Oriolo, F.; Paci, S.

    1985-01-01

    This report presents the analyses of the AD sequences for the Zion power plant, made at Pisa University in the framework of its participation in the Source Term Working Group. After a short description of the plant and the sequence under analysis, the model used for the reference computation and the results obtained using the March 1.1 code are shown. Together with the reference computation, a series of parametric tests has also been made on some input code variables in order to ascertain their influence on the transient trend. The results of these analyses are shown in the Appendix

  4. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    2014-01-01

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses; as a result, only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit…

  5. Sequence Coding and Search System for licensee event reports: coder's manual. Volume 4

    International Nuclear Information System (INIS)

    Gallaher, R.B.; Guymon, R.H.; Mays, G.T.; Poore, W.P.; Cagle, R.J.; Harrington, K.H.; Johnson, M.P.

    1985-04-01

    Operating experience data from nuclear power plants are essential for safety and reliability analyses, especially analyses of trends and patterns. The licensee event reports (LERs) that are submitted to the Nuclear Regulatory Commission (NRC) by the nuclear power plant utilities contain much of this data. The NRC's Office for Analysis and Evaluation of Operational Data (AEOD) has developed, under contract with NSIC, a system for codifying the events reported in the LERs. The primary objective of the Sequence Coding and Search System (SCSS) is to reduce the descriptive text of the LERs to coded sequences that are both computer-readable and computer-searchable. This four-volume report documents and describes SCSS in detail. Volumes 3 and 4 provide a technical processor, new to SCSS, with the information and methodology necessary to capture descriptive data from the LER and to codify that data into a structured format; they also serve as reference material for the more experienced technical processor, and contain information that is essential for the more advanced user who needs to be familiar with the intricate coding techniques in order to retrieve specific details in a sequence. This volume contains updated material through amendment 1 to revision 1 of the working version of ORNL/NSIC-223, Vol. 4

  6. Sequence Coding and Search System for licensee event reports: coder's manual. Volume 3

    International Nuclear Information System (INIS)

    Gallaher, R.B.; Guymon, R.H.; Mays, G.T.; Poore, W.P.; Cagle, R.J.; Harrington, K.H.; Johnson, M.P.

    1985-04-01

    Operating experience data from nuclear power plants are essential for safety and reliability analyses, especially analyses of trends and patterns. The licensee event reports (LERs) that are submitted to the Nuclear Regulatory Commission (NRC) by the nuclear power plant utilities contain much of this data. The NRC's Office for Analysis and Evaluation of Operational Data (AEOD) has developed, under contract with NSIC, a system for codifying the events reported in the LERs. The primary objective of the Sequence Coding and Search System (SCSS) is to reduce the descriptive text of the LERs to coded sequences that are both computer-readable and computer-searchable. This four-volume report documents and describes SCSS in detail. Volumes 3 and 4 provide a technical processor, new to SCSS, with the information and methodology necessary to capture descriptive data from the LER and to codify that data into a structured format; they also serve as reference material for the more experienced technical processor, and contain information that is essential for the more advanced user who needs to be familiar with the intricate coding techniques in order to retrieve specific details in a sequence. This volume contains updated material through amendment 1 to revision 1 of the working version of ORNL/NSIC-223, Vol. 3

  7. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    Science.gov (United States)

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high-resolution mapping of en masse variant libraries allows molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall at various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.

  8. Simulations of The Dalles Dam Proposed Full Length Spillwall

    Energy Technology Data Exchange (ETDEWEB)

    Rakowski, Cynthia L.; Perkins, William A.; Richmond, Marshall C.; Serkowski, John A.

    2008-02-25

    This report presents results of a computational fluid dynamics (CFD) modeling study to evaluate the impacts of a full-length spillwall at The Dalles Dam. The full-length spillwall is being designed and evaluated as a structural means to improve tailrace egress and thus survival of juvenile fish passing through the spillway. During the course of this study, full-length spillwalls at Bays 6/7 and 8/9 were considered. The U.S. Army Corps of Engineers (USACE) has proposed extending the spillwall constructed in the stilling basin between spillway Bays 6 and 7 about 590 ft farther downstream. It is believed that the extension of the spillwall will improve egress conditions for downstream juvenile salmonids by moving them more rapidly into the thalweg of the river, hence reducing their exposure to predators. A numerical model was created, validated, and applied to The Dalles Dam tailrace. The models were designed to assess impacts to flow, tailrace egress, navigation, and adult salmon passage of a proposed spillwall extension. The more extensive model validation undertaken in this study greatly improved our confidence in the numerical model to represent the flow conditions in The Dalles tailrace. This study used these validated CFD models to simulate the potential impacts of a spillwall extension for The Dalles Dam tailrace for two locations. We determined the following: (1) The construction of an extended wall (between Bays 6/7) will not adversely impact entering or exiting the navigation lock. Impact should be less if a wall were constructed between Bays 8/9. (2) The construction of a wall between Bays 6/7 will increase the water surface elevation between the wall and the Washington shore. Although the increased water surface elevation would be beneficial to adult upstream migrants in that it decreases velocities on the approach to the adult ladder, the increased flow depth would enhance dissolved gas production, impacting potential operations of the project because of

  9. Sequence Coding and Search System Backfit Quality Assurance Program Plan

    International Nuclear Information System (INIS)

    Lovell, C.J.; Stepina, P.L.

    1985-03-01

    The Sequence Coding and Search System is a computer-based encoding system for events described in Licensee Event Reports. This data system contains LERs from 1981 to present. Backfit of the data system to include LERs prior to 1981 is required. This report documents the Quality Assurance Program Plan that EG and G Idaho, Inc. will follow while encoding 1980 LERs

  10. In Silico Mining of Microsatellites in Coding Sequences of the Date Palm (Arecaceae) Genome, Characterization, and Transferability

    Directory of Open Access Journals (Sweden)

    Frédérique Aberlenc-Bertossi

    2014-01-01

    Premise of the study: To complement existing sets of primarily dinucleotide microsatellite loci from noncoding sequences of date palm, we developed primers for tri- and hexanucleotide microsatellite loci identified within genes. Due to their conserved genomic locations, the primers should be useful in other palm taxa, and their utility was tested in seven other Phoenix species and in Chamaerops, Livistona, and Hyphaene. Methods and Results: Tandem repeat motifs of 3–6 bp were searched using a simple sequence repeat (SSR)–pipeline package in coding portions of the date palm draft genome sequence. Fifteen loci produced highly consistent amplification, intraspecific polymorphisms, and stepwise mutation patterns. Conclusions: These microsatellite loci showed sufficient levels of variability and transferability to make them useful for population genetic, selection signature, and interspecific gene flow studies in Phoenix and other Coryphoideae genera.
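
    The search for tandem repeat motifs of 3–6 bp described in this record can be illustrated with a short regular-expression scan. This is only a minimal sketch, not the SSR pipeline package the authors used: the function name find_ssrs and the threshold of four consecutive repeats are assumptions made for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re


def find_ssrs(seq, min_unit=3, max_unit=6, min_repeats=4):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats.

    min_repeats is an illustrative threshold, not a value from the record.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One captured unit followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter motif
            # (e.g. "AAA" or "CAGCAG"); the shorter-unit pass reports those.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical test sequences, not taken from the date palm genome.
print(find_ssrs("GGC" + "CAG" * 5 + "TTA"))
print(find_ssrs("T" + "AAGGTC" * 4 + "G"))
```

    A production search would also need to handle imperfect repeats and restrict the scan to annotated coding regions, which this sketch does not attempt.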

  11. Sequence adaptations during growth of rescued classical swine fever viruses in cell culture and within infected pigs

    DEFF Research Database (Denmark)

    Hadsbjerg, Johanne; Friis, Martin Barfred; Fahnøe, Ulrik

    2016-01-01

    RNA could be detected. However, the animals inoculated with these mutant viruses seroconverted against CSFV. Thus, these mutant viruses were highly attenuated in vivo. All 4 rescued viruses were also passaged up to 20 times in cell culture. Using full genome sequencing, the same two adaptations within......Classical swine fever virus (CSFV) causes an economically important disease of swine. Four different viruses were rescued from full-length cloned cDNAs derived from the Paderborn strain of CSFV. Three of these viruses had been modified by mutagenesis (with 7 or 8 nt changes) within stem 2...... each of four independent virus populations were observed that restored the coding sequence to that of the parental field strain. These adaptations occurred with different kinetics. The combination of reverse genetics and in depth, full genome sequencing provides a powerful approach to analyse virus...

  12. Systematic Network Coding with the Aid of a Full-Duplex Relay

    DEFF Research Database (Denmark)

    Giacaglia, Giuliano; Shi, Xiaomeng; Kim, MinJi

    2013-01-01

    is to deliver a given number of data packets to a receiver with the aid of a relay. The source broadcasts to both the receiver and the relay using one frequency, while the relay uses another frequency for transmissions to the receiver, allowing for a full-duplex operation of the relay. We analyze the decoding...... complexity and delay performance of two types of relays: one that preserves the systematic structure of the code from the source; another that does not. A systematic relay forwards uncoded packets upon reception, but transmits coded packets to the receiver after receiving the first coded packet from...

  13. Error Recovery Properties and Soft Decoding of Quasi-Arithmetic Codes

    Directory of Open Access Journals (Sweden)

    Christine Guillemot

    2007-08-01

    This paper first introduces a new set of aggregated state models for soft-input decoding of quasi-arithmetic (QA) codes with a termination constraint. The decoding complexity with these models is linear with the sequence length. The aggregation parameter controls the tradeoff between decoding performance and complexity. It is shown that close-to-optimal decoding performance can be obtained with low values of the aggregation parameter, that is, with a complexity which is significantly reduced with respect to optimal QA bit/symbol models. The choice of the aggregation parameter depends on the synchronization recovery properties of the QA codes. This paper thus describes a method to estimate the probability mass function (PMF) of the gain/loss of symbols following a single bit error (i.e., of the difference between the number of encoded and decoded symbols). The entropy of the gain/loss turns out to be the average amount of information conveyed by a length constraint on both the optimal and aggregated state models. This quantity allows us to choose the value of the aggregation parameter that will lead to close-to-optimal decoding performance. It is shown that the optimum position for the length constraint is not the last time instant of the decoding process. This observation leads to the introduction of a new technique for robust decoding of QA codes with redundancy which turns out to outperform techniques based on the concept of forbidden symbol.

  14. Performance Analysis of CRC Codes for Systematic and Nonsystematic Polar Codes with List Decoding

    Directory of Open Access Journals (Sweden)

    Takumi Murata

    2018-01-01

    Successive cancellation list (SCL) decoding of polar codes is an effective approach that can significantly outperform the original successive cancellation (SC) decoding, provided that proper cyclic redundancy-check (CRC) codes are employed at the stage of candidate selection. Previous studies on CRC-assisted polar codes mostly focus on improvement of the decoding algorithms as well as their implementation, and little attention has been paid to the CRC code structure itself. For the CRC-concatenated polar codes with a CRC code as their outer code, the use of a longer CRC code leads to a reduction of the information rate, whereas the use of a shorter CRC code may reduce the error detection probability, thus degrading the frame error rate (FER) performance. Therefore, CRC codes of proper length should be employed in order to optimize the FER performance for a given signal-to-noise ratio (SNR) per information bit. In this paper, we investigate the effect of CRC codes on the FER performance of polar codes with list decoding in terms of the CRC code length as well as its generator polynomials. Both the original nonsystematic and systematic polar codes are considered, and we also demonstrate that different behaviors of CRC codes should be observed depending on whether the inner polar code is systematic or not.
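
    The role of the outer CRC code in this record can be illustrated with a small sketch of CRC attachment and checking by polynomial division over GF(2). It is not the authors' implementation: the function names are hypothetical, the generator polynomials are textbook examples rather than ones studied in the paper, and no polar encoding or list decoding is included. The length of the generator polynomial fixes the number of redundancy bits r, which is the quantity traded against information rate.

```python
def crc_remainder(bits, poly):
    """Remainder of bits(x) * x^r divided by poly(x) over GF(2).

    bits and poly are lists of 0/1 with the highest-degree coefficient first;
    poly[0] must be 1 and r = len(poly) - 1 is the number of CRC bits.
    """
    r = len(poly) - 1
    work = list(bits) + [0] * r
    for i in range(len(bits)):
        if work[i]:
            # Subtract (XOR) the generator aligned at position i.
            for j, p in enumerate(poly):
                work[i + j] ^= p
    return work[-r:]


def crc_attach(bits, poly):
    """Append the CRC bits to the information bits."""
    return list(bits) + crc_remainder(bits, poly)


def crc_check(codeword, poly):
    """True if the trailing CRC bits match the leading information bits."""
    r = len(poly) - 1
    return crc_remainder(codeword[:-r], poly) == codeword[-r:]


# Illustrative generator x^8 + x^2 + x + 1 (an assumption, not from the paper).
poly8 = [1, 0, 0, 0, 0, 0, 1, 1, 1]
info = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = crc_attach(info, poly8)
print(codeword, crc_check(codeword, poly8))
```

    In a list decoder, a check like crc_check would be applied to each surviving candidate so that a candidate passing the CRC can be selected; a longer generator rejects more wrong candidates but leaves fewer bits for information.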

  15. Rational Design of High-Number dsDNA Fragments Based on Thermodynamics for the Construction of Full-Length Genes in a Single Reaction.

    Directory of Open Access Journals (Sweden)

    Bhagyashree S Birla

    Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction is much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky, which identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.

  16. Modelling of blackout sequence at Atucha-1 using the MARCH3 code

    International Nuclear Information System (INIS)

    Baron, J.; Bastianelli, B.

    1997-01-01

    This paper presents the modelling of a complete blackout at the Atucha-1 NPP as a preliminary phase for a Level II probabilistic safety analysis. The MARCH3 code of the STCP (Source Term Code Package) is used, based on a plant model made in accordance with particularities of the plant design. The analysis covers all the severe accident phases. The results show the time sequence of the events and provide the basis for source term studies. (author). 6 refs., 2 figs

  17. On the equivalence of cyclic and quasi-cyclic codes over finite fields

    Directory of Open Access Journals (Sweden)

    Kenza Guenda

    2017-07-01

    This paper studies the equivalence problem for cyclic codes of length $p^r$ and quasi-cyclic codes of length $p^rl$. In particular, we generalize the results of Huffman, Job, and Pless (J. Combin. Theory A, 62, 183--215, 1993), who considered the special case $p^2$. This is achieved by explicitly giving the permutations by which two cyclic codes of prime power length are equivalent. This allows us to obtain an algorithm which solves the problem of equivalency for cyclic codes of length $p^r$ in polynomial time. Further, we characterize the set by which two quasi-cyclic codes of length $p^rl$ can be equivalent, and prove that the affine group is one of its subsets.

  18. Expression of full-length and splice forms of FoxP3 in rheumatoid arthritis

    DEFF Research Database (Denmark)

    Ryder, L R; Woetmann, A; Madsen, H O

    2010-01-01

    OBJECTIVE: The aim of our study was to compare the presence of full-length and alternative splice forms of FoxP3 mRNA in CD4 cells from rheumatoid arthritis (RA) patients and healthy controls. METHODS: A quantitative real-time polymerase chain reaction (QRT-PCR) method was used to measure...... the amount of FoxP3 mRNA full-length and splice forms. CD4-positive T cells were isolated from peripheral blood from 50 RA patients by immunomagnetic separation, and the FoxP3 mRNA expression was compared with the results from 10 healthy controls. RESULTS: We observed an increased expression of full......-length FoxP3 mRNA in RA patients when compared to healthy controls, as well as an increase in CD25 mRNA expression, but no corresponding increase in CTLA-4 mRNA expression. The presence of an alternative splice form of FoxP3 lacking exon 2 was confirmed in both RA patients and healthy controls...

  19. DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

    Science.gov (United States)

    Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

    2010-06-01

    A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short, 20-nt-long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seems to be electrochemically mute due to the long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.

  20. Universal sequence map (USM) of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Background: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results: We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
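
    The iterative mapping idea behind CGR and USM can be sketched as follows. This is a hedged illustration in the spirit of the record, not the authors' published procedure: the function name usm_forward, the binary corner assignment, and the starting point of 0.5 in every dimension are assumptions. Each symbol gets a binary corner code, and every step moves the current point halfway toward the corner of the next symbol, so two sequences that end in the same L symbols land within 2^-L of each other in every coordinate.

```python
from math import ceil, log2


def usm_forward(seq, alphabet=None):
    """Map each position of seq to a point in the unit hypercube (CGR-style).

    alphabet fixes the symbol-to-corner assignment; by default it is the
    sorted set of symbols seen in seq. Starting at 0.5 is an assumption.
    """
    symbols = sorted(set(seq)) if alphabet is None else list(alphabet)
    n_dims = max(1, ceil(log2(len(symbols))))
    # Binary code of the symbol index gives the hypercube corner.
    corner = {s: [(i >> d) & 1 for d in range(n_dims)] for i, s in enumerate(symbols)}
    point = [0.5] * n_dims
    coords = []
    for s in seq:
        # Move halfway from the current point toward the symbol's corner.
        point = [(p + c) / 2 for p, c in zip(point, corner[s])]
        coords.append(tuple(point))
    return coords


# Hypothetical sequences sharing the 5-symbol suffix "TTGCA".
a = usm_forward("ACGTTGCA", "ACGT")[-1]
b = usm_forward("TTTTTGCA", "ACGT")[-1]
print(max(abs(x - y) for x, y in zip(a, b)))
```

    Taking the negative base-2 logarithm of that maximum coordinate difference gives a rough estimate of the length of the shared context, which is the sense in which map distance estimates sequence similarity.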

  1. The nucleotide sequence of human transition protein 1 cDNA

    Energy Technology Data Exchange (ETDEWEB)

    Luerssen, H; Hoyer-Fender, S; Engel, W [Universitaet Goettingen (West Germany)]

    1988-08-11

    The authors have screened a human testis cDNA library with an oligonucleotide of 81 mer prepared according to a part of the published nucleotide sequence of the rat transition protein TP 1. They have isolated a cDNA clone with the length of 441 bp containing the coding region of 162 bp for human transition protein 1. There is about 84% homology in the coding region of the sequence compared to rat. The human cDNA-clone encodes a polypeptide of 54 amino acids of which 7 are different to that of rat.

  2. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    Directory of Open Access Journals (Sweden)

    Glass John I

    2010-07-01

    Background: Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results: Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the

  3. Genetic variation at hair length candidate genes in elephants and the extinct woolly mammoth

    Directory of Open Access Journals (Sweden)

    Tisdale Michele

    2009-09-01

    Background: Like humans, the living elephants are unusual among mammals in being sparsely covered with hair. Relative to extant elephants, the extinct woolly mammoth, Mammuthus primigenius, had a dense hair cover and extremely long hair, which likely were adaptations to its subarctic habitat. The fibroblast growth factor 5 (FGF5) gene affects hair length in a diverse set of mammalian species. Mutations in FGF5 lead to recessive long hair phenotypes in mice, dogs, and cats; and the gene has been implicated in hair length variation in rabbits. Thus, FGF5 represents a leading candidate gene for the phenotypic differences in hair length notable between extant elephants and the woolly mammoth. We therefore sequenced the three exons (except for the 3' UTR) and a portion of the promoter of FGF5 from the living elephantid species (Asian, African savanna and African forest elephants) and, using protocols for ancient DNA, from a woolly mammoth. Results: Between the extant elephants and the mammoth, two single base substitutions were observed in FGF5, neither of which alters the amino acid sequence. Modeling of the protein structure suggests that the elephantid proteins fold similarly to the human FGF5 protein. Bioinformatics analyses and DNA sequencing of another locus that has been implicated in hair cover in humans, type I hair keratin pseudogene (KRTHAP1), also yielded negative results. Interestingly, KRTHAP1 is a pseudogene in elephantids as in humans (although fully functional in non-human primates). Conclusion: The data suggest that the coding sequence of the FGF5 gene is not the critical determinant of hair length differences among elephantids. The results are discussed in the context of hairlessness among mammals and in terms of the potential impact of large body size, subarctic conditions, and an aquatic ancestor on hair cover in the Proboscidea.

  4. Generation of a reliable full-length cDNA of infectious Tembusu virus using a PCR-based protocol.

    Science.gov (United States)

    Liang, Te; Liu, Xiaoxiao; Cui, Shulin; Qu, Shenghua; Wang, Dan; Liu, Ning; Wang, Fumin; Ning, Kang; Zhang, Bing; Zhang, Dabing

    2016-02-02

    Full-length cDNA of Tembusu virus (TMUV) cloned in a plasmid has been found to be unstable in bacterial hosts. Using a PCR-based protocol, we generated a stable full-length cDNA of TMUV. Different cDNA fragments of TMUV were amplified by reverse transcription (RT)-PCR, and cloned into plasmids. Fragmented cDNAs were amplified and assembled by fusion PCR to produce a full-length cDNA using the recombinant plasmids as templates. Subsequently, a full-length RNA was transcribed from the full-length cDNA in vitro and transfected into BHK-21 cells; infectious viral particles were rescued successfully. Following several passages in BHK-21 cells, the rescued virus was compared with the parental virus by genetic marker checks, growth curve determinations and animal experiments. These assays clearly demonstrated the genetic and biological stabilities of the rescued virus. The present work will be useful for future investigations on the molecular mechanisms involved in replication and pathogenesis of TMUV.

  5. Some new quasi-twisted ternary linear codes

    Directory of Open Access Journals (Sweden)

    Rumen Daskalov

    2015-09-01

    Let an [n, k, d]_q code be a linear code of length n, dimension k and minimum Hamming distance d over GF(q). One of the basic and most important problems in coding theory is to construct codes with the best possible minimum distances. In this paper seven quasi-twisted ternary linear codes are constructed. These codes are new and improve the best known lower bounds on the minimum distance in [6].
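
    The parameters [n, k, d]_q in this record can be made concrete with a brute-force computation of the minimum Hamming distance from a generator matrix. This is a minimal sketch under stated assumptions: q is prime (so arithmetic modulo q gives the field), the rows of the generator matrix are linearly independent, and the code is small enough to enumerate all q^k - 1 nonzero codewords. The function name and the example matrix, a standard [4, 2, 3]_3 code, are illustrative and are not among the codes constructed in the paper.

```python
from itertools import product


def min_distance(G, q=3):
    """Minimum Hamming weight over all nonzero codewords generated by G.

    G is a k x n list of rows with entries in range(q); q must be prime and
    the rows linearly independent for the result to be the true distance.
    """
    k, n = len(G), len(G[0])
    best = n
    for msg in product(range(q), repeat=k):
        if not any(msg):
            continue  # skip the all-zero message
        codeword = [sum(m * row[j] for m, row in zip(msg, G)) % q for j in range(n)]
        best = min(best, sum(1 for c in codeword if c))
    return best


# Illustrative ternary generator matrix (not from the paper).
G = [
    [1, 0, 1, 1],
    [0, 1, 1, 2],
]
print(len(G[0]), len(G), min_distance(G, q=3))
```

    Exhaustive enumeration grows as q^k, so searches for record-setting codes of realistic size rely on more refined algorithms than this.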

  6. Field programmable gate array (FPGA) implementation of novel complex PN-code-generator-based data scrambler and descrambler

    Directory of Open Access Journals (Sweden)

    Shabir A. Parah

    2010-04-01

    A novel technique for the generation of complex and lengthy code sequences using low-length linear feedback shift registers (LFSRs) for data scrambling and descrambling is proposed. The scheme has been implemented using a VHSIC hardware description language (VHDL) approach, which allows the reconfigurability of the proposed system such that the length of the generated sequences can be changed as per the security requirements. In the present design, the power consumption and chip area requirements are small and the operating speed is high compared to conventional discrete I.C. design, which is a pre-requisite for any system designer. The design has been synthesised on device EP2S15F484C3 of the Stratix II FPGA family, using Altera Quartus version 8.1. The simulation results have been found satisfactory and are in conformity with the theoretical observations.
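
    The basic building block named in this record, an LFSR-driven scrambler, can be sketched in software as follows. This is a plain single-register illustration in Python, not the authors' complex PN-code generator or their VHDL design: the function names, the 4-bit register, the tap positions and the seed are all assumptions chosen so that the output is a maximal-length sequence of period 15. Scrambling XORs the data with the PN sequence, and descrambling repeats the same operation with the same seed.

```python
def lfsr_bits(state, taps, count):
    """Generate count output bits from a Fibonacci LFSR.

    state: list of 0/1 register contents (must not be all zero).
    taps:  0-based register positions XORed together to form the feedback bit.
    The last register cell is the output; the feedback enters at position 0.
    """
    reg = list(state)
    out = []
    for _ in range(count):
        out.append(reg[-1])
        feedback = 0
        for t in taps:
            feedback ^= reg[t]
        reg = [feedback] + reg[:-1]
    return out


def scramble(data, state, taps):
    """XOR data bits with the PN sequence; applying it twice restores the data."""
    pn = lfsr_bits(state, taps, len(data))
    return [d ^ p for d, p in zip(data, pn)]


# Illustrative 4-bit register with taps at positions 3 and 2 (period 15).
seed, taps = [1, 0, 0, 0], [3, 2]
data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
scrambled = scramble(data, seed, taps)
print(scrambled, scramble(scrambled, seed, taps) == data)
```

    Because the period of a single n-bit LFSR is at most 2^n - 1, longer sequences require either longer registers or a combination of several short ones, which is the motivation given in the record.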

  7. Nucleotide sequence of tomato ringspot virus RNA-2.

    Science.gov (United States)

    Rott, M E; Tremaine, J H; Rochon, D M

    1991-07-01

    The sequence of tomato ringspot virus (TomRSV) RNA-2 has been determined. It is 7273 nucleotides in length excluding the 3' poly(A) tail and contains a single long open reading frame (ORF) of 5646 nucleotides in the positive sense beginning at position 78 and terminating at position 5723. A second in-frame AUG at position 441 is in a more favourable context for initiation of translation and may act as a site for initiation of translation. The TomRSV RNA-2 3' noncoding region is 1550 nucleotides in length. The coat protein is located in the C-terminal region of the large polypeptide and shows significant but limited amino acid sequence similarity to the putative coat proteins of the nepoviruses tomato black ring (TBRV), Hungarian grapevine chrome mosaic (GCMV) and grapevine fanleaf (GFLV). Comparisons of the coding and non-coding regions of TomRSV RNA-2 and the RNA components of TBRV, GCMV, GFLV and the comovirus cowpea mosaic virus revealed significant similarity for over 300 amino acids between the coding region immediately to the N-terminal side of the putative coat proteins of TomRSV and GFLV; very little similarity could be detected among the non-coding regions of TomRSV and any of these viruses.
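
    The coordinates quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above (1-based, inclusive positions are assumed) and verifies the ORF length, the 3' noncoding length, and that the AUG at position 441 is in frame with the ORF start.

```python
GENOME_LENGTH = 7273                  # nucleotides, excluding the poly(A) tail
ORF_START, ORF_END = 78, 5723         # 1-based, inclusive (assumed convention)
SECOND_AUG = 441

orf_length = ORF_END - ORF_START + 1          # length of the long ORF
three_prime_ncr = GENOME_LENGTH - ORF_END     # nucleotides after the ORF
five_prime_ncr = ORF_START - 1                # nucleotides before the ORF
second_aug_in_frame = (SECOND_AUG - ORF_START) % 3 == 0

print(orf_length, three_prime_ncr, five_prime_ncr, second_aug_in_frame)
```

    The computed values agree with the abstract: 5646 nucleotides for the ORF and 1550 for the 3' noncoding region, with the second AUG lying 121 codons downstream of the first.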

  8. Source coherence impairments in a direct detection direct sequence optical code-division multiple-access system.

    Science.gov (United States)

    Fsaifes, Ihsan; Lepers, Catherine; Lourdiane, Mounia; Gallion, Philippe; Beugin, Vincent; Guignard, Philippe

    2007-02-01

    We demonstrate that direct sequence optical code-division multiple-access (DS-OCDMA) encoders and decoders using sampled fiber Bragg gratings (S-FBGs) behave as multipath interferometers. In that case, chip pulses of the prime sequence codes generated by spreading in time-coherent data pulses can result from multiple reflections in the interferometers that can superimpose within a chip time duration. We show that the autocorrelation function has to be considered as the sum of complex amplitudes of the combined chip as the laser source coherence time is much greater than the integration time of the photodetector. To reduce the sensitivity of the DS-OCDMA system to the coherence time of the laser source, we analyze the use of sparse and nonperiodic quadratic congruence and extended quadratic congruence codes.

  9. Source coherence impairments in a direct detection direct sequence optical code-division multiple-access system

    Science.gov (United States)

    Fsaifes, Ihsan; Lepers, Catherine; Lourdiane, Mounia; Gallion, Philippe; Beugin, Vincent; Guignard, Philippe

    2007-02-01

    We demonstrate that direct sequence optical code-division multiple-access (DS-OCDMA) encoders and decoders using sampled fiber Bragg gratings (S-FBGs) behave as multipath interferometers. In that case, chip pulses of the prime sequence codes generated by spreading in time-coherent data pulses can result from multiple reflections in the interferometers that can superimpose within a chip time duration. We show that the autocorrelation function has to be considered as the sum of complex amplitudes of the combined chip as the laser source coherence time is much greater than the integration time of the photodetector. To reduce the sensitivity of the DS-OCDMA system to the coherence time of the laser source, we analyze the use of sparse and nonperiodic quadratic congruence and extended quadratic congruence codes.

  10. Depth Measurement Based on Infrared Coded Structured Light

    Directory of Open Access Journals (Sweden)

    Tong Jia

    2014-01-01

    Full Text Available Depth measurement is a challenging problem in computer vision research. In this study, we first design a new grid pattern and develop a sequence coding and decoding algorithm to process the pattern. Second, we propose a linear fitting algorithm to derive the linear relationship between the object depth and pixel shift. Third, we obtain depth information on an object based on this linear relationship. Moreover, 3D reconstruction is implemented based on the Delaunay triangulation algorithm. Finally, we utilize the regularity of the error curves to correct the system errors and improve the measurement accuracy. The experimental results show that the accuracy of depth measurement is related to the step length of the moving object.
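
    The linear-fitting step mentioned above can be sketched with an ordinary least-squares line fit. This is a generic illustration under stated assumptions, not the authors' algorithm: the calibration pairs below are made-up placeholder values, and the function and variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration data: pixel shift (px) against known depth (mm).
pixel_shift = [0.0, 1.0, 2.0, 3.0, 4.0]
depth_mm = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(pixel_shift, depth_mm)

def depth_from_shift(shift):
    """Predict depth from a measured pixel shift using the fitted line."""
    return slope * shift + intercept

print(slope, intercept, depth_from_shift(2.5))
```

    Once the slope and intercept are calibrated, each measured pixel shift maps directly to a depth estimate, which is the role the linear relationship plays in the abstract.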

  11. Complete plastid genome sequence of Primula sinensis (Primulaceae): structure comparison, sequence variation and evidence for accD transfer to nucleus

    Directory of Open Access Journals (Sweden)

    Tong-Jian Liu

    2016-06-01

    Full Text Available The species-rich genus Primula L. is a typical plant group with which to understand genetic variance between species at different levels of relationship. Chloroplast genome sequences are used as an information resource for quantifying this difference and reconstructing evolutionary history. In this study, we report the complete chloroplast genome sequence of Primula sinensis and compare it with those of related species. The chloroplast genome showed a typical circular quadripartite structure, 150,859 bp in length with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) were separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome consists of 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven coding genes, seven tRNA genes and four rRNA genes have two copies due to their locations in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. SSR and sequence variation analyses were also performed on the plastome of Primula sinensis, in comparison with the available plastome of P. poissonii. The four most variable regions, rpl36–rps8, rps16–trnQ, trnH–psbA and ndhC–trnV, were identified. Phylogenetic relationship estimates using three sub-datasets extracted from a matrix of 57 protein-coding gene sequences gave identical results that were consistent with previous studies. A transcript found in the P. sinensis transcriptome showed high similarity to the plastid accD functional region, and a putative plastid transit peptide was identified at its N-terminal region. The result strongly suggests that plastid accD has been functionally transferred to the nucleus in P. sinensis.
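
    The region sizes and gene counts reported above are mutually consistent, which can be checked with a few lines of arithmetic. The sketch uses only the figures from the abstract; the total number of gene copies is a derived value that assumes each IR-located gene is counted exactly twice.

```python
GENOME_LENGTH = 150_859
IR_LENGTH = 25_535        # each of the two inverted repeats
LSC_LENGTH = 82_064       # large single-copy region
SSC_LENGTH = 17_725       # small single-copy region

# Quadripartite structure: LSC + SSC + two copies of the IR.
assembled_length = LSC_LENGTH + SSC_LENGTH + 2 * IR_LENGTH

unique_genes = {"protein_coding": 78, "tRNA": 30, "rRNA": 4}
duplicated_in_ir = {"protein_coding": 7, "tRNA": 7, "rRNA": 4}

total_unique = sum(unique_genes.values())
total_copies = total_unique + sum(duplicated_in_ir.values())  # derived, see note

print(assembled_length == GENOME_LENGTH, total_unique, total_copies)
```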

  12. Construction experience with Fermilab-built full length 50mm SSC dipoles

    International Nuclear Information System (INIS)

    Blessing, M.J.; Hoffman, D.E.; Packer, M.D.; Gordon, M.; Higinbotham, W.; Sims, R.

    1992-03-01

    Fourteen full length SSC dipole magnets are being built and tested at Fermilab. Their purpose is to verify the magnet design as well as transfer the construction technology to industry. Magnet design is summarized. Construction problems and their solutions are discussed. Topics include coil winding, curing and measuring, collaring, instrumentation, end clamp installation, yoking and electrical and mechanical interconnection.

  13. Full-length high-temperature severe fuel damage test No. 2

    International Nuclear Information System (INIS)

    Hesson, G.M.; Lombardo, N.J.; Pilger, J.P.; Rausch, W.N.; King, L.L.; Hurley, D.E.; Parchen, L.J.; Panisko, F.E.

    1993-09-01

    Hazardous conditions associated with performing the Full-Length High-Temperature (FLHT) Severe Fuel Damage Test No. 2 experiment have been analyzed. Major hazards that could cause harm or damage are (1) radioactive fission products, (2) radiation fields, (3) reactivity changes, (4) hydrogen generation, (5) materials at high temperature, (6) steam explosion, and (7) steam pressure pulse. As a result of this analysis, it is concluded that with proper precautions the FLHT-2 test can be safely conducted.

  14. A new strategy for full-length Ebola virus glycoprotein expression in E. coli.

    Science.gov (United States)

    Zai, Junjie; Yi, Yinhua; Xia, Han; Zhang, Bo; Yuan, Zhiming

    2016-12-01

    Ebola virus (EBOV) causes severe hemorrhagic fever in humans and non-human primates with high rates of fatality. Glycoprotein (GP) is the only envelope protein of EBOV, which may play a critical role in virus attachment and entry as well as stimulating host protective immune responses. However, the lack of expression of full-length GP in Escherichia coli hinders the further study of its function in viral pathogenesis. In this study, the vp40 gene was fused to the full-length gp gene and cloned into a prokaryotic expression vector. We showed that the VP40-GP and GP-VP40 fusion proteins could be expressed in E. coli at 16 °C. In addition, it was shown that the position of vp40 in the fusion proteins affected the yields of the fusion proteins, with a higher level of production of the fusion protein when vp40 was upstream of gp compared to when it was downstream. The results provide a strategy for the expression of a large quantity of EBOV full-length GP, which is of importance for further analyzing the relationship between the structure and function of GP and developing an antibody for the treatment of EBOV infection.

  15. Modeling coding-sequence evolution within the context of residue solvent accessibility

    Directory of Open Access Journals (Sweden)

    Scherrer Michael P

    2012-09-01

    Full Text Available Abstract Background Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Results Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprised of nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transversion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Conclusions Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by just their mean ω.
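
    The linear dependence of ω on RSA described above can be written compactly. The symbols ω_0 (intercept) and ω_1 (slope) are notation assumed here for illustration; the abstract names the intercept and slope but does not give symbols for them.

```latex
\omega(\mathrm{RSA}) = \omega_0 + \omega_1 \cdot \mathrm{RSA}
```

    Under this form, a gene is summarized by the pair (ω_0, ω_1) rather than by a single mean ω, which is the characterization the conclusions recommend.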

  16. Full genome sequences and molecular characterization of tick-borne encephalitis virus strains isolated from human patients.

    Science.gov (United States)

    Formanová, Petra; Černý, Jiří; Bolfíková, Barbora Černá; Valdés, James J; Kozlova, Irina; Dzhioev, Yuri; Růžek, Daniel

    2015-02-01

    Tick-borne encephalitis virus (TBEV) causes tick-borne encephalitis (TBE), one of the most important human neuroinfections across Eurasia. To date, only three full genome sequences of human European TBEV isolates are available, mostly due to difficulties with isolation of the virus from human patients. Here we present full genome characterization of an additional five low-passage TBEV strains isolated from human patients with severe forms of TBE. These strains were isolated in 1953 within Central Bohemia in the former Czechoslovakia, and belong to the historically oldest human TBEV isolates in Europe. We demonstrate here that all analyzed isolates are distantly phylogenetically related, indicating that the emergence of TBE in Central Europe was not caused by one predominant strain, but rather a pool of distantly related TBEV strains. Nucleotide identity between individual sequenced TBEV strains ranged from 97.5% to 99.6% and all strains shared large deletions in the 3' non-coding region, which has been recently suggested to be an important determinant of virulence. The number of unique amino acid substitutions varied from 3 to 9 in individual isolates, but no characteristic amino acid substitution typical exclusively for all human TBEV isolates was identified when compared to the isolates from ticks. We did, however, correlate that the exploration of the TBEV envelope glycoprotein by specific antibodies were in close proximity to these unique amino acid substitutions. Taken together, we report here the largest number of patient-derived European TBEV full genome sequences to date and provide a platform for further studies on evolution of TBEV since the first emergence of human TBE in Europe. Copyright © 2014 Elsevier GmbH. All rights reserved.

  17. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    Science.gov (United States)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  18. Performance Analysis of New Binary User Codes for DS-CDMA Communication

    Science.gov (United States)

    Usha, Kamle; Jaya Sankar, Kottareddygari

    2016-03-01

    This paper analyzes new binary spreading codes through their correlation properties and also presents their performance over the additive white Gaussian noise (AWGN) channel. The proposed codes are constructed using Gray and inverse Gray codes. The paper discusses the construction of 2n-length binary user codes in which an n-bit Gray code is appended by its n-bit inverse Gray code. Like Walsh codes, these binary user codes are available in sizes that are powers of two; additionally, code sets of length 6 and their even multiples are also available. The simple construction technique and the generation of code sets of different sizes are the salient features of the proposed codes. Walsh codes and Gold codes are considered for comparison in this paper, as these are popularly used for synchronous and asynchronous multi-user communications respectively. In the current work the auto- and cross-correlation properties of the proposed codes are compared with those of Walsh codes and Gold codes. Performance of the proposed binary user codes for both synchronous and asynchronous direct sequence CDMA communication over the AWGN channel is also discussed in this paper. The proposed binary user codes are found to be suitable for both synchronous and asynchronous DS-CDMA communication.
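
    A minimal sketch of the Gray-code ingredients mentioned above follows. The abstract does not define the inverse Gray code or the exact appending rule, so two assumptions are made here: the inverse Gray code of a value is taken to be the inverse of the standard binary-reflected Gray mapping, and the user code for index i is formed by concatenating the two n-bit words for the same i. Function names are hypothetical.

```python
def gray(value):
    """Standard binary-reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)

def inverse_gray(value):
    """Inverse of `gray`: XOR of all right shifts of the input."""
    result = 0
    while value:
        result ^= value
        value >>= 1
    return result

def user_codes(n):
    """Assumed construction: n-bit Gray word followed by the n-bit inverse
    Gray word of the same index, giving 2**n words of length 2n."""
    return [format(gray(i), f"0{n}b") + format(inverse_gray(i), f"0{n}b")
            for i in range(2 ** n)]

for word in user_codes(3):
    print(word)
```

    With n = 3 this yields eight words of length 6, which matches the length-6 code sets the abstract mentions, although the paper's actual selection of code sets may differ from this assumed rule.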

  19. Dynamic Shannon Coding

    OpenAIRE

    Gagie, Travis

    2005-01-01

    We present a new algorithm for dynamic prefix-free coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.
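
    For orientation, the static Shannon code that the dynamic algorithm builds on assigns each symbol a codeword length of ceil(log2(1/p)), where p is the symbol's probability. The sketch below computes these lengths from integer counts and checks the Kraft inequality; it is the classical static construction, not the dynamic algorithm of the paper, and the function names and sample counts are assumptions.

```python
from fractions import Fraction

def shannon_code_lengths(counts):
    """Codeword length ceil(log2(total / count)) for each symbol, computed
    with integer arithmetic to avoid floating-point rounding."""
    total = sum(counts.values())
    lengths = {}
    for symbol, count in counts.items():
        length = 0
        while (count << length) < total:   # smallest l with count * 2**l >= total
            length += 1
        lengths[symbol] = length
    return lengths

def kraft_sum(lengths):
    """Sum of 2**(-length); a prefix-free code exists when this is at most 1."""
    return sum(Fraction(1, 2 ** length) for length in lengths.values())

counts = {"a": 4, "b": 2, "c": 1, "d": 1}   # hypothetical symbol counts
lengths = shannon_code_lengths(counts)
print(lengths, kraft_sum(lengths))
```

    Because every length is at least log2(1/p), the Kraft sum never exceeds 1, so a prefix-free code with these lengths always exists.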

  20. Minimizing coupling loss by selection of twist pitch lengths in multi-stage cable-in-conduit conductors

    International Nuclear Information System (INIS)

    Rolando, G; Nijhuis, A; Devred, A

    2014-01-01

    The numerical code JackPot-ACDC (van Lanen et al 2010 Cryogenics 50 139–48, van Lanen et al 2011 IEEE Trans. Appl. Supercond. 21 1926–9, van Lanen et al 2012 Supercond. Sci. Technol. 25 025012) allows fast parametric studies of the electro-magnetic performance of cable-in-conduit conductors (CICCs). In this paper the code is applied to the analysis of the relation between twist pitch length sequence and coupling loss in multi-stage ITER-type CICCs. The code shows that in the analysed conductors the coupling loss is at its minimum when the twist pitches of the successive cabling stages have a length ratio close to one. It is also predicted that by careful selection of the stage-to-stage twist pitch ratio, CICCs cabled according to long twist schemes in the initial stages can achieve lower coupling loss than conductors with shorter pitches. The result is validated by AC loss measurements performed on prototype conductors for the ITER Central Solenoid featuring different twist pitch sequences. (paper)

  1. Licensee Event Report sequence coding and search procedure workshop

    International Nuclear Information System (INIS)

    Cottrell, W.B.; Gallaher, R.B.

    1981-01-01

    Since mid-1980, the Office for Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC) has been developing procedures for the systematic review and analysis of Licensee Event Reports (LERs). These procedures generally address several areas of concern, including identification of significant trends and patterns, event sequence of occurrences, component failures, and system and plant effects. The AEOD and NSIC conducted a workshop on the new coding procedure at the American Museum of Science and Energy in Oak Ridge, TN, on November 24, 1980.

  2. [Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

    Science.gov (United States)

    Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

    2011-01-01

    The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.

  3. Two-terminal video coding.

    Science.gov (United States)

    Yang, Yang; Stanković, Vladimir; Xiong, Zixiang; Zhao, Wei

    2009-03-01

    Following recent works on the rate region of the quadratic Gaussian two-terminal source coding problem and limit-approaching code designs, this paper examines multiterminal source coding of two correlated, i.e., stereo, video sequences to save the sum rate over independent coding of both sequences. Two multiterminal video coding schemes are proposed. In the first scheme, the left sequence of the stereo pair is coded by H.264/AVC and used at the joint decoder to facilitate Wyner-Ziv coding of the right video sequence. The first I-frame of the right sequence is successively coded by H.264/AVC Intracoding and Wyner-Ziv coding. An efficient stereo matching algorithm based on loopy belief propagation is then adopted at the decoder to produce pixel-level disparity maps between the corresponding frames of the two decoded video sequences on the fly. Based on the disparity maps, side information for both motion vectors and motion-compensated residual frames of the right sequence are generated at the decoder before Wyner-Ziv encoding. In the second scheme, source splitting is employed on top of classic and Wyner-Ziv coding for compression of both I-frames to allow flexible rate allocation between the two sequences. Experiments with both schemes on stereo video sequences using H.264/AVC, LDPC codes for Slepian-Wolf coding of the motion vectors, and scalar quantization in conjunction with LDPC codes for Wyner-Ziv coding of the residual coefficients give a slightly lower sum rate than separate H.264/AVC coding of both sequences at the same video quality.

  4. From concatenated codes to graph codes

    DEFF Research Database (Denmark)

    Justesen, Jørn; Høholdt, Tom

    2004-01-01

    We consider codes based on simple bipartite expander graphs. These codes may be seen as the first step leading from product type concatenated codes to more complex graph codes. We emphasize constructions of specific codes of realistic lengths, and study the details of decoding by message passing...

  5. Arc Length Coding by Interference of Theta Frequency Oscillations May Underlie Context-Dependent Hippocampal Unit Data and Episodic Memory Function

    Science.gov (United States)

    Hasselmo, Michael E.

    2007-01-01

    Many memory models focus on encoding of sequences by excitatory recurrent synapses in region CA3 of the hippocampus. However, data and modeling suggest an alternate mechanism for encoding of sequences in which interference between theta frequency oscillations encodes the position within a sequence based on spatial arc length or time. Arc length…

  6. Enhanced Protein Production in Escherichia coli by Optimization of Cloning Scars at the Vector-Coding Sequence Junction

    DEFF Research Database (Denmark)

    Mirzadeh, Kiavash; Martinez, Virginia; Toddo, Stephen

    2015-01-01

    are poorly expressed even when they are codon-optimized and expressed from vectors with powerful genetic elements. In this study, we show that poor expression can be caused by certain nucleotide sequences (e.g., cloning scars) at the junction between the vector and the coding sequence. Since these sequences...

  7. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  8. In vitro cytotoxicity of Manville Code 100 glass fibers: Effect of fiber length on human alveolar macrophages

    Directory of Open Access Journals (Sweden)

    Jones William

    2006-03-01

    Full Text Available Abstract Background Synthetic vitreous fibers (SVFs) are inorganic noncrystalline materials widely used in residential and industrial settings for insulation, filtration, and reinforcement purposes. SVFs conventionally include three major categories: fibrous glass, rock/slag/stone (mineral wool), and ceramic fibers. Previous in vitro studies from our laboratory demonstrated length-dependent cytotoxic effects of glass fibers on rat alveolar macrophages which were possibly associated with incomplete phagocytosis of fibers ≥ 17 μm in length. The purpose of this study was to examine the influence of fiber length on primary human alveolar macrophages, which are larger in diameter than rat macrophages, using length-classified Manville Code 100 glass fibers (8, 10, 16, and 20 μm). It was hypothesized that complete engulfment of fibers by human alveolar macrophages could decrease fiber cytotoxicity; i.e. shorter fibers that can be completely engulfed might not be as cytotoxic as longer fibers. Human alveolar macrophages, obtained by segmental bronchoalveolar lavage of healthy, non-smoking volunteers, were treated with three different concentrations (determined by fiber number) of the sized fibers in vitro. Cytotoxicity was assessed by monitoring cytosolic lactate dehydrogenase release and loss of function as indicated by a decrease in zymosan-stimulated chemiluminescence. Results Microscopic analysis indicated that human alveolar macrophages completely engulfed glass fibers of the 20 μm length. All fiber length fractions tested exhibited equal cytotoxicity on a per fiber basis, i.e. increasing lactate dehydrogenase and decreasing chemiluminescence in the same concentration-dependent fashion. Conclusion The data suggest that due to the larger diameter of human alveolar macrophages, compared to rat alveolar macrophages, complete phagocytosis of longer fibers can occur with the human cells. Neither incomplete phagocytosis nor length-dependent toxicity was

  9. VP1u phospholipase activity is critical for infectivity of full-length parvovirus B19 genomic clones.

    Science.gov (United States)

    Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Söderlund-Venermo, Maria; Young, Neal S; Brown, Kevin E

    2008-05-10

    Three full-length genomic clones (pB19-M20, pB19-FL and pB19-HG1) of parvovirus B19 were produced in different laboratories. pB19-M20 was shown to produce infectious virus. To determine the differences in infectivity, all three plasmids were tested by transfection and infection assays. All three clones were similar in viral DNA replication, RNA transcription, and viral capsid protein production. However, only pB19-M20 and pB19-HG1 produced infectious virus. Comparison of viral sequences showed no significant differences in ITR or NS regions. In the capsid region, there was a nucleotide sequence difference conferring an amino acid substitution (E176K) in the phospholipase A2-like motif of the VP1-unique (VP1u) region. The recombinant VP1u with the E176K mutation had no catalytic activity as compared with the wild-type. When this mutation was introduced into pB19-M20, infectivity was significantly attenuated, confirming the critical role of this motif. Investigation of the original serum from which pB19-FL was cloned confirmed that the phospholipase mutation was present in the native B19 virus.

  10. VP1u phospholipase activity is critical for infectivity of full-length parvovirus B19 genomic clones✰

    Science.gov (United States)

    Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Venermo, Maria S Söderlund; Young, Neal S.; Brown, Kevin E.

    2008-01-01

    Three full-length genomic clones (pB19-M20, pB19-FL and pB19-HG1) of parvovirus B19 were produced in different laboratories. pB19-M20 was shown to produce infectious virus. To determine the differences in infectivity, all three plasmids were tested by transfection and infection assays. All three clones were similar in viral DNA replication, RNA transcription, and viral capsid protein production. However, only pB19-M20 and pB19-HG1 produced infectious virus. Comparison of viral sequences showed no significant differences in ITR or NS regions. In the capsid region, there was a nucleotide sequence difference conferring an amino acid substitution (E176K) in the phospholipase A2-like motif of the VP1-unique (VP1u) region. The recombinant VP1u with the E176K mutation had no catalytic activity as compared with the wild-type. When this mutation was introduced into pB19-M20, infectivity was significantly attenuated, confirming the critical role of this motif. Investigation of the original serum from which pB19-FL was cloned confirmed that the phospholipase mutation was present in the native B19 virus. PMID:18252260

  11. Full-length high-temperature severe fuel damage test No. 5

    International Nuclear Information System (INIS)

    Lanning, D.D.; Lombardo, N.J.; Hensley, W.K.; Fitzsimmons, D.E.; Panisko, F.E.; Hartwell, J.K.

    1993-09-01

    This report describes and presents data from a severe fuel damage test that was conducted in the National Research Universal (NRU) reactor at Chalk River Nuclear Laboratories (CRNL), Ontario, Canada. The test, designated FLHT-5, was the fourth in a series of full-length high-temperature (FLHT) tests on light-water reactor fuel. The tests were designed and performed by staff from the US Department of Energy's Pacific Northwest Laboratory (PNL), operated by Battelle Memorial Institute. The test operation and test results are described in this report. The fuel bundle in the FLHT-5 experiment included 10 unirradiated full-length pressurized-water reactor (PWR) rods, 1 irradiated PWR rod and 1 dummy gamma thermometer. The fuel rods were subjected to a very low coolant flow while operating at low fission power. This caused coolant boilaway, rod dryout and overheating to temperatures above 2600 K, severe fuel rod damage, hydrogen generation, and fission product release. The test assembly and its effluent path were extensively instrumented to record temperatures, pressures, flow rates, hydrogen evolution, and fission product release during the boilaway/heatup transient. Post-test gamma scanning of the upper plenum indicated significant iodine and cesium release and deposition. Both stack gas activity and on-line gamma spectrometer data indicated significant (∼50%) release of noble fission gases. Post-test visual examination of one side of the fuel bundle revealed no massive relocation and flow blockage; however, rundown of molten cladding was evident.

  12. Non-tables look-up search algorithm for efficient H.264/AVC context-based adaptive variable length coding decoding

    Science.gov (United States)

    Han, Yishi; Luo, Zhixiao; Wang, Jianhua; Min, Zhixuan; Qin, Xinyu; Sun, Yunlong

    2014-09-01

    In general, context-based adaptive variable length coding (CAVLC) decoding in H.264/AVC standard requires frequent access to the unstructured variable length coding tables (VLCTs) and significant memory accesses are consumed. Heavy memory accesses will cause high power consumption and time delays, which are serious problems for applications in portable multimedia devices. We propose a method for high-efficiency CAVLC decoding by using a program instead of all the VLCTs. The decoded codeword from VLCTs can be obtained without any table look-up and memory access. The experimental results show that the proposed algorithm achieves 100% memory access saving and 40% decoding time saving without degrading video quality. Additionally, the proposed algorithm shows a better performance compared with conventional CAVLC decoding, such as table look-up by sequential search, table look-up by binary search, Moon's method, and Kim's method.

  13. Rational Design of High-Number dsDNA Fragments Based on Thermodynamics for the Construction of Full-Length Genes in a Single Reaction.

    Science.gov (United States)

    Birla, Bhagyashree S; Chou, Hui-Hsien

    2015-01-01

    Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction is much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky that identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.

  14. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    Science.gov (United States)

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
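
    The exponential growth in parameters mentioned for fixed-order Markov chains can be made concrete with a small count. The sketch below assumes a four-letter nucleotide alphabet and counts only free transition probabilities; the function name is illustrative and is not taken from the authors' pipeline.

```python
def free_parameters(order: int, alphabet_size: int = 4) -> int:
    """Number of free transition probabilities in a fixed-order Markov chain.

    There are alphabet_size**order contexts, and each context has
    alphabet_size - 1 free probabilities, because the probabilities
    within one context must sum to 1.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return (alphabet_size ** order) * (alphabet_size - 1)


if __name__ == "__main__":
    # Every additional order multiplies the parameter count by 4.
    for k in range(0, 11, 2):
        print(f"order {k:2d}: {free_parameters(k):>9,d} free parameters")
```

    At order 10 the count already exceeds three million, which suggests why limited sequencing depth makes high fixed orders hard to estimate and why a variable-length model that prunes contexts can be attractive.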

  15. FOURTH SEMINAR TO THE MEMORY OF D.N. KLYSHKO: Algebraic solution of the synthesis problem for coded sequences

    Science.gov (United States)

    Leukhin, Anatolii N.

    2005-08-01

    The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups.

  16. Improved Design of Unequal Error Protection LDPC Codes

    Directory of Open Access Journals (Sweden)

    Sandberg Sara

    2010-01-01

    We propose an improved method for designing unequal error protection (UEP) low-density parity-check (LDPC) codes. The method is based on density evolution. The degree distribution with the best UEP properties is found, under the constraint that the threshold should not exceed the threshold of a non-UEP code plus some threshold offset. For different codeword lengths and different construction algorithms, we search for good threshold offsets for the UEP code design. The choice of the threshold offset is based on the average a posteriori variable node mutual information. Simulations reveal the counterintuitive result that the short-to-medium length codes designed with a suitable threshold offset all outperform the corresponding non-UEP codes in terms of average bit-error rate. The proposed codes are also compared to other UEP-LDPC codes found in the literature.

  17. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    Science.gov (United States)

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.

  18. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA...
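
    The third step listed above, building a DNA alignment from an aligned peptide sequence, can be sketched as threading codons onto the gapped peptide row. This is a minimal illustration and not RevTrans itself: the function name and the example sequences are hypothetical, and the sketch assumes that the DNA is in frame, contains no stop codon, and is exactly three times as long as the number of residues.

```python
def thread_codons(aligned_peptide: str, dna: str, gap: str = "-") -> str:
    """Map an ungapped coding DNA sequence onto its gapped peptide alignment row.

    Each residue in the aligned peptide consumes one codon (3 nt) of `dna`;
    each gap character becomes a 3-nt gap, so the reading frame is preserved.
    """
    n_residues = sum(1 for ch in aligned_peptide if ch != gap)
    if len(dna) != 3 * n_residues:
        raise ValueError("DNA length must be 3x the number of residues")
    out, pos = [], 0
    for ch in aligned_peptide:
        if ch == gap:
            out.append(gap * 3)          # a peptide gap spans a whole codon
        else:
            out.append(dna[pos:pos + 3])  # copy the codon for this residue
            pos += 3
    return "".join(out)


if __name__ == "__main__":
    # Two hypothetical rows of a peptide alignment and their coding DNA.
    print(thread_codons("MK-L", "ATGAAACTG"))
    print(thread_codons("MKAL", "ATGAAGGCTCTT"))
```

    Because gaps are inserted only in multiples of three, the resulting DNA alignment never splits a codon.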

  19. Classifying Coding DNA with Nucleotide Statistics

    Directory of Open Access Journals (Sweden)

    Nicolas Carels

    2009-10-01

    In this report, we compared the success rate of classification of coding sequences (CDS) vs. introns by Codon Structure Factor (CSF) and by a method that we called Universal Feature Method (UFM). UFM is based on the scoring of purine bias (Rrr) and stop codon frequency. We show that the success rate of CDS/intron classification by UFM is higher than by CSF. UFM classifies ORFs as coding or non-coding through a score based on (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine (C), Guanine (G), and Adenine (A) probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, (iv) the probabilities of G in 1st and 2nd position of triplets and (v) the distance of their GC3 vs. GC2 levels to the regression line of the universal correlation. More than 80% of CDSs (true positives) of Homo sapiens (>250 bp), Drosophila melanogaster (>250 bp) and Arabidopsis thaliana (>200 bp) are successfully classified with a false positive rate lower than or equal to 5%. The method releases coding sequences in their coding strand and coding frame, which allows their automatic translation into protein sequences with 95% confidence. The method is a natural consequence of the compositional bias of nucleotides in coding sequences.
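
    Two of the raw quantities named above, stop codon frequency and purine content per codon position, are straightforward to compute for a given reading frame. The sketch below only extracts these features; it does not reproduce the UFM score, whose weighting and thresholds are not given in the abstract. The function name is hypothetical, and the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
PURINES = {"A", "G"}


def codon_features(seq: str, frame: int = 0):
    """Return (stop codon frequency, purine fraction at codon positions 1-3)."""
    seq = seq.upper()
    # Split into complete codons starting at the requested frame offset.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("sequence too short for the requested frame")
    n = len(codons)
    stop_freq = sum(c in STOP_CODONS for c in codons) / n
    purine_by_pos = [sum(c[p] in PURINES for c in codons) / n for p in range(3)]
    return stop_freq, purine_by_pos


if __name__ == "__main__":
    # Compare the three forward frames of a short hypothetical sequence.
    for frame in range(3):
        print(frame, codon_features("ATGGCATAAGGT", frame))
```

    Comparing these values across the three frames of a strand gives a rough feel for the frame-dependent compositional bias that the method exploits.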

  20. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

    Science.gov (United States)

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-06-15

    Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
    Contact: yasu@bio.keio.ac.jp. Supplementary data are available

  1. Allelic sequence variations in the hypervariable region of a T-cell receptor β chain: Correlation with restriction fragment length polymorphism in human families and populations

    International Nuclear Information System (INIS)

    Robinson, M.A.

    1989-01-01

    Direct sequence analysis of the human T-cell antigen receptor (TCR) V β1 variable gene identified a single base-pair allelic variation (C/G) located within the coding region. This change results in substitution of a histidine (CAC) for a glutamine (CAG) at position 48 of the TCR β chain, a position predicted to be in the TCR antigen binding site. The V β1 polymorphism was found by DNA sequence analysis of V β1 genes from seven unrelated individuals; V β1 genes were amplified by the polymerase chain reaction, the amplified fragments were cloned into M13 phage vectors, and sequences were determined. To determine the inheritance patterns of the V β1 substitution and to test correlation with V β1 restriction fragment length polymorphism detected with Pvu II and Taq I, allele-specific oligonucleotides were constructed and used to characterize amplified DNA samples. Seventy unrelated individuals and six families were tested for both restriction fragment length polymorphism and for the V β1 substitution. The correlation was also tested using amplified, size-selected, Pvu II- and Taq I-digested DNA samples from heterozygotes. Pvu II allele 1 (61/70) and Taq I allele 1 (66/70) were found to be correlated with the substitution giving rise to a histidine at position 48. Because there are exceptions to the correlation, the use of specific probes to characterize allelic forms of TCR variable genes will provide important tools for studies of basic TCR genetics and disease associations

  2. Communicating the Benefits of a Full Sequence of High School Science Courses

    Science.gov (United States)

    Nicholas, Catherine Marie

    High school students are generally uninformed about the benefits of enrolling in a full sequence of science courses; as a result, only about a third of our nation's high school graduates have completed the science sequence of Biology, Chemistry and Physics. The lack of students completing a full sequence of science courses contributes to the deficit in the STEM degree production rate needed to fill the demand of the current job market and remain competitive as a nation. The purpose of the study was to make a difference in the number of students who have access to information about the benefits of completing a full sequence of science courses. This dissertation study employed qualitative research methodology to gain a broad perspective of staff through a questionnaire and document review and then a deeper understanding through a semi-structured interview protocol. The data revealed that a universal sequence of science courses in the high school district did not exist. It also showed that not all students had access to all science courses; students were sorted and tracked according to prerequisites that did not necessarily match the skill set needed for the courses. In addition, the study showed a desire for more support and direction from the district office. It was also apparent that there was a disconnect between who staff members believed should enroll in a full sequence of science courses and who actually enrolled. Finally, communication about science was shown to occur mainly through counseling and peers. A common science sequence, detracking of science courses, increased communication about the postsecondary and academic benefits of a science education, increased district direction and realistic mathematics alignment were all discussed as solutions to the problem.

  3. The complete chloroplast genome sequence of Hibiscus syriacus.

    Science.gov (United States)

    Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin

    2016-09-01

    The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is 161 019 bp in length, with a typical circular structure containing a pair of inverted repeats, each 25 745 bp in length, separated by a large single-copy region and a small single-copy region of 89 698 bp and 19 831 bp in length, respectively. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.
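
    The reported region lengths and gene counts can be checked for internal consistency with simple arithmetic, assuming the usual quadripartite layout in which the genome consists of the two single-copy regions plus two copies of the inverted repeat. The variable names below are illustrative.

```python
# Lengths in base pairs, as reported in the abstract.
total_bp = 161_019
inverted_repeat_bp = 25_745
large_single_copy_bp = 89_698
small_single_copy_bp = 19_831

# Quadripartite structure: LSC + SSC + two inverted repeat copies.
reconstructed_bp = large_single_copy_bp + small_single_copy_bp + 2 * inverted_repeat_bp
assert reconstructed_bp == total_bp

# Annotated gene categories should add up to the reported total of 114.
genes = {"protein-coding": 81, "rRNA": 4, "tRNA": 29}
assert sum(genes.values()) == 114

print(reconstructed_bp, sum(genes.values()))
```

    Both checks pass, so the figures in the abstract are mutually consistent.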

  4. Pulp regeneration in a full-length human tooth root using a hierarchical nanofibrous microsphere system.

    Science.gov (United States)

    Li, Xiangwei; Ma, Chi; Xie, Xiaohua; Sun, Hongchen; Liu, Xiaohua

    2016-04-15

    While pulp regeneration using tissue engineering strategy has been explored for over a decade, successful regeneration of pulp tissues in a full-length human root with a one-end seal that truly simulates clinical endodontic treatment has not been achieved. To address this challenge, we designed and synthesized a unique hierarchical growth factor-loaded nanofibrous microsphere scaffolding system. In this system, vascular endothelial growth factor (VEGF) binds with heparin and is encapsulated in heparin-conjugated gelatin nanospheres, which are further immobilized in the nanofibers of an injectable poly(l-lactic acid) (PLLA) microsphere. This hierarchical microsphere system not only protects the VEGF from denaturation and degradation, but also provides excellent control of its sustained release. In addition, the nanofibrous PLLA microsphere integrates the extracellular matrix-mimicking architecture with a highly porous injectable form, efficiently accommodating dental pulp stem cells (DPSCs) and supporting their proliferation and pulp tissue formation. Our in vivo study showed the successful regeneration of pulp-like tissues that filled the entire apical and middle thirds and reached the coronal third of the full-length root canal. In addition, a large number of blood vessels were regenerated throughout the canal. For the first time, our work demonstrates the success of pulp tissue regeneration in a full-length root canal, making it a significant step toward regenerative endodontics. The regeneration of pulp tissues in a full-length tooth root canal has been one of the greatest challenges in the field of regenerative endodontics, and one of the biggest barriers for its clinical application. In this study, we developed a unique approach to tackle this challenge, and for the first time, we successfully regenerated living pulp tissues in a full-length root canal, making it a significant step toward regenerative endodontics. This study will make positive scientific

  5. Scalable production in human cells and biochemical characterization of full-length normal and mutant huntingtin.

    Directory of Open Access Journals (Sweden)

    Bin Huang

    Huntingtin (Htt) is a 350 kD intracellular protein, ubiquitously expressed and mainly localized in the cytoplasm. Huntington's disease (HD) is caused by a CAG triplet amplification in exon 1 of the corresponding gene resulting in a polyglutamine (polyQ) expansion at the N-terminus of Htt. Production of full-length Htt has been difficult in the past and so far a scalable system or process has not been established for recombinant production of Htt in human cells. The ability to produce Htt in milligram quantities would be a prerequisite for many biochemical and biophysical studies aiming at a better understanding of Htt function under physiological conditions and in case of mutation and disease. For scalable production of full-length normal (17Q) and mutant (46Q and 128Q) Htt we have established two different systems, the first based on doxycycline-inducible Htt expression in stable cell lines, the second on "gutless" adenovirus mediated gene transfer. Purified material has then been used for biochemical characterization of full-length Htt. Posttranslational modifications (PTMs) were determined and several new phosphorylation sites were identified. Nearly all PTMs in full-length Htt localized to areas outside of predicted alpha-solenoid protein regions. In all detected N-terminal peptides methionine as the first amino acid was missing and the second, alanine, was found to be acetylated. Differences in secondary structure between normal and mutant Htt, a helix-rich protein, were not observed in our study. Purified Htt tends to form dimers and higher order oligomers, thus resembling the situation observed with N-terminal fragments, although the mechanism of oligomer formation may be different.

  6. Detecting non-coding selective pressure in coding regions

    Directory of Open Access Journals (Sweden)

    Blanchette Mathieu

    2007-02-01

    Background: Comparative genomics approaches, where orthologous DNA regions are compared and inter-species conserved regions are identified, have proven extremely powerful for identifying non-coding regulatory regions located in intergenic or intronic regions. However, non-coding functional elements can also be located within coding regions, as is common for exonic splicing enhancers, some transcription factor binding sites, and RNA secondary structure elements affecting mRNA stability, localization, or translation. Since these functional elements are located in regions that are themselves highly conserved because they are coding for a protein, they generally escaped detection by comparative genomics approaches. Results: We introduce a comparative genomics approach for detecting non-coding functional elements located within coding regions. Codon evolution is modeled as a mixture of codon substitution models, where each component of the mixture describes the evolution of codons under a specific type of coding selective pressure. We show how to compute the posterior distribution of the entropy and parsimony scores under this null model of codon evolution. The method is applied to a set of growth hormone 1 orthologous mRNA sequences and a known exonic splicing element is detected. The analysis of a set of CORTBP2 orthologous genes reveals a region of several hundred base pairs under strong non-coding selective pressure whose function remains unknown. Conclusion: Non-coding functional elements, in particular those involved in post-transcriptional regulation, are likely to be much more prevalent than is currently known. With the numerous genome sequencing projects underway, comparative genomics approaches like that proposed here are likely to become increasingly powerful at detecting such elements.

  7. Giardia telomeric sequence d(TAGGG)4 forms two intramolecular G-quadruplexes in K+ solution: effect of loop length and sequence on the folding topology.

    Science.gov (United States)

    Hu, Lanying; Lim, Kah Wai; Bouaziz, Serge; Phan, Anh Tuân

    2009-11-25

    Recently, it has been shown that in K(+) solution the human telomeric sequence d[TAGGG(TTAGGG)(3)] forms a (3 + 1) intramolecular G-quadruplex, while the Bombyx mori telomeric sequence d[TAGG(TTAGG)(3)], which differs from the human counterpart only by one G deletion in each repeat, forms a chair-type intramolecular G-quadruplex, indicating an effect of G-tract length on the folding topology of G-quadruplexes. To explore the effect of loop length and sequence on the folding topology of G-quadruplexes, here we examine the structure of the four-repeat Giardia telomeric sequence d[TAGGG(TAGGG)(3)], which differs from the human counterpart only by one T deletion within the non-G linker in each repeat. We show by NMR that this sequence forms two different intramolecular G-quadruplexes in K(+) solution. The first one is a novel basket-type antiparallel-stranded G-quadruplex containing two G-tetrads, a G x (A-G) triad, and two A x T base pairs; the three loops are consecutively edgewise-diagonal-edgewise. The second one is a propeller-type parallel-stranded G-quadruplex involving three G-tetrads; the three loops are all double-chain-reversal. Recurrence of several structural elements in the observed structures suggests a "cut and paste" principle for the design and prediction of G-quadruplex topologies, for which different elements could be extracted from one G-quadruplex and inserted into another.

  8. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation.

    Science.gov (United States)

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-11-30

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy, which combines sequential and modular concepts, enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain.

  9. Iterative List Decoding of Concatenated Source-Channel Codes

    Directory of Open Access Journals (Sweden)

    Hedayat Ahmadreza

    2005-01-01

    Whenever variable-length entropy codes are used in the presence of a noisy channel, any channel errors will propagate and cause significant harm. Despite using channel codes, some residual errors always remain, whose effect will get magnified by error propagation. Mitigating this undesirable effect is of great practical interest. One approach is to use the residual redundancy of variable length codes for joint source-channel decoding. In this paper, we improve the performance of residual redundancy source-channel decoding via an iterative list decoder made possible by a nonbinary outer CRC code. We show that the list decoding of VLCs is beneficial for entropy codes that contain redundancy. Such codes are used in state-of-the-art video coders, for example. The proposed list decoder improves the overall performance significantly in AWGN and fully interleaved Rayleigh fading channels.
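
    The role an outer CRC plays in list decoding can be illustrated with a small selection routine: the inner decoder proposes several candidate frames, ordered from most to least likely, and the first candidate whose checksum verifies is accepted. This is only a sketch of that selection idea. It substitutes the binary CRC-32 from Python's `zlib` for the nonbinary CRC used in the paper, omits the iterative decoder entirely, and all function names are hypothetical.

```python
import zlib
from typing import Iterable, Optional


def append_crc(payload: bytes) -> bytes:
    """Return the payload followed by its 4-byte CRC-32 tag."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Check whether the trailing 4 bytes match the CRC-32 of the rest."""
    if len(frame) < 4:
        return False
    payload, tag = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tag


def select_from_list(candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the payload of the first candidate that passes the CRC check.

    `candidates` is assumed to be ordered from most to least likely,
    as a list decoder would produce them.
    """
    for frame in candidates:
        if crc_ok(frame):
            return frame[:-4]
    return None  # no candidate verified: declare a decoding failure


if __name__ == "__main__":
    sent = append_crc(b"video")
    corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]  # single bit error
    print(select_from_list([corrupted, sent]))
```

    The check lets the decoder reject a top-ranked but erroneous candidate in favour of a lower-ranked correct one, which is where a list decoder can gain over a decoder that outputs a single hypothesis.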

  10. Adaptive decoding of convolutional codes

    Directory of Open Access Journals (Sweden)

    K. Hueske

    2007-06-01

    Convolutional codes, which are frequently used as error correction codes in digital transmission systems, are generally decoded using the Viterbi Decoder. On the one hand, the Viterbi Decoder is an optimum maximum likelihood decoder, i.e. the most probable transmitted code sequence is obtained. On the other hand, the mathematical complexity of the algorithm depends only on the code used, not on the number of transmission errors. To reduce the complexity of the decoding process for good transmission conditions, an alternative syndrome based decoder is presented. The reduction of complexity is realized by two different approaches, the syndrome zero sequence deactivation and the path metric equalization. The two approaches enable an easy adaptation of the decoding complexity for different transmission conditions, which results in a trade-off between decoding complexity and error correction performance.
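
    Why a syndrome can signal good transmission conditions is easiest to see for a rate-1/2 code. With generator polynomials g1 and g2 over GF(2), the two output streams are c1 = u·g1 and c2 = u·g2, so c1·g2 + c2·g1 = 0 for every valid codeword, and a nonzero result reveals errors. The sketch below demonstrates only this property; it assumes the common (7,5) generators, uses hypothetical function names, and does not implement the decoder described in the abstract.

```python
def poly_mul_gf2(a, b):
    """Multiply two binary polynomials (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj  # addition in GF(2) is XOR
    return out


G1 = [1, 1, 1]  # 1 + D + D^2 (octal 7), an assumed example generator
G2 = [1, 0, 1]  # 1 + D^2     (octal 5), an assumed example generator


def encode(info_bits):
    """Rate-1/2 convolutional encoding as two polynomial products."""
    return poly_mul_gf2(info_bits, G1), poly_mul_gf2(info_bits, G2)


def syndrome(r1, r2):
    """Syndrome r1*G2 + r2*G1; all zeros when (r1, r2) is a valid codeword."""
    a = poly_mul_gf2(r1, G2)
    b = poly_mul_gf2(r2, G1)
    return [x ^ y for x, y in zip(a, b)]


if __name__ == "__main__":
    c1, c2 = encode([1, 0, 1, 1, 0, 1])
    print(syndrome(c1, c2))   # error-free reception
    c1[3] ^= 1                # inject a single channel error
    print(syndrome(c1, c2))   # nonzero entries mark the disturbance
```

    Stretches where the syndrome stays zero indicate sections that can be passed through with little effort, which is the intuition behind deactivating parts of the decoder on zero syndrome sequences.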

  11. Structure and function of the first full-length murein peptide ligase (Mpl) cell wall recycling protein.

    Science.gov (United States)

    Das, Debanu; Hervé, Mireille; Feuerhelm, Julie; Farr, Carol L; Chiu, Hsiu-Ju; Elsliger, Marc-André; Knuth, Mark W; Klock, Heath E; Miller, Mitchell D; Godzik, Adam; Lesley, Scott A; Deacon, Ashley M; Mengin-Lecreulx, Dominique; Wilson, Ian A

    2011-03-18

    Bacterial cell walls contain peptidoglycan, an essential polymer made by enzymes in the Mur pathway. These proteins are specific to bacteria, which makes them targets for drug discovery. MurC, MurD, MurE and MurF catalyze the synthesis of the peptidoglycan precursor UDP-N-acetylmuramoyl-L-alanyl-γ-D-glutamyl-meso-diaminopimelyl-D-alanyl-D-alanine by the sequential addition of amino acids onto UDP-N-acetylmuramic acid (UDP-MurNAc). MurC-F enzymes have been extensively studied by biochemistry and X-ray crystallography. In gram-negative bacteria, ∼30-60% of the bacterial cell wall is recycled during each generation. Part of this recycling process involves the murein peptide ligase (Mpl), which attaches the breakdown product, the tripeptide L-alanyl-γ-D-glutamyl-meso-diaminopimelate, to UDP-MurNAc. We present the crystal structure at 1.65 Å resolution of a full-length Mpl from the permafrost bacterium Psychrobacter arcticus 273-4 (PaMpl). Although the Mpl structure has similarities to Mur enzymes, it has unique sequence and structure features that are likely related to its role in cell wall recycling, a function that differentiates it from the MurC-F enzymes. We have analyzed the sequence-structure relationships that are unique to Mpl proteins and compared them to MurC-F ligases. We have also characterized the biochemical properties of this enzyme (optimal temperature, pH and magnesium binding profiles and kinetic parameters). Although the structure does not contain any bound substrates, we have identified ∼30 residues that are likely to be important for recognition of the tripeptide and UDP-MurNAc substrates, as well as features that are unique to Psychrobacter Mpl proteins. These results provide the basis for future mutational studies for more extensive function characterization of the Mpl sequence-structure relationships.

  12. Evaluation of full-length, cleaved and nitrosylated serum surfactant protein D as biomarkers for COPD

    DEFF Research Database (Denmark)

    Duvoix, Annelyse; Miranda, Elena; Perez, Juan

    2011-01-01

    . Serum levels of SP-D are raised in individuals with COPD but there is no correlation between the serum level of SP-D and the severity of airflow obstruction. Serum SP-D is present in different forms that may have more utility as a biomarker for COPD. We report here the development of new monoclonal...... antibodies to full length and cleaved SP-D. We have assessed these and existing antibodies in 98 individuals with COPD recruited to the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) cohort. Our data show that neither monoclonal antibodies to full length nor cleaved SP...

  13. An Explicit Construction of a sequence of codes attaining the Tsfasman-Vladut-Zink Bound: The first steps

    DEFF Research Database (Denmark)

    Høholdt, Tom; Voss, Cornelia

    1997-01-01

    We present a sequence of codes attaining the Tsfasman-Vladut-Zink bound. The construction is based on the tower of Artin-Schreier extensions described by Garcia and Stichtenoth (1995). We also determine the dual codes. The first steps of the constructions are explicitly given as generator matrices...

  14. Molecular characterization, sequence analysis and tissue expression of a porcine gene – MOSPD2

    Directory of Open Access Journals (Sweden)

    Yang Jie

    2017-01-01

    The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends method based on a pig expressed sequence tag sequence which was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids that has high homology with the motile sperm domain-containing protein 2 (MOSPD2) of five species: horse (89%), human (90%), chimpanzee (89%), rhesus monkey (89%) and mouse (85%); thus, it could be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. This gene is structured in 15 exons and 14 introns as revealed by computer-assisted analysis. The phylogenetic analysis revealed that the porcine MOSPD2 gene has a closer genetic relationship with the MOSPD2 gene of horse. Tissue expression analysis indicated that the porcine MOSPD2 gene is generally and differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. Our experiment is the first to establish the primary foundation for further research on the porcine MOSPD2 gene.

  15. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Directory of Open Access Journals (Sweden)

    Liu Chang

    2012-12-01

    Background: The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results: We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch, respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS for repeated update and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions: CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  16. Insertion of Introns: A Strategy to Facilitate Assembly of Infectious Full Length Clones

    DEFF Research Database (Denmark)

    Johansen, Ida Elisabeth; Lund, Ole Søgaard

    2008-01-01

    Some DNA fragments are difficult to clone in Escherichia coli by standard methods. It has been speculated that unintended transcription and translation result in expression of proteins that are toxic to the bacteria. This problem is frequently observed during assembly of infectious full-length vi...

  17. Is a genome a codeword of an error-correcting code?

    Directory of Open Access Journals (Sweden)

    Luzinete C B Faria

    Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction.

  18. Coding sequence of human rho cDNAs clone 6 and clone 9

    Energy Technology Data Exchange (ETDEWEB)

    Chardin, P; Madaule, P; Tavitian, A

    1988-03-25

    The authors have isolated human cDNAs including the complete coding sequence for two rho proteins corresponding to the incomplete isolates previously described as clone 6 and clone 9. The deduced a.a. sequences, when compared to the a.a. sequence deduced from clone 12 cDNA, show that there are at least three highly homologous rho genes in humans. They suggest that clone 12 be named rhoA, clone 6 rhoB and clone 9 rhoC. RhoA, B and C proteins display approx. 30% a.a. identity with ras proteins, mainly clustered in four highly homologous internal regions corresponding to the GTP binding site; however, at least one significant difference is found: the three rho proteins have an alanine in the position corresponding to ras glycine 13, suggesting that rho and ras proteins might have slightly different biochemical properties.

  19. Application of the MELCOR code to the calculation of the TMLB sequence in a PWR with natural circulation in the vessel

    International Nuclear Information System (INIS)

    Marten-Fuertes, F.

    1995-01-01

    The use of computer codes to analyze the phenomena of severe accidents is very important for decision-making in nuclear safety. This paper presents the MELCOR code as used to calculate the TMLB sequence of a PWR with natural circulation in the vessel. The main goal of this code is its application to PSA (probabilistic safety analysis).

  20. Impaired heat shock response in cells expressing full-length polyglutamine-expanded huntingtin.

    Directory of Open Access Journals (Sweden)

    Sidhartha M Chafekar

    The molecular mechanisms by which polyglutamine (polyQ)-expanded huntingtin (Htt) causes neurodegeneration in Huntington's disease (HD) remain unclear. The malfunction of cellular proteostasis has been suggested as central in HD pathogenesis and also as a target of therapeutic interventions for the treatment of HD. We present results that offer a previously unexplored perspective regarding impaired proteostasis in HD. We find that, under non-stress conditions, the proteostatic capacity of cells expressing full-length polyQ-expanded Htt is adequate. Yet, under stress conditions, the presence of polyQ-expanded Htt impairs the heat shock response, a key component of cellular proteostasis. This impaired heat shock response results in a reduced capacity to withstand the damage caused by cellular stress. We demonstrate that in cells expressing polyQ-expanded Htt the levels of heat shock transcription factor 1 (HSF1) are reduced and, as a consequence, these cells have an impaired heat shock response. We also found reduced HSF1 and HSP70 levels in the striata of HD knock-in mice when compared to wild-type mice. Our results suggest that full-length, non-aggregated polyQ-expanded Htt blocks the effective induction of the heat shock response under stress conditions and may thus trigger the accumulation of cellular damage during the course of HD pathogenesis.

  1. On low-complexity full-diversity detection of multi-user space-time coding

    KAUST Repository

    Ismail, Amr

    2013-06-01

    The incorporation of multiple input multiple output (MIMO) schemes in recent wireless communication standards paved the way to exploit the newly introduced dimension (i.e. space) to efficiently cancel interference without requiring additional resources. In this paper, we focus on the MIMO multiple access channel (MAC) case and answer in the affirmative the question of whether it is possible to suppress the interference in a MIMO MAC for completely blind users while achieving full diversity with a simplified decoder. In fact, this goal can be attained by employing space-time block codes (STBCs) that achieve full diversity under partial interference cancellation (PIC). We derive sufficient conditions for a wide range of STBCs to achieve full diversity under PIC group decoding with or without successive interference cancellation (SIC). Based on the provided design criteria, we derive an upper bound on the achievable rate for a class of codes. A two-user MIMO MAC interference cancellation scheme is presented and proved to achieve full diversity under PIC group decoding. We compare our scheme to existing beamforming schemes with full or limited feedback. © 2013 IEEE.

  2. The complete mitochondrial genome sequence of Diaphorina citri (Hemiptera: Psyllidae)

    Science.gov (United States)

    The first complete mitochondrial genome (mitogenome) sequence of Asian citrus psyllid, Diaphorina citri (Hemiptera: Psyllidae), from Guangzhou, China is presented. The circular mitogenome is 14,996 bp in length with an A+T content of 74.5%, and contains 13 protein-coding genes (PCGs), 22 tRNA genes ...

  3. Generalized concatenated quantum codes

    International Nuclear Information System (INIS)

    Grassl, Markus; Shor, Peter; Smith, Graeme; Smolin, John; Zeng Bei

    2009-01-01

    We discuss the concept of generalized concatenated quantum codes. This generalized concatenation method provides a systematic way of constructing good quantum codes, both stabilizer codes and nonadditive codes. Using this method, we construct families of single-error-correcting nonadditive quantum codes, in both binary and nonbinary cases, which not only outperform any stabilizer codes for finite block length but also asymptotically meet the quantum Hamming bound for large block length.

  4. Performance of initial full-length RHIC [Relativistic Heavy Ion Collider] dipoles

    International Nuclear Information System (INIS)

    Dahl, P.; Cottingham, J.; Garber, M.

    1987-01-01

    The first four full-length (9.7 m) R and D dipoles for the proposed Relativistic Heavy Ion Collider (RHIC) have been successfully tested. The magnets reached a quench plateau of approximately 4.5 T with very reasonable training - a field level comfortably above the design field of 3.45 T required for operation with beams of 100 GeV/amu gold nuclei. Measured field multipoles are considered to be quite acceptable for this series of R and D magnets

  5. Identifying novel genes in C. elegans using SAGE tags

    Directory of Open Access Journals (Sweden)

    Chen Nansheng

    2010-12-01

    Background: Despite extensive efforts devoted to predicting protein-coding genes in genome sequences, many bona fide genes have not been found and many existing gene models are not accurate in all sequenced eukaryote genomes. This situation is partly explained by the fact that gene prediction programs have been developed based on our incomplete understanding of gene feature information such as splicing and promoter characteristics. Additionally, full-length cDNAs of many genes and their isoforms are hard to obtain due to their low level or rare expression. In order to obtain full-length sequences of all protein-coding genes, alternative approaches are required. Results: In this project, we have developed a method of reconstructing full-length cDNA sequences based on short expressed sequence tags, which is called sequence tag-based amplification of cDNA ends (STACE). Expressed tags are used as anchors for retrieving full-length transcripts in two rounds of PCR amplification. We have demonstrated the application of STACE in reconstructing full-length cDNA sequences using expressed tags mined in an array of serial analysis of gene expression (SAGE) of C. elegans cDNA libraries. We have successfully applied STACE to recover sequence information for 12 genes, for two of which we found isoforms. STACE was used to successfully recover full-length cDNA sequences for seven of these genes. Conclusions: The STACE method can be used to effectively reconstruct full-length cDNA sequences of genes that are under-represented in cDNA sequencing projects and have been missed by existing gene prediction methods, but whose existence has been suggested by short sequence tags such as SAGE tags.

  6. Influence of Code Size Variation on the Performance of 2D Hybrid ZCC/MD in OCDMA System

    Directory of Open Access Journals (Sweden)

    Matem Rima.

    2018-01-01

    Several two-dimensional OCDMA codes have been developed in order to overcome many problems in optical networks: enhancing cardinality, suppressing Multiple Access Interference (MAI) and mitigating Phase Induced Intensity Noise (PIIN). This paper proposes a new 2D hybrid ZCC/MD code that combines 1D ZCC spectral encoding, where M is its code length, with 1D MD spatial spreading, where N is its code length. According to the numerical results, the spatial spreading code length (N) offers good cardinality, so it is the main factor in enhancing the performance of the system compared to the spectral code length (M).

  7. Joint Source-Channel Coding by Means of an Oversampled Filter Bank Code

    Directory of Open Access Journals (Sweden)

    Marinkovic Slavica

    2006-01-01

    Quantized frame expansions based on block transforms and oversampled filter banks (OFBs) have been considered recently as joint source-channel codes (JSCCs) for erasure and error-resilient signal transmission over noisy channels. In this paper, we consider a coding chain involving an OFB-based signal decomposition followed by scalar quantization and a variable-length code (VLC) or a fixed-length code (FLC). This paper first examines the problem of channel error localization and correction in quantized OFB signal expansions. The error localization problem is treated as an -ary hypothesis testing problem. The likelihood values are derived from the joint pdf of the syndrome vectors under various hypotheses of impulse noise positions, and in a number of consecutive windows of the received samples. The error amplitudes are then estimated by solving the syndrome equations in the least-square sense. The message signal is reconstructed from the corrected received signal by a pseudoinverse receiver. We then improve the error localization procedure by introducing a per-symbol reliability information in the hypothesis testing procedure of the OFB syndrome decoder. The per-symbol reliability information is produced by the soft-input soft-output (SISO) VLC/FLC decoders. This leads to the design of an iterative algorithm for joint decoding of an FLC and an OFB code. The performance of the algorithms developed is evaluated in a wavelet-based image coding system.

  8. DNA barcode goes two-dimensions: DNA QR code web server.

    Directory of Open Access Journals (Sweden)

    Chang Liu

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers (ITS2, rbcL, matK, psbA-trnH, and CO1) was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and a relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  9. Transcriptome Profiling Using Single-Molecule Direct RNA Sequencing Approach for In-depth Understanding of Genes in Secondary Metabolism Pathways of Camellia sinensis

    Directory of Open Access Journals (Sweden)

    Qingshan Xu

    2017-07-01

    Characteristic secondary metabolites, including flavonoids, theanine and caffeine, are important components of Camellia sinensis, and their biosynthesis has attracted widespread interest. Previous studies on the biosynthesis of these major secondary metabolites used next-generation sequencing technologies, which limited the accurate prediction of full-length (FL) splice isoforms. Herein, we applied single-molecule sequencing to pooled tea plant tissues to provide a more complete transcriptome of C. sinensis. Moreover, we identified 94 FL transcripts and four alternative splicing events for enzyme-coding genes involved in the biosynthesis of flavonoids, theanine and caffeine. By comparing long-read isoforms with assembled transcripts, we improved the quality and accuracy of genes sequenced by short-read next-generation sequencing technology. The resulting FL transcripts, together with the improved assembled transcripts and identified alternative splicing events, enhance our understanding of genes involved in the biosynthesis of characteristic secondary metabolites in C. sinensis.

  10. Application and analysis of performance of dqpsk advanced modulation format in spectral amplitude coding ocdma

    International Nuclear Information System (INIS)

    Memon, A.

    2015-01-01

    SAC (Spectral Amplitude Coding) is an OCDMA (Optical Code Division Multiple Access) technique that encodes and decodes data bits by utilizing spectral components of the broadband source. Usually, the OOK (On-Off Keying) modulation format is used in this encoding scheme. To make the SAC OCDMA network spectrally efficient, the advanced modulation format DQPSK (Differential Quaternary Phase Shift Keying) is applied, simulated and analyzed; an m-sequence code is encoded in the simulated setup. Performance for various lengths of the m-sequence code is also analyzed and displayed in pictorial form. The results of the simulation are evaluated with the help of the electrical constellation diagram, eye diagram and bit error rate graph. All the graphs indicate better transmission quality when the advanced modulation format DQPSK is used in the SAC OCDMA network as compared with OOK. (author)

  11. Improvement of Secret Image Invisibility in Circulation Image with Dyadic Wavelet Based Data Hiding with Run-Length Coded Secret Images of Which Location of Codes are Determined with Random Number

    OpenAIRE

    Kohei Arai; Yuji Yamada

    2011-01-01

    An attempt is made for improvement of secret image invisibility in circulation images with dyadic wavelet based data hiding with run-length coded secret images of which location of codes are determined by random number. Through experiments, it is confirmed that secret images are almost invisible in circulation images. Also robustness of the proposed data hiding method against data compression of circulation images is discussed. Data hiding performance in terms of invisibility of secret images...

  12. Design of Long Period Pseudo-Random Sequences from the Addition of -Sequences over

    Directory of Open Access Journals (Sweden)

    Ren Jian

    2004-01-01

    Pseudo-random sequences with good correlation properties and large linear span are widely used in code division multiple access (CDMA) communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balance statistics, and low cross-correlation property are constructed from the addition of -sequences with pairwise-prime linear spans (AMPLS). Using -sequences as building blocks, the proposed method proved to be an efficient and flexible approach to construct long period pseudo-random sequences with desirable properties from short period sequences. Applying the proposed method to , a signal set is constructed.
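
    The idea of building a long-period sequence by adding short-period ones can be illustrated with a minimal sketch. It assumes the binary case only (the record's alphabet symbol is missing from this listing), and the polynomials x^3 + x + 1 and x^4 + x + 1, the initial states and the helper names are illustrative choices, not taken from the paper.

```python
def lfsr_sequence(taps, state, length):
    """Binary linear recurrence: s[n + L] = XOR of s[n + k] for k in taps, L = len(state)."""
    s = list(state)
    L = len(state)
    while len(s) < length:
        n = len(s) - L
        bit = 0
        for k in taps:
            bit ^= s[n + k]
        s.append(bit)
    return s[:length]


def minimal_period(seq, max_period):
    """Smallest p <= max_period such that seq[i] == seq[i + p] over the whole window."""
    for p in range(1, max_period + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None


LENGTH = 420  # four times the expected period of the sum, so the period check is reliable

# Hypothetical building blocks: two binary m-sequences with coprime linear spans 3 and 4.
a = lfsr_sequence((0, 1), (1, 0, 0), LENGTH)     # recurrence from x^3 + x + 1
b = lfsr_sequence((0, 1), (1, 0, 0, 0), LENGTH)  # recurrence from x^4 + x + 1
c = [x ^ y for x, y in zip(a, b)]                # termwise addition over GF(2)

period_a = minimal_period(a, 210)  # 7
period_b = minimal_period(b, 210)  # 15
period_c = minimal_period(c, 210)  # 105 = 7 * 15
```

    In this toy case the two short sequences have coprime periods, so their sum repeats only after the product of the two periods.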

  13. Low Complexity List Decoding for Polar Codes with Multiple CRC Codes

    Directory of Open Access Journals (Sweden)

    Jong-Hwan Kim

    2017-04-01

    Polar codes are the first family of error correcting codes that provably achieve the capacity of symmetric binary-input discrete memoryless channels with low complexity. Since the development of polar codes, there have been many studies to improve their finite-length performance. As a result, polar codes are now adopted as a channel code for the control channel of 5G new radio of the 3rd generation partnership project. However, decoder implementation remains a major practical problem, and low complexity decoding has been studied. This paper addresses low complexity successive cancellation list decoding for polar codes utilizing multiple cyclic redundancy check (CRC) codes. While some research uses multiple CRC codes to reduce memory and time complexity, we consider the operational complexity of decoding, and reduce it by optimizing CRC positions in combination with a modified decoding operation. As a result, the proposed scheme obtains not only complexity reduction from early stopping of decoding, but also additional reduction from the reduced number of decoding paths.
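
    As a rough illustration of how a CRC can pick one path out of a decoder's candidate list, the sketch below appends a CRC to a message and returns the first candidate whose CRC checks. It is not the multi-CRC scheme of the paper: the generator x^8 + x^2 + x + 1, the function names and the assumption that candidates arrive ordered from most to least likely are all illustrative.

```python
CRC8_POLY = [1, 0, 0, 0, 0, 0, 1, 1, 1]  # x^8 + x^2 + x + 1, coefficients MSB first (assumed choice)


def _divide(bits, poly):
    """Long division over GF(2); returns the last len(poly) - 1 bits (the remainder)."""
    r = len(poly) - 1
    work = list(bits)
    for i in range(len(work) - r):
        if work[i]:
            for j, p in enumerate(poly):
                work[i + j] ^= p
    return work[-r:]


def crc_encode(message, poly=CRC8_POLY):
    """Append the CRC remainder of message * x^r to the message bits."""
    r = len(poly) - 1
    return list(message) + _divide(list(message) + [0] * r, poly)


def crc_ok(codeword, poly=CRC8_POLY):
    """A valid codeword leaves an all-zero remainder."""
    return not any(_divide(codeword, poly))


def select_path(candidates, poly=CRC8_POLY):
    """Return the first candidate that passes the CRC, or None if none does."""
    for cand in candidates:
        if crc_ok(cand, poly):
            return cand
    return None


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
codeword = crc_encode(message)
corrupted = codeword[:]
corrupted[3] ^= 1  # single bit error, always caught by this generator
chosen = select_path([corrupted, codeword])
```

    A CRC placed partway through the message would allow the same check to run before decoding finishes, which is the kind of early stopping the abstract refers to.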

  14. Molecular Cloning and Characterization of Full-Length cDNA of Calmodulin Gene from Pacific Oyster Crassostrea gigas

    Directory of Open Access Journals (Sweden)

    Xing-Xia Li

    2016-01-01

    The shell of the pearl oyster (Pinctada fucata) mainly comprises aragonite, whereas that of the Pacific oyster (Crassostrea gigas) is mainly calcite, suggesting that the two mollusks differ in their mechanisms of shell formation. Calmodulin (CaM) is an important gene for regulating the uptake, transport, and secretion of calcium during the process of shell formation in the pearl oyster. Characterizing CaM in oysters could therefore facilitate the understanding of the different shell formation mechanisms among mollusks. We cloned the full-length cDNA of Pacific oyster CaM (cgCaM) and found that the cgCaM ORF encoded a peptide of 113 amino acids containing three EF-hand calcium-binding domains. Its expression level was highest in the mantle, hinting that the cgCaM gene is probably involved in shell formation of the Pacific oyster, and the common ancestor of Gastropoda and Bivalvia may have possessed at least three CaM genes. We also found that the numbers of some EF-hand family members in highly calcified species were higher than those in lowly calcified species, and the numbers of these motifs in the oyster genome were the highest among the mollusk species with whole genome sequences, further hinting at a correlation between CaM and biomineralization.

  15. Episodic sequence memory is supported by a theta-gamma phase code

    OpenAIRE

    Heusser, Andrew C.; Poeppel, David; Ezzyat, Youssef; Davachi, Lila

    2016-01-01

    The meaning we derive from our experiences is not a simple static extraction of the elements, but is largely based on the order in which those elements occur. Models propose that sequence encoding is supported by interactions between high and low frequency oscillations, such that elements within an experience are represented by neural cell assemblies firing at higher frequencies (i.e. gamma) and sequential order is coded by the specific timing of firing with respect to a lower frequency oscil...

  16. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Background: In a high-throughput environment, the use of degenerate PCR primers is an important strategy for PCR-amplifying and sequencing a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. Selecting the large set of degenerate PCR primers necessary to tile across an entire viral genome, and maximizing their success, is best performed computationally. Results: We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions: Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus
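
    The role of IUPAC ambiguity codes in a degenerate primer can be sketched as follows. The code table is the standard IUPAC nucleotide alphabet; the function names and the example primer are hypothetical and are not part of the JCVI pipeline.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct plain sequences a degenerate primer stands for."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every plain A/C/G/T sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


variants = expand("ARY")        # hypothetical 3-base primer
fold = degeneracy("ACGTNRY")    # 4 * 2 * 2 = 16
```

    Degeneracy grows multiplicatively with each ambiguous position, which is one reason a primer design system would screen candidate primers computationally.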

  17. An accurate evaluation of the performance of asynchronous DS-CDMA systems with zero-correlation-zone coding in Rayleigh fading

    Science.gov (United States)

    Walker, Ernest; Chen, Xinjia; Cooper, Reginald L.

    2010-04-01

    An arbitrarily accurate approach is used to determine the bit-error rate (BER) performance for generalized asynchronous DS-CDMA systems, in Gaussian noise with Rayleigh fading. In this paper, and the sequel, new theoretical work has been contributed which substantially enhances existing performance analysis formulations. Major contributions include: substantial computational complexity reduction, including a priori BER accuracy bounding; and an analytical approach that facilitates performance evaluation for systems with arbitrary spectral spreading distributions and non-uniform transmission delay distributions. Using prior results, augmented by these enhancements, a generalized DS-CDMA system model is constructed and used to evaluate the BER performance in a variety of scenarios. In this paper, the generalized system modeling was used to evaluate the performance of both Walsh-Hadamard (WH) and Walsh-Hadamard-seeded zero-correlation-zone (WH-ZCZ) coding. The selection of these codes was informed by the observation that WH codes contain N spectral spreading values (0 to N - 1), one for each code sequence, while WH-ZCZ codes contain only two spectral spreading values (N/2 - 1, N/2), where N is the sequence length in chips. Since these codes span the spectral spreading range for DS-CDMA coding, by invoking an induction argument, the generalization of the system model is sufficiently supported. The results in this paper, and the sequel, support the claim that an arbitrarily accurate performance analysis for DS-CDMA systems can be evaluated over the full range of binary coding, with minimal computational complexity.
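
    The statement that WH codes take N distinct values from 0 to N - 1 can be checked on a small case. The sketch below builds Walsh-Hadamard codes by the Sylvester construction and counts sign changes per code; reading the abstract's "spectral spreading value" as the number of sign changes along a code is an assumption made here, and the function names are illustrative.

```python
def walsh_hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix with +1/-1 entries (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def sign_changes(code):
    """Number of adjacent chip pairs with opposite sign."""
    return sum(1 for x, y in zip(code, code[1:]) if x != y)


def correlation(u, v):
    """Zero-lag (synchronous) correlation of two codes."""
    return sum(x * y for x, y in zip(u, v))


N = 8
codes = walsh_hadamard(N)
changes = sorted(sign_changes(c) for c in codes)  # each value 0..N-1 appears once
orthogonal = all(
    correlation(codes[i], codes[j]) == (N if i == j else 0)
    for i in range(N)
    for j in range(N)
)
```

    The orthogonality checked here holds only at zero lag; in an asynchronous system the codes arrive with relative delays, which is why the abstract evaluates performance analytically.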

  18. A STUDY ON DETERMINING THE REFERENCE SPREADING SEQUENCES FOR A DS/CDMA COMMUNICATION SYSTEM

    Directory of Open Access Journals (Sweden)

    Cebrail ÇİFTLİKLİ

    2002-02-01

    In a direct sequence/code division multiple access (DS/CDMA) system, the role of the spreading sequences (codes) is crucial since the multiple access interference (MAI) is the main performance limitation. In this study, we propose an accurate criterion which enables the determination of the reference spreading codes which yield lower bit error rates (BERs) in a given code set for a DS/CDMA system using despreading sequences weighted by stepping chip waveforms. The numerical results show that the spreading codes determined by the proposed criterion are the most suitable codes for use as references.

  19. Full-length characterization of A1/D intersubtype recombinant genomes from a therapy-induced HIV type 1 controller during acute infection and his noncontrolling partner

    DEFF Research Database (Denmark)

    Fomsgaard, A.; Vinner, L.; Therrien, D.

    2008-01-01

    To increase the understanding of mechanisms of HIV control we have genetically and immunologically characterized a full-length HIV-1 isolated from an acute infection in a rare case of undetectable viremia. The subject, a 43-year-old Danish white male (DK1), was diagnosed with acute HIV-1 infection...... and phylogenic trees were constructed and diversity and evolutionary distances were calculated. Intracellular IFN-gamma in CD8(+)CD3(+) T-lymphocyte reactions was investigated by intracellular flow cytometry (IC-FACS). Virus isolates from both patients were A1D intersubtype recombinants showing 98% sequence...

  20. Gene organization in rice revealed by full-length cDNA mapping and gene expression analysis through microarray.

    Directory of Open Access Journals (Sweden)

    Kouji Satoh

    Rice (Oryza sativa L.) is a model organism for the functional genomics of monocotyledonous plants since its genome size is considerably smaller than those of other monocotyledonous plants. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA) sequences are also indispensable for comprehensive analyses of gene structure and function. We cross-referenced 28.5K individual loci in the rice genome defined by mapping of 578K FL-cDNA clones with the 56K loci predicted in the TIGR genome assembly. Based on the annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE) genes, 33K annotated non-expressed (ANE) genes, and 5.5K non-annotated expressed (NAE) genes. We developed a 60mer oligo-array for analysis of gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes were considerably different from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcription activity of the corresponding genetic locus, although other factors may also have an effect. Comparison of the coverage of FL-cDNA among gene families suggested that FL-cDNA from genes encoding rice- or eukaryote-specific domains, and those involved in regulatory functions, were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcription activity and gene structure, and that the coverage bias of FL-cDNA clones exists due to the incompatibility of certain eukaryotic genes in bacteria.

  1. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    Rosalia Di Gerlando

    2017-08-16

    Aug 16, 2017 ... ANNA MARIA SUTERA, MARIA TERESA SARDINA. ∗ ... SNPs that might be important in future studies and laid the basis for further association analyses needed to ..... Haplotype-based analysis can provide higher power,.

  2. Minimizing N2O emissions and carbon footprint on a full-scale activated sludge sequencing batch reactor.

    Science.gov (United States)

    Rodriguez-Caballero, A; Aymerich, I; Marques, Ricardo; Poch, M; Pijuan, M

    2015-03-15

    A continuous, on-line quantification of the nitrous oxide (N2O) emissions from a full-scale sequencing batch reactor (SBR) placed in a municipal wastewater treatment plant (WWTP) was performed in this study. In general, N2O emissions from the biological wastewater treatment system were 97.1 ± 6.9 g N2O-N/Kg [Formula: see text] consumed or 6.8% of the influent [Formula: see text] load. In the WWTP of this study, N2O emissions accounted for over 60% of the total carbon footprint of the facility, on average. Different cycle configurations were implemented in the SBR aiming at reaching acceptable effluent values. Each cycle configuration consisted of sequences of aerated and non-aerated phases of different time length being controlled by the ammonium set-point fixed. Cycles with long aerated phases showed the largest N2O emissions, with the consequent increase in carbon footprint. Cycle configurations with intermittent aeration (aerated phases up to 20-30 min followed by short anoxic phases) were proven to effectively reduce N2O emissions, without compromising nitrification performance or increasing electricity consumption. This is the first study in which a successful operational strategy for N2O mitigation is identified at full-scale. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Code-Switching to Know a TL Equivalent of an L1 Word: Request-Provision-Acknowledgement (RPA) Sequence

    Science.gov (United States)

    Lucero, Edgar

    2011-01-01

    This article focuses on the learner's use of Code-switching to learn the TL (Target Language) equivalent of an L1 word. The interactional pattern that this situation creates defines the Request-Provision-Acknowledgement (RPA) sequence. The article explains each of the turns of the sequence under the combination of the Ethnomethodological…

  4. Sequence-based heuristics for faster annotation of non-coding RNA families.

    Science.gov (United States)

    Weinberg, Zasha; Ruzzo, Walter L

    2006-01-01

    Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool to find new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy, while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be. In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, where our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that--unlike family-specific solutions--can scale to hundreds of ncRNA families. The source code is available under GNU Public License at the supplementary web site.

  5. Structured LDPC Codes over Integer Residue Rings

    Directory of Open Access Journals (Sweden)

    Mo Elisa

    2008-01-01

    Full Text Available Abstract This paper presents a new class of low-density parity-check (LDPC) codes over ℤ_{2^a} represented by regular, structured Tanner graphs. These graphs are constructed using Latin squares defined over a multiplicative group of a Galois ring, rather than a finite field. Our approach yields codes for a wide range of code rates and more importantly, codes whose minimum pseudocodeword weights equal their minimum Hamming distances. Simulation studies show that these structured codes, when transmitted using matched signal sets over an additive-white-Gaussian-noise channel, can outperform their random counterparts of similar length and rate.

  6. Complete sequences of the mitochondrial DNA of the wild Gracilariopsis lemaneiformis and two mutagenic cultivated breeds (Gracilariaceae, Rhodophyta).

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    Full Text Available The complete mitochondrial DNA (mtDNA) of Gracilariopsis lemaneiformis was sequenced (25883 bp) and mapped to a circular model. The A+T composition was 72.5%. Forty six genes and two potentially functional open reading frames were identified. They include 24 protein-coding genes, 2 rRNA genes, 20 tRNA genes and 2 ORFs (orf60, orf142). There is considerable sequence synteny across the five red algal mtDNAs falling into Florideophyceae including Gr. lemaneiformis in this study and previously sequenced species. A long stem-loop and a hairpin structure were identified in intergenic regions of mt genome of Gr. lemaneiformis, which are believed to be involved with transcription and replication. In addition, the mtDNAs of two mutagenic cultivated breeds ("981" and "07-2") were also sequenced. Compared with the mtDNA of wild Gr. lemaneiformis, the genome size and gene length and order of three strains were completely identical except nine base mutations including eight in the protein-coding genes and one in the tRNA gene. None of the base mutations caused frameshift or a premature stop codon in the mtDNA genes. Phylogenetic analyses based on mitochondrial protein-coding genes and rRNA genes demonstrated Gracilariopsis andersonii had closer phylogenetic relationship with its parasite Gracilariophila oryzoides than Gracilariopsis lemaneiformis which was from the same genus of Gracilariopsis.

  7. Revised genomic consensus for the hypermethylated CpG island region of the human L1 transposon and integration sites of full length L1 elements from recombinant clones made using methylation-tolerant host strains

    DEFF Research Database (Denmark)

    Crowther, P J; Doherty, J P; Linsenmeyer, M E

    1991-01-01

    Efficient recovery of clones from the 5' end of the human L1 dispersed repetitive elements necessitates the use of deletion mcr- host strains since this region contains a CpG island which is hypermethylated in vivo. Clones recovered with conventional mcr+ hosts seem to have been derived preferentially from L1 members which have accumulated mutations that have removed sites of methylation. We present a revised consensus from the 5' presumptive control region of these elements. This revised consensus contains a consensus RNA polymerase III promoter which would permit the synthesis of transcripts from the 5' end of full length L1 elements. Such potential transcripts are likely to exhibit a high degree of secondary structure. In addition, we have determined the flanking sequences for 6 full length L1 elements. The majority of full length L1 clones show no convincing evidence for target site...

  8. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

    Directory of Open Access Journals (Sweden)

    Shi Weisong

    2011-06-01

    Full Text Available Abstract Background Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. Results To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http

  9. Amplification and pyrosequencing of near-full-length hepatitis C virus for typing and monitoring antiviral resistant strains.

    Science.gov (United States)

    Trémeaux, P; Caporossi, A; Ramière, C; Santoni, E; Tarbouriech, N; Thélu, M-A; Fusillier, K; Geneletti, L; François, O; Leroy, V; Burmeister, W P; André, P; Morand, P; Larrat, S

    2016-05-01

    Directly acting antiviral drugs have contributed considerable progress to hepatitis C virus (HCV) treatment, but they show variable activity depending on virus genotypes and subtypes. Therefore, accurate genotyping including recombinant form detection is still of major importance, as is the detection of resistance-associated mutations in case of therapeutic failure. To meet these goals, an approach to amplify the HCV near-complete genome with a single long-range PCR and sequence it with Roche GS Junior was developed. After optimization, the overall amplification success rate was 73% for usual genotypes (i.e. HCV 1a, 1b, 3a and 4a, 16/22) and 45% for recombinant forms RF_2k/1b (5/11). After pyrosequencing and subsequent de novo assembly, a near-full-length genomic consensus sequence was obtained for 19 of 21 samples. The genotype and subtype were confirmed by phylogenetic analysis for every sample, including the suspected recombinant forms. Resistance-associated mutations were detected in seven of 13 samples at baseline, in the NS3 (n = 3) or NS5A (n = 4) region. Of these samples, the treatment of one patient included daclatasvir, and that patient experienced a relapse. Virus sequences from pre- and posttreatment samples of four patients who experienced relapse after sofosbuvir-based therapy were compared: the selected variants seem too far from the NS5B catalytic site to be held responsible. Although tested on a limited set of samples and with technical improvements still necessary, this assay has proven to be successful for both genotyping and resistance-associated variant detection on several HCV types. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  10. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  11. Genotypic characterization of Salmonella by multilocus sequence typing, pulsed-field gel electrophoresis and amplified fragment length polymorphism

    DEFF Research Database (Denmark)

    Torpdahl, Mia; Skov, Marianne N.; Sandvang, Dorthe

    2005-01-01

    subspecies enterica isolates. A total of 25 serotypes were investigated that had been isolated from humans or veterinary sources in Denmark between 1995 and 2001. All isolates were genotyped by multilocus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE) and amplified fragment length...

  12. MicroRNA-200c modulates the expression of MUC4 and MUC16 by directly targeting their coding sequences in human pancreatic cancer.

    Directory of Open Access Journals (Sweden)

    Prakash Radhakrishnan

    Full Text Available Transmembrane mucins, MUC4 and MUC16 are associated with tumor progression and metastatic potential in human pancreatic adenocarcinoma. We discovered that miR-200c interacts with specific sequences within the coding sequence of MUC4 and MUC16 mRNAs, and evaluated the regulatory nature of this association. Pancreatic cancer cell lines S2.028 and T3M-4 transfected with miR-200c showed a 4.18 and 8.50 fold down regulation of MUC4 mRNA, and 4.68 and 4.82 fold down regulation of MUC16 mRNA compared to mock-transfected cells, respectively. A significant reduction of glycoprotein expression was also observed. These results indicate that miR-200c overexpression regulates MUC4 and MUC16 mucins in pancreatic cancer cells by directly targeting the mRNA coding sequence of each, resulting in reduced levels of MUC4 and MUC16 mRNA and protein. These data suggest that, in addition to regulating proteins that modulate EMT, miR-200c influences expression of cell surface mucins in pancreatic cancer.

  13. Sequence Coding and Search System for licensee event reports: user's guide. Volume 1, Revision 1

    International Nuclear Information System (INIS)

    Greene, N.M.; Mays, G.T.; Johnson, M.P.

    1985-04-01

    Operating experience data from nuclear power plants are essential for safety and reliability analyses, especially analyses of trends and patterns. The licensee event reports (LERs) that are submitted to the Nuclear Regulatory Commission (NRC) by the nuclear power plant utilities contain much of this data. The NRC's Office for Analysis and Evaluation of Operational Data (AEOD) has developed, under contract with NSIC, a system for codifying the events reported in the LERs. The primary objective of the Sequence Coding and Search System (SCSS) is to reduce the descriptive text of the LERs to coded sequences that are both computer-readable and computer-searchable. This system provides a structured format for detailed coding of component, system, and unit effects as well as personnel errors. The database contains all current LERs submitted by nuclear power plant utilities for events occurring since 1981 and is updated on a continual basis. This four volume report documents and describes SCSS in detail. Volume 1 is a User's Guide for searching the SCSS database. This volume contains updated material through February 1985 of the working version of ORNL/NSIC-223, Vol. 1

  14. Spike Code Flow in Cultured Neuronal Networks.

    Science.gov (United States)

    Tamura, Shinichi; Nishitani, Yoshi; Hosokawa, Chie; Miyoshi, Tomomitsu; Sawai, Hajime; Kamimura, Takuya; Yagi, Yasushi; Mizuno-Matsumoto, Yuko; Chen, Yen-Wei

    2016-01-01

    We observed spike trains produced by one-shot electrical stimulation with 8 × 8 multielectrodes in cultured neuronal networks. Each electrode accepted spikes from several neurons. We extracted the short codes from spike trains and obtained a code spectrum with a nominal time accuracy of 1%. We then constructed code flow maps as movies of the electrode array to observe the code flow of "1101" and "1011," which are typical pseudorandom sequences of the kind we often encountered in the literature and in our experiments. They seemed to flow from one electrode to the neighboring one and maintained their shape to some extent. To quantify the flow, we calculated the "maximum cross-correlations" among neighboring electrodes, to find the direction of maximum flow of the codes with lengths less than 8. Normalized maximum cross-correlations were almost constant irrespective of code. Furthermore, when the spike trains were shuffled in interval order or across electrodes, the cross-correlations became significantly smaller. Thus, the analysis suggested that local codes of approximately constant shape propagated and conveyed information across the network. Hence, the codes can serve as visible and trackable marks of propagating spike waves as well as a means of evaluating information flow in the neuronal network.
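
    To make the "maximum cross-correlation" idea concrete, here is a minimal sketch for two binary spike trains. The abstract does not give the authors' exact definition, so the normalization (dividing by the geometric mean of the spike counts), the lag window, and the function name are all assumptions made for illustration.

```python
from math import sqrt


def max_cross_correlation(x, y, max_lag):
    """Return (peak value, lag) of a normalized cross-correlation of two 0/1 trains."""
    norm = sqrt(sum(x) * sum(y))  # assumed normalization, not taken from the paper
    if norm == 0:
        return 0.0, 0
    best_value, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        # Count coincident spikes when y is read `lag` steps later than x.
        overlap = sum(
            x[t] * y[t + lag]
            for t in range(len(x))
            if 0 <= t + lag < len(y)
        )
        value = overlap / norm
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag


# The code "1101" at one electrode and the same code two time steps later at a neighbor.
train_a = [1, 1, 0, 1, 0, 0, 0, 0]
train_b = [0, 0, 1, 1, 0, 1, 0, 0]
result = max_cross_correlation(train_a, train_b, max_lag=4)
print(result)
```

    In this toy case the peak sits at a lag of two steps, which is the kind of signal one would read as the code moving from the first electrode to the second.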

  15. On low-complexity full-diversity detection of multi-user space-time coding

    KAUST Repository

    Ismail, Amr; Alouini, Mohamed-Slim

    2013-01-01

    for a wide range of STBCs to achieve full-diversity under PIC group decoding with or without successive interference cancellation (SIC). Based on the provided design criteria we derive an upper-bound on the achievable rate for a class of codes. A two

  16. Species-Specific Expression of Full-Length and Alternatively Spliced Variant Forms of CDK5RAP2.

    Directory of Open Access Journals (Sweden)

    John S Y Park

    Full Text Available CDK5RAP2 is one of the primary microcephaly genes that are associated with reduced brain size and mental retardation. We have previously shown that human CDK5RAP2 exists as a full-length form (hCDK5RAP2) or an alternatively spliced variant form (hCDK5RAP2-V1) that is lacking exon 32. The equivalent of hCDK5RAP2-V1 has been reported in rat and mouse but the presence of full-length equivalent hCDK5RAP2 in rat and mouse has not been examined. Here, we demonstrate that rat expresses both a full length and an alternatively spliced variant form of CDK5RAP2 that are equivalent to our previously reported hCDK5RAP2 and hCDK5RAP2-V1, respectively. However, mouse expresses only one form of CDK5RAP2 that is equivalent to the human and rat alternatively spliced variant forms. Knowledge of this expression of different forms of CDK5RAP2 in human, rat and mouse is essential in selecting the appropriate model for studies of CDK5RAP2 and primary microcephaly but our findings further indicate the evolutionary divergence of mouse from the human and rat species.

  17. Highly efficient full-length hepatitis C virus genotype 1 (strain TN) infectious culture system

    DEFF Research Database (Denmark)

    Li, Yi-Ping; Ramirez, Santseharay; Jensen, Sanne B

    2012-01-01

    Chronic infection with hepatitis C virus (HCV) is an important cause of end stage liver disease worldwide. In the United States, most HCV-related disease is associated with genotype 1 infection, which remains difficult to treat. Drug and vaccine development was hampered by inability to culture... ...) culture systems in Huh7.5 cells. Here, we developed a highly efficient genotype 1a (strain TN) full-length culture system. We initially found that the LSG substitutions conferred viability to an intergenotypic recombinant composed of TN 5' untranslated region (5'UTR)-NS5A and JFH1 NS5B-3'UTR; recovered... ...full-length TN infection dose-dependently. Given the unique importance of genotype 1 for pathogenesis, this infectious 1a culture system represents an important advance in HCV research. The approach used and the mutations identified might permit culture development for other HCV isolates, thus...

  18. Nucleotide sequence of the melA gene, coding for alpha-galactosidase in Escherichia coli K-12.

    OpenAIRE

    Liljeström, P L; Liljeström, P

    1987-01-01

    Melibiose uptake and hydrolysis in E.coli is performed by the MelB and MelA proteins, respectively. We report the cloning and sequencing of the melA gene. The nucleotide sequence data showed that melA codes for a 450 amino acid long protein with a molecular weight of 50.6 kd. The sequence data also supported the assumption that the mel locus forms an operon with melA in proximal position. A comparison of MelA with alpha-galactosidase proteins from yeast and human origin showed that these prot...

  19. Adaptable recursive binary entropy coding technique

    Science.gov (United States)

    Kiely, Aaron B.; Klimesh, Matthew A.

    2002-07-01

    We present a novel data compression technique, called recursive interleaved entropy coding, that is based on recursive interleaving of variable-to-variable length binary source codes. A compression module implementing this technique has the same functionality as arithmetic coding and can be used as the engine in various data compression algorithms. The encoder compresses a bit sequence by recursively encoding groups of bits that have similar estimated statistics, ordering the output in a way that is suited to the decoder. As a result, the decoder has low complexity. The encoding process for our technique is adaptable in that each bit to be encoded has an associated probability-of-zero estimate that may depend on previously encoded bits; this adaptability allows more effective compression. Recursive interleaved entropy coding may have advantages over arithmetic coding, including most notably the admission of a simple and fast decoder. Much variation is possible in the choice of component codes and in the interleaving structure, yielding coder designs of varying complexity and compression efficiency; coder designs that achieve arbitrarily small redundancy can be produced. We discuss coder design and performance estimation methods. We present practical encoding and decoding algorithms, as well as measured performance results.
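
    The abstract does not spell out the coder itself, so the sketch below does not attempt to implement recursive interleaved entropy coding. It only illustrates the benchmark such a coder is measured against: given a probability-of-zero estimate for each bit, the ideal (Shannon) cost of the sequence is the sum of -log2 of the probability assigned to each observed bit, and a coder's redundancy is its output length minus that figure. The function name and sample values are hypothetical.

```python
from math import log2


def ideal_code_length(bits, p_zero):
    """Ideal cost in bits of `bits`, given a per-bit probability-of-zero estimate."""
    total = 0.0
    for bit, p0 in zip(bits, p_zero):
        # Probability the model assigned to the value that actually occurred.
        p = p0 if bit == 0 else 1.0 - p0
        total += -log2(p)
    return total


# Hypothetical estimates; each must lie strictly between 0 and 1.
bits = [0, 0, 1, 0]
estimates = [0.5, 0.75, 0.75, 0.9]
print(round(ideal_code_length(bits, estimates), 3))
```

    Better estimates lower this figure, which is why letting each estimate depend on previously encoded bits can improve compression.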

  20. Multipass Channel Estimation and Joint Multiuser Detection and Equalization for MIMO Long-Code DS/CDMA Systems

    Directory of Open Access Journals (Sweden)

    Buzzi Stefano

    2006-01-01

    Full Text Available The problem of joint channel estimation, equalization, and multiuser detection for a multiantenna DS/CDMA system operating over a frequency-selective fading channel and adopting long aperiodic spreading codes is considered in this paper. First of all, we present several channel estimation and multiuser data detection schemes suited for multiantenna long-code DS/CDMA systems. Then, a multipass strategy, wherein the data detection and the channel estimation procedures exchange information in a recursive fashion, is introduced and analyzed for the proposed scenario. Remarkably, this strategy provides, at the price of some attendant computational complexity increase, excellent performance even when very short training sequences are transmitted, and thus couples together the conflicting advantages of both trained and blind systems, that is, good performance and no wasted bandwidth, respectively. Space-time coded systems are also considered, and it is shown that the multipass strategy provides excellent results for such systems also. Likewise, it is also shown that excellent performance is achieved also when each user adopts the same spreading code for all of its transmit antennas. The validity of the proposed procedure is corroborated by both simulation results and analytical findings. In particular, it is shown that adopting the multipass strategy results in a remarkable reduction of the channel estimation mean-square error and of the optimal length of the training sequence.

  1. Draft Genome Sequence of Lactobacillus sp. Strain TCF032-E4, Isolated from Fermented Radish.

    Science.gov (United States)

    Mao, Yuejian; Chen, Meng; Horvath, Philippe

    2015-07-30

    Here, we report the draft genome sequence of Lactobacillus sp. strain TCF032-E4 (= CCTCC AB2015090 = DSM 100358), isolated from a Chinese fermented radish. The total length of the 57 contigs is about 2.9 Mb, with a G+C content of 43.5 mol% and 2,797 predicted coding sequences (CDSs). Copyright © 2015 Mao et al.
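
    The summary figures quoted for a draft assembly (total contig length and G+C content) are simple to recompute from the contig sequences. The sketch below assumes the contigs are already loaded as plain strings and that ambiguous bases such as N are excluded from the G+C percentage; the function name and toy contigs are hypothetical.

```python
def assembly_stats(contigs):
    """Return (total length in bp, G+C percentage) for a list of contig strings."""
    total_length = sum(len(contig) for contig in contigs)
    gc = at = 0
    for contig in contigs:
        contig = contig.upper()
        gc += contig.count("G") + contig.count("C")
        at += contig.count("A") + contig.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    # Ambiguous bases count toward length but not toward the G+C percentage.
    return total_length, 100.0 * gc / (gc + at)


# Toy contigs, not the TCF032-E4 assembly.
stats = assembly_stats(["GGCC", "AATT", "acgtNN"])
print(stats)
```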

  2. Purification and Fibrillation of Full-Length Recombinant PrP.

    Science.gov (United States)

    Makarava, Natallia; Savtchenko, Regina; Baskakov, Ilia V

    2017-01-01

    Misfolding and aggregation of prion protein are related to several neurodegenerative diseases in humans such as Creutzfeldt-Jakob disease, fatal familial insomnia, and Gerstmann-Straussler-Scheinker disease. A growing number of applications in the prion field including assays for detection of PrP Sc and methods for production of PrP Sc de novo require recombinant prion protein (PrP) of high purity and quality. Here, we report an experimental procedure for expression and purification of full-length mammalian prion protein. This protocol has been proved to yield PrP of extremely high purity that lacks PrP adducts, oxidative modifications, or truncation, which is typically generated as a result of spontaneous oxidation or degradation. We also describe methods for preparation of amyloid fibrils from recombinant PrP in vitro. Recombinant PrP fibrils can be used as a noninfectious synthetic surrogate of PrP Sc for development of prion diagnostics including generation of PrP Sc -specific antibody.

  3. Structured LDPC Codes over Integer Residue Rings

    Directory of Open Access Journals (Sweden)

    Marc A. Armand

    2008-07-01

    Full Text Available This paper presents a new class of low-density parity-check (LDPC) codes over ℤ_{2^a} represented by regular, structured Tanner graphs. These graphs are constructed using Latin squares defined over a multiplicative group of a Galois ring, rather than a finite field. Our approach yields codes for a wide range of code rates and more importantly, codes whose minimum pseudocodeword weights equal their minimum Hamming distances. Simulation studies show that these structured codes, when transmitted using matched signal sets over an additive-white-Gaussian-noise channel, can outperform their random counterparts of similar length and rate.

  4. Production of enzymatically active recombinant full-length barley high pI alpha-glucosidase of glycoside family 31 by high cell-density fermentation of Pichia pastoris and affinity purification

    DEFF Research Database (Denmark)

    Næsted, Henrik; Kramhøft, Birte; Lok, F.

    2006-01-01

    Recombinant barley high pI alpha-glucosidase was produced by high cell-density fermentation of Pichia pastoris expressing the cloned full-length gene. The gene was amplified from a genomic clone and exons (coding regions) were assembled by overlap PCR. The resulting cDNA was expressed under contr...... nM x s(-1), and 85 s(-1) using maltose as substrate. This work presents the first production of fully active recombinant alpha-glucosidase of glycoside hydrolase family 31 from higher plants. (c) 2005 Elsevier Inc. All rights reserved....

  5. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and make the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  6. Full-length genome sequence analysis of four subgroup J avian leukosis virus strains isolated from chickens with clinical hemangioma.

    Science.gov (United States)

    Lin, Lulu; Wang, Peikun; Yang, Yongli; Li, Haijuan; Huang, Teng; Wei, Ping

    2017-12-01

    Since 2014, cases of hemangioma associated with avian leukosis virus subgroup J (ALV-J) have been emerging in commercial chickens in Guangxi. In this study, four strains of the subgroup J avian leukosis virus (ALV-J), named GX14HG01, GX14HG04, GX14LT07, and GX14ZS14, were isolated from chickens with clinical hemangioma in 2014 by DF-1 cell culture and then identified with ELISA detection of ALV group specific antigen p27, the detection of subtype specific PCR and indirect immunofluorescence assay (IFA) with ALV-J specific monoclonal antibody. The complete genomes of the isolates were sequenced and it was found that the gag and pol were relatively conservative, while env was variable especially the gp85 gene. Homology analysis of the env gene sequences showed that the env gene of all the four isolates had higher similarities with the hemangioma (HE)-type reference strains than that of the myeloid leukosis (ML)-type strains, and moreover, the HE-type strains' specific deletion of 205-bp sequence covering the rTM and DR1 in 3'UTR fragment was also found in the four isolates. Further analysis on the sequences of subunits of env gene revealed an interesting finding: the gp85 of isolates GX14ZS14 and GX14HG04 had a higher similarity with HPRS-103 and much lower similarity with the HE-type reference strains resulting in GX14ZS14, GX14HG04, and HPRS-103 being clustered in the same branch, while gp37 had higher similarities with the HE-type reference strains when compared to that of HPRS-103, resulted in GX14ZS14, GX14HG04, and HE-type reference strains being clustered in the same branch. The results suggested that isolates GX14ZS14 and GX14HG04 may be the recombinant strains of the foreign strain HPRS-103 with the local epidemic HE-type strains of ALV-J.

  7. The complete mitochondrial genome sequence of Oceanic whitetip shark, Carcharhinus longimanus (Carcharhiniformes: Carcharhinidae).

    Science.gov (United States)

    Li, Weiwen; Dai, Xiaojie; Xu, Qianghua; Wu, Feng; Gao, Chunxia; Zhang, Yanbo

    2016-05-01

    The complete mitochondrial DNA sequence of Carcharhinus longimanus was determined and analyzed. The complete mtDNA genome sequence of C. longimanus was 16,706 bp in length. It contained 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes and 2 non-coding regions: control region (D-loop) and origin of light-strand replication (OL). The complete mitogenome sequence information of C. longimanus can provide useful data for further studies on molecular systematics, stock evaluation, taxonomic status and conservation genetics.

  8. Disentangling the effects of alternation rate and maximum run length on judgments of randomness

    Directory of Open Access Journals (Sweden)

    Sabine G. Scholl

    2011-08-01

    Full Text Available Binary sequences are characterized by various features. Two of these characteristics, alternation rate and run length, have repeatedly been shown to influence judgments of randomness. The two characteristics, however, have usually been investigated separately, without controlling for the other feature. Because the two features are correlated but not identical, it seems critical to analyze their unique impact, as well as their interaction, so as to understand more clearly what influences judgments of randomness. To this end, two experiments on the perception of binary sequences orthogonally manipulated alternation rate and maximum run length (i.e., length of the longest run within the sequence). Results show that alternation rate consistently exerts a unique effect on judgments of randomness, but that the effect of alternation rate is contingent on the length of the longest run within the sequence. The effect of maximum run length was found to be small and less consistent. Together, these findings extend prior randomness research by integrating literature from the realms of perception, categorization, and prediction, as well as by showing the unique and joint effects of alternation rate and maximum run length on judgments of randomness.

  9. Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

    Science.gov (United States)

    Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2013-01-01

    Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. 
Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite distribution with

  10. Production of full length and splicing form of chymosin using pETexpression system in E-coli and investigation its enzyme activity and preplasmic secretion

    Directory of Open Access Journals (Sweden)

    M. Ahmadi Zeydabadi

    2008-05-01

    Full Text Available Introduction: Chymosin (rennin, EC 3.4.23.4) is an aspartyl proteinase (the major proteolytic enzyme in the fourth stomach of the unweaned calf) that is formed by proteolytic activation from the zymogen prochymosin. The aim of this study was to produce the full-length and spliced forms of chymosin using the pET expression system in E. coli and to assay the activity of the expressed enzyme and its preplasmic secretion. Materials and Methods: The sense primer F-prochy(+) (5´-ggggccatgGCTGAGATCACCAGGA) includes an NcoI restriction site. The antisense primer R-prochy(-) (5´-gggcggccgcGATGGCTTTGGCCAGC-3´) hybridizes to the C-terminal end of calf preprochymosin cDNA and contains an additional NotI restriction site at its 5´-end. The cells were disrupted by sonication and proteins were purified using Ni-NTA beads from Qiagen under native conditions. The preprochymosin and AS6 preprochymosin were activated at pH 4.7. The enzyme solutions were diluted 20-fold with 50 mM phosphate buffer. Results: Sequencing data analysis showed that exon six had been spliced out and that the gene product is therefore 114 bp shorter. Both chymosin forms were expressed together in E. coli. Under the same expression conditions, AS6 preprochymosin was expressed at a level at least 7-fold higher than the full-length recombinant chymosin. Following acid activation and neutralization, the purified fractions were tested in a qualitative milk clotting assay. The clotting activities of preprochymosin and exon6-less preprochymosin were comparable. Conclusion: High expression of this alternatively spliced transcript in bacteria and proper folding of the AS6 chymosin protein molecule in the absence of exon six are the two most important findings of this research.

  11. Optimal Codes for the Burst Erasure Channel

    Science.gov (United States)

    Hamkins, Jon

    2010-01-01

    Deep space communications over noisy channels lead to certain packets that are not decodable. These packets leave gaps, or bursts of erasures, in the data stream. Burst erasure correcting codes overcome this problem. These are forward erasure correcting codes that allow one to recover the missing gaps of data. Much of the recent work on this topic concentrated on Low-Density Parity-Check (LDPC) codes. These are more complicated to encode and decode than Single Parity Check (SPC) codes or Reed-Solomon (RS) codes, and so far have not been able to achieve the theoretical limit for burst erasure protection. A block interleaved maximum distance separable (MDS) code (e.g., an SPC or RS code) offers near-optimal burst erasure protection, in the sense that no other scheme of equal total transmission length and code rate could improve the guaranteed correctible burst erasure length by more than one symbol. The optimality does not depend on the length of the code, i.e., a short MDS code block interleaved to a given length would perform as well as a longer MDS code interleaved to the same overall length. As a result, this approach offers lower decoding complexity with better burst erasure protection compared to other recent designs for the burst erasure channel (e.g., LDPC codes). A limitation of the design is its lack of robustness to channels that have impairments other than burst erasures (e.g., additive white Gaussian noise), making its application best suited for correcting data erasures in layers above the physical layer. The efficiency of a burst erasure code is the length of its burst erasure correction capability divided by the theoretical upper limit on this length. The inefficiency is one minus the efficiency. 
The illustration compares the inefficiency of interleaved RS codes to Quasi-Cyclic (QC) LDPC codes, Euclidean Geometry (EG) LDPC codes, extended Irregular Repeat Accumulate (eIRA) codes, array codes, and random LDPC codes previously proposed for burst erasure
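
    The block-interleaving idea described in this record can be illustrated with the simplest MDS code it mentions, a Single Parity Check (SPC) code. The following is a minimal sketch, not the author's implementation: symbols are small integers, the parity symbol is the XOR of the data symbols, erasures are marked with None, and all function names are hypothetical.

```python
from functools import reduce
from operator import xor


def spc_encode(data):
    """Append one XOR parity symbol, so the XOR of the whole codeword is 0."""
    return data + [reduce(xor, data, 0)]


def interleave(codewords):
    """Send symbol 0 of every codeword, then symbol 1 of every codeword, and so on."""
    return [cw[j] for j in range(len(codewords[0])) for cw in codewords]


def deinterleave(stream, depth):
    """Undo interleave(): codeword i owns every depth-th symbol starting at position i."""
    return [stream[i::depth] for i in range(depth)]


def spc_recover(codeword):
    """Fill in at most one erased symbol (None) and return the data symbols."""
    missing = [i for i, s in enumerate(codeword) if s is None]
    if len(missing) > 1:
        raise ValueError("an SPC code can repair only one erasure per codeword")
    if missing:
        repaired = list(codeword)
        repaired[missing[0]] = reduce(xor, (s for s in codeword if s is not None), 0)
        codeword = repaired
    return codeword[:-1]


def inefficiency(guaranteed_burst, upper_limit):
    """One minus the efficiency, as defined in the abstract."""
    return 1 - guaranteed_burst / upper_limit
```

    With an interleaving depth equal to the number of codewords, any burst of erasures no longer than that depth touches each codeword at most once, so every codeword can be repaired independently. A burst one symbol longer hits some codeword twice, and spc_recover raises an error.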

  12. Pseudo-polyprotein translated from the full-length ORF1 of capillovirus is important for pathogenicity, but a truncated ORF1 protein without variable and CP regions is sufficient for replication.

    Science.gov (United States)

    Hirata, Hisae; Yamaji, Yasuyuki; Komatsu, Ken; Kagiwada, Satoshi; Oshima, Kenro; Okano, Yukari; Takahashi, Shuichiro; Ugaki, Masashi; Namba, Shigetou

    2010-09-01

    The first open-reading frame (ORF) of the genus Capillovirus encodes an apparently chimeric polyprotein containing conserved regions for replicase (Rep) and coat protein (CP), while other viruses in the family Flexiviridae have separate ORFs encoding these proteins. To investigate the role of the full-length ORF1 polyprotein of capillovirus, we generated truncation mutants of ORF1 of apple stem grooving virus by inserting a termination codon into the variable region located between the putative Rep- and CP-coding regions. These mutants were capable of systemic infection, although their pathogenicity was attenuated. In vitro translation of ORF1 produced both the full-length polyprotein and the smaller Rep protein. The results of in vivo reporter assays suggested that the mechanism of this early termination is a ribosomal -1 frame-shift occurring downstream from the conserved Rep domains. The mechanism of capillovirus gene expression and the very close evolutionary relationship between the genera Capillovirus and Trichovirus are discussed. Copyright (c) 2010. Published by Elsevier B.V.

  13. Non-destructive testing of full-length bonded rock bolts based on HHT signal analysis

    Science.gov (United States)

    Shi, Z. M.; Liu, L.; Peng, M.; Liu, C. C.; Tao, F. J.; Liu, C. S.

    2018-04-01

    Full-length bonded rock bolts are commonly used in mining, tunneling and slope engineering because of their simple design and resistance to corrosion. However, the length of a rock bolt and grouting quality do not often meet the required design standards in practice because of the concealment and complexity of bolt construction. Non-destructive testing is preferred when testing a rock bolt's quality because of the convenience, low cost and wide detection range. In this paper, a signal analysis method for the non-destructive sound wave testing of full-length bonded rock bolts is presented, which is based on the Hilbert-Huang transform (HHT). First, we introduce the HHT analysis method to calculate the bolt length and identify defect locations based on sound wave reflection test signals, which includes decomposing the test signal via empirical mode decomposition (EMD), selecting the intrinsic mode functions (IMF) using the Pearson Correlation Index (PCI) and calculating the instantaneous phase and frequency via the Hilbert transform (HT). Second, six model tests are conducted using different grouting defects and bolt protruding lengths to verify the effectiveness of the HHT analysis method. Lastly, the influence of the bolt protruding length on the test signal, identification of multiple reflections from defects, bolt end and protruding end, and mode mixing from EMD are discussed. The HHT analysis method can identify the bolt length and grouting defect locations from signals that contain noise at multiple reflected interfaces. The reflection from the long protruding end creates an irregular test signal with many frequency peaks on the spectrum. The reflections from defects barely change the original signal because they are low energy, which cannot be adequately resolved using existing methods. 
The HHT analysis method can identify reflections from the long protruding end of the bolt and multiple reflections from grouting defects based on mutations in the instantaneous

  14. Application of Displacement Height and Surface Roughness Length to Determination Boundary Layer Development Length over Stepped Spillway

    Directory of Open Access Journals (Sweden)

    Xiangju Cheng

    2014-12-01

    Full Text Available One of the most uncertain parameters in stepped spillway design is the length (from the crest) of boundary layer development. The normal velocity profiles responding to the steps as bed roughness are investigated in the developing non-aerated flow region. A detailed analysis of the logarithmic vertical velocity profiles on stepped spillways is conducted through experimental data to verify the computational code and numerical experiments to expand the data available. To determine development length, the hydraulic roughness and displacement thickness, along with the shear velocity, are needed. This includes determining displacement height d and surface roughness length z0 and the relationship of d and z0 to the step geometry. The results show that the hydraulic roughness height ks is the primary factor on which d and z0 depend. For different step heights, step widths, discharges and intake Froude numbers, the relations d/ks = 0.22–0.27, z0/ks = 0.06–0.1 and d/z0 = 2.2–4 give a good estimate. Using the computational code and numerical experiments, air inception will occur over stepped spillway flow as long as the Bauer-defined boundary layer thickness is between 0.72 and 0.79.
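
    The roles of d and z0 in a logarithmic velocity profile can be made concrete with a short sketch. It assumes the standard rough-wall log law u(z) = (u*/kappa) ln((z - d)/z0) with a von Karman constant of 0.41; the abstract does not state the exact form or constant used, so both are assumptions, and the function names are illustrative.

```python
import math

KAPPA = 0.41  # von Karman constant; a commonly used value, assumed here


def log_law_velocity(z, u_star, d, z0, kappa=KAPPA):
    """Velocity at height z from the standard log law with displacement height d."""
    if z - d <= 0 or z0 <= 0:
        raise ValueError("need z > d and z0 > 0")
    return (u_star / kappa) * math.log((z - d) / z0)


def roughness_parameter_ranges(ks):
    """Ranges of d and z0 implied by the reported ratios d/ks and z0/ks."""
    return {
        "d": (0.22 * ks, 0.27 * ks),
        "z0": (0.06 * ks, 0.10 * ks),
    }
```

    Given a hydraulic roughness height ks, roughness_parameter_ranges returns the bracket of d and z0 values reported in the abstract, which can then be fed to log_law_velocity together with a shear velocity to sketch the profile.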

  15. COMPARATIVE ANALYSIS OF THE METHODS FOR EVALUATING THE EFFECTIVE LENGTH OF COLUMNS

    Directory of Open Access Journals (Sweden)

    Paschal Chimeremeze Chiadighikaobi

    2017-08-01

    Full Text Available This article looks into the effective length of columns using different methods. The codes considered in this article are those of the AISC (American Institute of Steel Construction) and AS 4100 (the Australian steel code). A conclusion was drawn after investigating a frame using three different methods. Solved Exercise 6 (LeMessurier Method) was investigated using the same frame with different dimensions. Further analysis and investigation will be done using Java codes to analyze the frames.

  16. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

    Science.gov (United States)

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway that are leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large set of EST sequences is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to get full-length sequences. The highest frequency was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif codes for glutamic acid (GAA), while AT/TA were the most frequent repeats among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function. PMID:22368382
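
    SSR mining of the kind described here amounts to scanning each sequence for short motifs repeated in tandem. The sketch below shows one way to do that with regular expressions. The minimum repeat counts are illustrative assumptions (the abstract does not give the thresholds used), and find_ssrs is a hypothetical helper, not the tool used in the study.

```python
import re

# Illustrative thresholds: motif length -> minimum number of tandem repeats (assumed).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for every perfect SSR found in seq."""
    min_repeats = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA").
            if any(motif == motif[:p] * (k // p) for p in range(1, k) if k % p == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)
```

    Counting the hits by motif length over all contigs and singletons would give frequency distributions of the kind reported in the abstract.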

  17. Assembly and comparative analysis of complete mitochondrial genome sequence of an economic plant Salix suchowensis

    Directory of Open Access Journals (Sweden)

    Ning Ye

    2017-03-01

    Full Text Available Willow is a widely used dioecious woody plant of the Salicaceae family in China. Due to their high biomass yields, willows are promising sources for bioenergy crops. In this study, we assembled the complete mitochondrial (mt) genome sequence of S. suchowensis, with a length of 644,437 bp, using Roche-454 GS FLX Titanium sequencing technologies. The base composition of the S. suchowensis mt genome is A (27.43%), T (27.59%), C (22.34%), and G (22.64%), giving a GC content in line with that of other angiosperms. This long circular mt genome encodes 58 unique genes (32 protein-coding genes, 23 tRNA genes and 3 rRNA genes), and 9 of the 32 protein-coding genes contain 17 introns. Phylogenetic analysis of 35 species based on 23 protein-coding genes supports Salix as sister to Populus. With the detailed phylogenetic information and the identification of phylogenetic position, some ribosomal protein genes and succinate dehydrogenase genes are found to be frequently lost during evolution. As S. suchowensis is a native shrub willow species, this study of its mt genome will provide useful information for better understanding genomic breeding and the missing pieces of sex determination evolution in the future.
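
    The reported base percentages sum to 100% and imply a GC content of 22.34% + 22.64% = 44.98%. As a small sketch of how such figures are obtained from a sequence, the helpers below compute base composition and GC content; the function names are illustrative and the dictionary simply restates the values from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(composition):
    """GC content in percent, given a base composition in percent."""
    return composition["G"] + composition["C"]


# Values reported in the abstract for the S. suchowensis mt genome.
reported = {"A": 27.43, "T": 27.59, "C": 22.34, "G": 22.64}
```

    Calling gc_content(reported) reproduces the 44.98% figure, and base_composition can be applied to any assembled sequence to obtain the same kind of breakdown.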

  18. Cloning and sequence analysis of cDNA coding for rat nucleolar protein C23

    International Nuclear Information System (INIS)

    Ghaffari, S.H.; Olson, M.O.J.

    1986-01-01

    Using synthetic oligonucleotides as primers and probes, the authors have isolated and sequenced cDNA clones encoding protein C23, a putative nucleolus organizer protein. Poly(A+) RNA was isolated from rat Novikoff hepatoma cells and enriched in C23 mRNA by sucrose density gradient ultracentrifugation. Two deoxyoligonucleotides, a 48- and a 27-mer, were synthesized on the basis of amino acid sequence from the C-terminal half of protein C23 and cDNA sequence data from CHO cell protein. The 48-mer was used as a primer for synthesis of cDNA, which was then inserted into plasmid pUC9. Transformed bacterial colonies were screened by hybridization with 32P-labeled 27-mer. Two clones among 5000 gave a strong positive signal. Plasmid DNAs from these clones were purified and characterized by blotting and nucleotide sequence analysis. The length of C23 mRNA was estimated to be 3200 bases in a northern blot analysis. The sequence of a 267 bp insert shows high homology with the CHO cDNA, with only 9 nucleotide differences and an identical amino acid sequence. These studies indicate that this region of the protein is highly conserved.

  19. Validation of the Serpent 2-DYNSUB code sequence using the Special Power Excursion Reactor Test III (SPERT III)

    International Nuclear Information System (INIS)

    Knebel, Miriam; Mercatali, Luigi; Sanchez, Victor; Stieglitz, Robert; Macian-Juan, Rafael

    2016-01-01

    Highlights: • Full few-group cross section tables created by Monte Carlo lattice code Serpent 2. • Serpent 2 group constant methodology verified for HFP static and transient cases. • Serpent 2-DYNSUB tool chain validated using SPERT III REA experiments. • Serpent 2-DYNSUB tool chain suitable to model RIAs in PWRs. - Abstract: The Special Power Excursion Reactor Test III (SPERT III) is studied using the Serpent 2-DYNSUB code sequence in order to validate it for modeling reactivity insertion accidents (RIA) in PWRs. The SPERT III E-core was a thermal research reactor constructed to analyze reactor dynamics. Its configuration resembles a commercial PWR in terms of fuel type, choice of moderator, coolant flow and system pressure. The initial conditions of the rod ejection accident experiments (REA) performed cover cold startup, hot startup, hot standby and operating power scenarios. Eight of these experiments were analyzed in detail. Firstly, multi-dimensional nodal diffusion cross section tables were created for the three-dimensional reactor simulator DYNSUB employing the Monte Carlo neutron transport code Serpent 2. In a second step, DYNSUB stationary simulations were compared to Monte Carlo reference three-dimensional full scale solutions obtained with Serpent 2 (cold startup conditions) and Serpent 2/SUBCHANFLOW (operating power conditions) with a good agreement being observed. The latter tool is an internal coupling of Serpent 2 and the sub-channel thermal-hydraulics code SUBCHANFLOW. Finally, DYNSUB was utilized to study the eight selected transient experiments. Results were found to match measurements well. As the selected experiments cover much of the possible transient (delayed super-critical, prompt super-critical and super-prompt critical excursion) and initial conditions (cold and hot as well as zero, little and full power reactor states) one expects in commercial PWRs, the obtained results give confidence that the Serpent 2-DYNSUB tool chain is

  20. Complete coding sequence of the human raf oncogene and the corresponding structure of the c-raf-1 gene

    Energy Technology Data Exchange (ETDEWEB)

    Bonner, T I; Oppermann, H; Seeburg, P; Kerby, S B; Gunnell, M A; Young, A C; Rapp, U R

    1986-01-24

    The complete 648 amino acid sequence of the human raf oncogene was deduced from the 2977 nucleotide sequence of a fetal liver cDNA. The cDNA has been used to obtain clones which extend the human c-raf-1 locus by an additional 18.9 kb at the 5' end and contain all the remaining coding exons.

  1. De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences.

    Directory of Open Access Journals (Sweden)

    Josephine A Reinhardt

    Full Text Available How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval, suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes, and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein-coding de novo genes in Drosophila quickly become functionally important.

  2. Molecular cloning and complete nucleotide sequence of a human ventricular myosin light chain 1

    Energy Technology Data Exchange (ETDEWEB)

    Hoffmann, E; Shi, Q W; Floroff, M; Mickle, D A.G.; Wu, T W; Olley, P M; Jackowski, G

    1988-03-25

    A human ventricular plasmid library was constructed. The library was screened with an oligonucleotide probe (17-mer) corresponding to a conserved region of myosin light chain 1 near the carboxy terminus. A full-length cDNA recombinant plasmid containing a 1100 bp insert was isolated. RNA blot hybridization with this insert detected a message of approximately 1500 bp corresponding to the size of VLCl and mRNA. The complete nucleotide sequence of the coding region was determined in M13 subclones using the dideoxy chain termination method. The isolation of this clone (pCD HLVCl), the publication of the complete nucleotide sequence of HVLCl and the predicted secondary structure of this protein will aid in understanding the biochemistry of myosin and its function in contraction, the evolution of myosin light chain genes and the genetic, developmental and physiological regulation of myosin genes.

  3. Characterization of a new full length TMPRSS3 isoform and identification of mutant alleles responsible for nonsyndromic recessive deafness in Newfoundland and Pakistan

    Directory of Open Access Journals (Sweden)

    Shotland Lawrence I

    2004-09-01

    Full Text Available Abstract. Background: Mutant alleles of TMPRSS3 are associated with nonsyndromic recessive deafness (DFNB8/B10). TMPRSS3 encodes a predicted secreted serine protease, although the deduced amino acid sequence has no signal peptide. In this study, we searched for mutant alleles of TMPRSS3 in families from Pakistan and Newfoundland with recessive deafness co-segregating with DFNB8/B10 linked haplotypes and also more thoroughly characterized the genomic structure of TMPRSS3. Methods: We enrolled families segregating recessive hearing loss from Pakistan and Newfoundland. Microsatellite markers flanking the TMPRSS3 locus were used for linkage analysis. DNA samples from participating individuals were sequenced for TMPRSS3. The structure of TMPRSS3 was characterized bioinformatically and experimentally by sequencing novel cDNA clones of TMPRSS3. Results: We identified mutations in TMPRSS3 in four Pakistani families with recessive, nonsyndromic congenital deafness. We also identified two recessive mutations, one of which is novel, of TMPRSS3 segregating in a six-generation extended family from Newfoundland. The spectrum of TMPRSS3 mutations is reviewed in the context of a genotype-phenotype correlation. Our study also revealed a longer isoform of TMPRSS3 with a hitherto unidentified exon encoding a signal peptide, which is expressed in several tissues. Conclusion: Mutations of TMPRSS3 contribute to hearing loss in many communities worldwide and account for 1.8% (8 of 449) of Pakistani families segregating congenital deafness as an autosomal recessive trait. The newly identified TMPRSS3 isoform e will be helpful in the functional characterization of the full length protein.

  4. Effect of temperature and cycle length on microbial competition in PHB-producing sequencing batch reactor.

    Science.gov (United States)

    Jiang, Yang; Marang, Leonie; Kleerebezem, Robbert; Muyzer, Gerard; van Loosdrecht, Mark C M

    2011-05-01

    The impact of temperature and cycle length on microbial competition between polyhydroxybutyrate (PHB)-producing populations enriched in feast-famine sequencing batch reactors (SBRs) was investigated at temperatures of 20 °C and 30 °C, and in a cycle length range of 1-18 h. In this study, the microbial community structure of the PHB-producing enrichments was found to be strongly dependent on temperature, but not on cycle length. Zoogloea and Plasticicumulans acidivorans dominated the SBRs operated at 20 °C and 30 °C, respectively. Both enrichments accumulated PHB more than 75% of cell dry weight. Short-term temperature change experiments revealed that P. acidivorans was more temperature sensitive as compared with Zoogloea. This is particularly true for the PHB degradation, resulting in incomplete PHB degradation in P. acidivorans at 20 °C. Incomplete PHB degradation limited biomass growth and allowed Zoogloea to outcompete P. acidivorans. The PHB content at the end of the feast phase correlated well with the cycle length at a constant solid retention time (SRT). These results suggest that to establish enrichment with the capacity to store a high fraction of PHB, the number of cycles per SRT should be minimized independent of the temperature.

  5. Full-length VP2 gene analysis of canine parvovirus reveals emergence of newer variants in India.

    Science.gov (United States)

    Nookala, Mangadevi; Mukhopadhyay, Hirak Kumar; Sivaprakasam, Amsaveni; Balasubramanian, Brindhalakshmi; Antony, Prabhakar Xavier; Thanislass, Jacob; Srinivas, Mouttou Vivek; Pillai, Raghavan Madhusoodanan

    2016-12-01

    The canine parvovirus (CPV) infection is a highly contagious and serious enteric disease of dogs with high fatality rate. The present study was taken up to characterize the full-length viral polypeptide 2 (VP2) gene of CPV of Indian origin along with the commercially available vaccines. The faecal samples from parvovirus suspected dogs were collected from various states of India for screening by PCR assay and 66.29% of samples were found positive. Six CPV-2a, three CPV-2b, and one CPV-2c types were identified by sequence analysis. Several unique and existing mutations have been noticed in CPV types analyzed indicating emergence of newer variants of CPV in India. The phylogenetic analysis revealed that all the field CPV types were grouped in different subclades within two main clades, but away from the commercial vaccine strains. CPV-2b and CPV-2c types with unique mutations were found to be establishing in India apart from the prevailing CPV-2a type. Mutations and the positive selection of the mutants were found to be the major mechanism of emergence and evolution of parvovirus. Therefore, the incorporation of local strain in the vaccine formulation may be considered for effective control of CPV infections in India.

  6. The ability to form full-length intron RNA circles is a general property of nuclear group I introns

    DEFF Research Database (Denmark)

    Nielsen, Henrik; Fiskaa, Tonje; Birgisdottir, Asa Birna

    2003-01-01

    at the expense of the host. The circularization pathway has distinct structural requirements that differ from those of splicing and appears to be specifically suppressed in vivo. The ability to form full-length circles is found in all types of nuclear group I introns, including those from the Tetrahymena...... ribosomal DNA. The biological function of the full-length circles is not known, but the fact that the circles contain the entire genetic information of the intron suggests a role in intron mobility....

  7. Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.

    Science.gov (United States)

    Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng

    2009-10-01

    The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty eight C-stretch haplotypes were identified, and some novel and population specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of individuals studied. There was a significant correlation (r=-0.961, Ppopulations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that C-stretch could be used as a reliable genetic marker in population genetics.

  8. UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

    Science.gov (United States)

    Du, Pu-Feng; Zhao, Wei; Miao, Yang-Yang; Wei, Le-Yi; Wang, Likun

    2017-11-14

    With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most of the existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences with various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this problem, these existing programs can only provide a fixed number of representation modes. Every time a new sequence representation mode is developed, a new program will be needed. In this paper, we propose the UltraPse as a universal software platform for this problem. The function of the UltraPse is not only to generate various existing sequence representation modes, but also to simplify all future programming works in developing novel representation modes. The extensibility of UltraPse is particularly enhanced. It allows the users to define their own representation mode, their own physicochemical properties, or even their own types of biological sequences. Moreover, UltraPse is also the fastest software of its kind. The source code package, as well as the executables for both Linux and Windows platforms, can be downloaded from the GitHub repository.
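
    The core problem stated here, mapping sequences of different lengths to vectors of one fixed length, can be illustrated with the simplest representation mode, k-mer composition. The sketch below is not UltraPse's API or one of its documented modes; kmer_composition is a hypothetical helper written only to show the idea.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Map a sequence of any length to a vector of len(alphabet)**k k-mer frequencies."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with symbols outside the alphabet are ignored
            counts[kmer] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]
```

    A 4-base and a 4,000-base DNA sequence both yield a 16-dimensional vector for k = 2, which is what lets fixed-input prediction algorithms work with sequences of varying length. Passing a 20-letter amino acid alphabet gives the analogous representation for proteins.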

  9. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations.

    Science.gov (United States)

    Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis

    2016-08-24

    To assess the performance of the Oxford Nanopore Technologies MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts with known concentration. cDNA libraries were generated using a template switching protocol to facilitate the direct comparison between different sequencing platforms. The MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full length. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from a human HEK-293 cell line was sequenced on an Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. We observed that there was a good agreement in the measured cDNA abundance between PacBio RS II and ONT MinION (rpearson = 0.82, isoforms with length more than 700bp) and between Illumina HiSeq 2500 and ONT MinION (rpearson = 0.75). This indicates that the ONT MinION can sequence quantitatively both long and short full length cDNA molecules.

  10. Development of three full-length infectious cDNA clones of distinct brassica yellows virus genotypes for agrobacterium-mediated inoculation.

    Science.gov (United States)

    Zhang, Xiao-Yan; Dong, Shu-Wei; Xiang, Hai-Ying; Chen, Xiang-Ru; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2015-02-02

    Brassica yellows virus (BrYV) is a newly identified species in the genus Polerovirus within the family Luteoviridae. BrYV is prevalently distributed throughout Mainland China and South Korea and is an important virus infecting cruciferous crops. Based on six BrYV genomic sequences of isolates from oilseed rape, rutabaga, radish, and cabbage, three genotypes, BrYV-A, BrYV-B, and BrYV-C, exist, which mainly differ in the 5' terminal half of the genome. BrYV is an aphid-transmitted and phloem-limited virus. The use of infectious cDNA clones is an alternative means of infecting plants that allows reverse genetic studies to be performed. In this study, full-length cDNA clones of BrYV-A, recombinant BrYV-5B3A, and BrYV-C were constructed under control of the cauliflower mosaic virus 35S promoter. An agrobacterium-mediated inoculation system of Nicotiana benthamiana was developed using these cDNA clones. Three days after infiltration with full-length BrYV cDNA clones, necrotic symptoms were observed in the inoculated leaves of N. benthamiana; however, no obvious symptoms appeared in the upper leaves. Reverse transcription-PCR (RT-PCR) and western blot detection of samples from the upper leaves showed that the maximum infection efficiency of BrYVs could reach 100%. The infectivity of the BrYV-A, BrYV-5B3A, and BrYV-C cDNA clones was further confirmed by northern hybridization. The system developed here will be useful for further studies of BrYV, such as host range, pathogenicity, viral gene functions, and plant-virus-vector interactions, and especially for discerning the differences among the three genotypes. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. High yield purification of full-length functional hERG K+ channels produced in Saccharomyces cerevisiae

    DEFF Research Database (Denmark)

    Molbaek, Karen; Scharff-Poulsen, Peter; Hélix-Nielsen, Claus

    2015-01-01

    knowledge this is the first reported high-yield production and purification of full length, tetrameric and functional hERG. This significant breakthrough will be paramount in obtaining hERG crystal structures, and in establishment of new high-throughput hERG drug safety screening assays....

  12. Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

    Directory of Open Access Journals (Sweden)

    Shahin Arwa

    2012-11-01

    Background: Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up breeding in both crops, next-generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyrosequencing technology. Results: We developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of the lily and 39% of the tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high-throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2- to 3-fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs, lily and tulip (12,017 groups), and among the three monocot species, lily, tulip, and rice (6,900 groups), were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSRs. The lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions
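
    The SSR mining step mentioned in this record (di- and tri-nucleotide repeats in contigs) can be sketched with a regular expression that uses a backreference. This is an illustrative approach only; the abstract does not name the tool or thresholds the authors used, so the function `find_ssrs` and the minimum repeat counts below are assumptions.

```python
import re

def find_ssrs(seq, unit_len, min_repeats):
    """Find tandem repeats of a unit of length unit_len.

    Returns (start, unit, repeat_count) tuples. Hypothetical helper;
    thresholds are assumptions, not the study's settings.
    """
    # ([ACGT]{n}) captures one unit; \1{m,} demands at least m more copies of it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits

print(find_ssrs("TTAGAGAGAGAGCC", unit_len=2, min_repeats=4))
print(find_ssrs("CCATGATGATGATGTT", unit_len=3, min_repeats=4))
```

    Comparing hit counts inside and outside annotated coding regions would then give the kind of UTR versus coding-region contrast the abstract reports.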

  13. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Science.gov (United States)

    2012-01-01

    Background: The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results: We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch, respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS for repeated update and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions: CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920

  14. Hepatitis C virus sequences from different patients confirm the existence and transmissibility of subtype 2q, a rare subtype circulating in the metropolitan area of Barcelona, Spain.

    Science.gov (United States)

    Martró, Elisa; Valero, Ana; Jordana-Lluch, Elena; Saludes, Verónica; Planas, Ramón; González-Candelas, Fernando; Ausina, Vicente; Bracho, Maria Alma

    2011-05-01

    The hepatitis C virus (HCV) has been classified into six genotypes and more than 70 subtypes with distinct geographical and epidemiological distributions. While 18 genotype 2 subtypes have been proposed, only 5 have had their complete sequence determined. The aim of this study was to characterize HCV isolates from three patients from the Barcelona metropolitan area of Spain for whom commercial genotyping methods provided discordant results. Full-length genome sequencing was carried out for 2 of the 3 patients; for the third patient only partial NS5B sequences could be obtained. The generated sequences were subjected to phylogenetic, recombination, and identity analyses. Sequences covering most of the HCV genome (9398 and 9566 nt in length) were obtained and showed a 90.3% identity to each other at the nucleotide level, while both sequences differed by 17.5-22.6% from the other fully sequenced genotype 2 subtypes. No evidence of recombination was found. The NS5B phylogenetic tree showed that sequences from the three patients cluster together with the only representative sequence of the provisionally designated 2q subtype, which also corresponds to a patient from Barcelona. Phylogenetic analysis of the full coding sequence showed that subtype 2q was more closely related to subtype 2k. The results obtained in this study suggest that subtype 2q now meets the requirements for confirmed designation status according to consensus criteria for HCV classification and nomenclature, and its epidemiological value is ensured as it has spread among several patients in the Barcelona metropolitan area. Copyright © 2011 Wiley-Liss, Inc.

  15. Full-length cloning and phylogenetic analyses of translationally controlled tumour protein and ferritin genes from the Indian white prawn, Fenneropenaeus indicus (H. Milne Edwards)

    Digital Repository Service at National Institute of Oceanography (India)

    Nayak, S.; Ramaiah, N.; Meena, R.M.; Sreepada, R.A.

    -length sequences of these immune-relevant genes, this study highlighted their conserved natures, which perhaps make them important defence-related proteins in the innate immune system of F. indicus....

  16. Phylogenetic analyses of the polyprotein coding sequences of serotype O foot-and-mouth disease viruses in East Africa: evidence for interserotypic recombination

    Directory of Open Access Journals (Sweden)

    Balinda Sheila N

    2010-08-01

    Background: Foot-and-mouth disease (FMD) is endemic in East Africa, with the majority of the reported outbreaks attributed to serotype O virus. In this study, phylogenetic analyses of the polyprotein coding region of serotype O FMD viruses from Kenya and Uganda have been undertaken to infer evolutionary relationships and processes responsible for the generation and maintenance of diversity within this serotype. FMD virus RNA was obtained from six samples following virus isolation in cell culture and in one case by direct extraction from an oropharyngeal sample. Following RT-PCR, the single long open reading frame, encoding the polyprotein, was sequenced. Results: Phylogenetic comparisons of the VP1 coding region showed that the recent East African viruses belong to one lineage within the EA-2 topotype, while an older Kenyan strain, K/52/1992, is a representative of the topotype EA-1. Evolutionary relationships between the coding regions for the leader protease (L), the capsid region and almost the entire coding region are monophyletic except for K/52/1992, which is distinct. Furthermore, phylogenetic relationships for the P2 and P3 regions suggest that K/52/1992 is a probable recombinant between serotypes A and O. A bootscan analysis of K/52/1992 with East African FMD serotype A viruses (A21/KEN/1964 and A23/KEN/1965) and a serotype O viral isolate (K/117/1999) revealed that the P2 region is probably derived from a serotype A strain, while the P3 region appears to be a mosaic derived from both serotypes A and O. Conclusions: Sequences of the VP1 coding region from recent serotype O FMDVs from Kenya and Uganda are all representatives of a specific East African lineage (topotype EA-2), a probable indication that hardly any FMD introductions of this serotype have occurred from outside the region in the recent past. Furthermore, evidence for interserotypic recombination, within the non-structural protein coding regions, between FMDVs of serotypes A

  17. The nucleotide sequence of satellite RNA in grapevine fanleaf virus, strain F13.

    Science.gov (United States)

    Fuchs, M; Pinck, M; Serghini, M A; Ravelonandro, M; Walter, B; Pinck, L

    1989-04-01

    The nucleotide sequence of cDNA copies of grapevine fanleaf virus (strain F13) satellite RNA has been determined. The primary structure obtained was 1114 nucleotides in length, excluding the poly(A) tail, and contained only one long open reading frame encoding a 341-residue, highly hydrophilic polypeptide of Mr 37275. The coding sequence was bordered by a leader of 14 nucleotides and a 3'-terminal non-coding region of 74 nucleotides. No homology has been found with small satellite RNAs associated with other nepoviruses. Two limited homologies of eight nucleotides have been detected between the satellite RNA in grapevine fanleaf virus and those in tomato black ring virus, and a consensus sequence U.G/UGAAAAU/AU/AU/A at the 5' end of nepovirus RNAs is reported. A less extended consensus exists in this region in comovirus and picornavirus RNA.
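
    The lengths quoted in this record can be cross-checked with simple arithmetic. The sketch below assumes that the coding sequence is counted with exactly one stop codon, which the abstract does not state explicitly; under that assumption the three segments add up to the reported 1114 nucleotides.

```python
def transcript_length(leader_nt, residues, trailer_nt, stop_codons=1):
    """Total length = 5' leader + coding sequence + 3' non-coding region.

    Assumes the coding sequence is the ORF plus `stop_codons` stop codons.
    """
    coding_nt = 3 * (residues + stop_codons)
    return leader_nt + coding_nt + trailer_nt

# Values from the abstract: 14 nt leader, 341 residues, 74 nt 3' non-coding region.
total = transcript_length(leader_nt=14, residues=341, trailer_nt=74)
print(total)  # 14 + 3 * 342 + 74 = 1114
```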

  18. Next generation sequencing yields the complete mitochondrial genome of the largescale mullet, Liza macrolepis (Teleostei: Mugilidae).

    Science.gov (United States)

    Shen, Kang-Ning; Tsai, Shiou-Yi; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique

    2016-11-01

    In this study, the complete mitogenome sequence of the largescale mullet (Teleostei: Mugilidae) has been determined by next-generation sequencing. The assembled mitogenome, consisting of 16,832 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a non-coding control region (D-loop). The D-loop, which has a length of 1094 bp, is located between tRNA-Pro and tRNA-Phe. The overall base composition of the largescale mullet mitogenome is 27.8% A, 30.1% C, 16.2% G, and 25.9% T. The complete mitogenome may provide essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Mugilidae.
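
    Base composition figures like those reported above are straightforward to compute from an assembled sequence. The following is a minimal sketch on a toy string; the function name `base_composition` is hypothetical and the toy sequence is not mitogenome data.

```python
from collections import Counter

def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: round(100.0 * counts[base] / total, 1) for base in "ACGT"}

toy = "AACCCGTTAC"  # 3 A, 4 C, 1 G, 2 T
print(base_composition(toy))  # {'A': 30.0, 'C': 40.0, 'G': 10.0, 'T': 20.0}
```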

  19. Next generation sequencing yields the complete mitochondrial genome of the Hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae).

    Science.gov (United States)

    Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der

    2016-05-01

    In this study, the complete mitogenome sequence of the hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae) has been determined by next-generation sequencing. The assembled mitogenome, consisting of 16,829 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop, which is 1057 bp in length, is located between tRNA-Pro and tRNA-Phe. The overall base composition of P. labiosus is 28.0% A, 29.3% C, 15.5% G and 27.2% T. The complete mitogenome may provide essential and important DNA molecular data for further population, phylogenetic and evolutionary analysis for Mugilidae.

  20. DNA Barcoding through Quaternary LDPC Codes.

    Directory of Open Access Journals (Sweden)

    Elizabeth Tapia

    For many parallel applications of Next-Generation Sequencing (NGS) technologies, short barcodes able to accurately multiplex a large number of samples are demanded. To address these competing requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine scale with regard to the size of barcodes (BCH) or have intrinsically poor error-correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10^-2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate on the order of 10^-9 at the expense of a rate of read losses just on the order of 10^-6.
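
    The trade-off named in this abstract, read losses versus undetected sample misidentification, can be illustrated with a much simpler decoder than the LDPC decoding the authors describe. The sketch below uses plain minimum Hamming distance assignment with a rejection threshold; it is a stand-in for illustration only, and the barcodes, sample names and `max_mismatches` parameter are all hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def assign_sample(read_barcode, barcodes, max_mismatches=1):
    """Assign a read to the nearest barcode, or return None (a read loss).

    A read is rejected when the nearest barcode is too far away or when
    two barcodes are equally close, which avoids a guess that could
    misidentify the sample.
    """
    best_sample, best_dist, tie = None, None, False
    for sample, code in barcodes.items():
        dist = hamming(read_barcode, code)
        if best_dist is None or dist < best_dist:
            best_sample, best_dist, tie = sample, dist, False
        elif dist == best_dist:
            tie = True
    if best_dist is None or best_dist > max_mismatches or tie:
        return None
    return best_sample

barcodes = {"s1": "AAAAAA", "s2": "CCCGGG", "s3": "TTTACG"}  # hypothetical
print(assign_sample("AAAATA", barcodes))  # one mismatch from s1
print(assign_sample("ACGTAC", barcodes))  # too far from every barcode
```

    Raising `max_mismatches` recovers more reads but makes wrong assignments more likely, which is the balance that stronger codes are meant to improve.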

  1. Characterization of partial and near full-length genomes of HIV-1 strains sampled from recently infected individuals in São Paulo, Brazil.

    Directory of Open Access Journals (Sweden)

    Sabri Saeed Sanabani

    BACKGROUND: Genetic variability is a major feature of human immunodeficiency virus type 1 (HIV-1) and is considered the key factor frustrating efforts to halt the HIV epidemic. A proper understanding of HIV-1 genomic diversity is a fundamental prerequisite for proper epidemiology, genetic diagnosis, and successful drug and vaccine design. Here, we report on the partial and near full-length genomic (NFLG) variability of HIV-1 isolates from a well-characterized cohort of recently infected patients in São Paulo, Brazil. METHODOLOGY: HIV-1 proviral DNA was extracted from the peripheral blood mononuclear cells of 113 participants. The NFLG and partial fragments were determined by overlapping nested PCR and direct sequencing. The data were phylogenetically analyzed. RESULTS: Of the 113 samples (90.3% male; median age 31 years; 79.6% homosexual men) studied, 77 (68.1%) NFLGs and 32 (29.3%) partial fragments were successfully subtyped. Of the successfully subtyped sequences, 88 (80.7%) were subtype B sequences, 12 (11%) BF1 recombinants, 3 (2.8%) subtype C sequences, 2 (1.8%) BC recombinants and subclade F1 each, 1 (0.9%) CRF02_AG, and 1 (0.9%) CRF31_BC. Primary drug resistance mutations were observed in 14/101 (13.9%) of samples, with 5.9% being resistant to protease inhibitors and nucleoside reverse transcriptase inhibitors (NRTI) and 4.9% resistant to non-NRTIs. Predictions of viral tropism were determined for 86 individuals. X4 or X4 dual or mixed-tropic viruses (X4/DM) were seen in 26 (30.2%) of subjects. The proportion of X4 viruses in homosexuals was detected in 19/69 (27.5%). CONCLUSIONS: Our results confirm the existence of various HIV-1 subtypes circulating in São Paulo, and indicate that subtype B accounts for the majority of infections. Antiretroviral (ARV) drug resistance is relatively common among recently infected patients. The proportion of X4 viruses in homosexuals was significantly higher than the proportion seen in other study populations.

  2. Nucleotide sequence of a cDNA coding for the barley seed protein CMa: an inhibitor of insect α-amylase

    DEFF Research Database (Denmark)

    Rasmussen, Søren Kjærsgård; Johansson, A.

    1992-01-01

    The primary structure of the insect alpha-amylase inhibitor CMa of barley seeds was deduced from a full-length cDNA clone pc43F6. Analysis of RNA from barley endosperm shows high levels 15 and 20 days after flowering. The cDNA predicts an amino acid sequence of 119 residues preceded by a signal...... peptide of 25 amino acids. Ala and Leu account for 55% of the signal peptide. CMa is 60-85% identical with alpha-amylase inhibitors of wheat, but shows less than 50% identity to trypsin inhibitors of barley and wheat. The 10 Cys residues are located in identical positions compared to the cereal inhibitor...

  3. dsRNA binding characterization of full length recombinant wild type and mutants Zaire ebolavirus VP35.

    Science.gov (United States)

    Zinzula, Luca; Esposito, Francesca; Pala, Daniela; Tramontano, Enzo

    2012-03-01

    The Ebola viruses (EBOVs) VP35 protein is a multifunctional major virulence factor involved in EBOV replication and evasion of the host immune system. EBOV VP35 is an essential component of the viral RNA polymerase, it is a key participant in nucleocapsid assembly, and it inhibits the innate immune response by antagonizing RIG-I like receptors through its dsRNA binding function and, hence, by suppressing the host type I interferon (IFN) production. Insights into VP35 dsRNA recognition have recently been revealed by structural and functional analysis performed on its C-terminal domain. We report the biochemical characterization of the Zaire ebolavirus (ZEBOV) full-length recombinant VP35 (rVP35)-dsRNA binding function. We established a novel in vitro magnetic dsRNA binding pull-down assay, determined the rVP35 optimal dsRNA binding parameters, measured the rVP35 equilibrium dissociation constant for heterologous in vitro transcribed dsRNA of different lengths and short synthetic dsRNA of 8 bp, and validated the assay for compound screening by assessing the inhibitory ability of aurintricarboxylic acid (IC50 value of 50 μg/mL). Furthermore, we compared the dsRNA binding properties of full-length wt rVP35 with those of the R305A, K309A and R312A rVP35 mutants, which were previously reported to be defective in dsRNA binding-mediated IFN inhibition, showing that the latter have measurably increased Kd values for dsRNA binding and modified migration patterns in mobility shift assays with respect to wt rVP35. Overall, these results provide the first characterization of the full-length wt and mutant VP35-dsRNA binding functions. Copyright © 2012 Elsevier B.V. All rights reserved.

  4. Quench start localization in full-length SSC R&D dipoles

    International Nuclear Information System (INIS)

    Devred, A.; Chapman, M.; Cortella, J.; Desportes, A.; Kaugerts, J.; Kirk, T.; Mirk, K.; Schermer, R.; Tompkins, J.C.; Turner, J.; Bleadon, M.; Brown, B.C.; Hanft, R.; Kuchnir, M.; Lamm, M.; Mantsch, P.; Mazur, P.O.; Orris, D.; Peoples, J.; Strait, J.; Tool, G.; Caspi, S.; Gilbert, W.; Meuser, R.; Peters, C.; Rechen, J.; Royet, J.; Scanlan, R.; Taylor, C.; Zbasnik, J.

    1989-04-01

    Full-length SSC R&D dipole magnets instrumented with four voltage taps on each turn of the inner quarter coils have been tested. These voltage taps enable accurate location of the point at which the quenches start and detailed studies of quench development in the coil. Attention here is focused on localizing the quench source. After recalling the basic mechanism of a quench (why it occurs and how it propagates), the method of quench origin analysis is described: the quench propagation velocity on the turn where the quench occurs is calculated, and the quench location is then verified by reiterating the analysis on the adjacent turns. Last, the velocity value, which appears to be higher than previously measured, is discussed.

  5. Application of the verona coding definitions of emotional sequences (VR-CoDES) on a pediatric data set.

    Science.gov (United States)

    Vatne, Torun M; Finset, Arnstein; Ørnes, Knut; Ruland, Cornelia M

    2010-09-01

    Adult patients present concerns as defined in the Verona Coding Definitions of Emotional Sequences (VR-CoDES), but we do not know how children express their concerns during medical consultations. This study aimed to evaluate the applicability of VR-CoDES to pediatric oncology consultations. Twenty-eight pediatric consultations were coded with the Verona Coding Definitions of Emotional Sequences (VR-CoDES), and the material was also qualitatively analyzed for descriptive purposes. Five consultations were randomly selected for reliability testing and descriptive statistics were computed. Perfect inter-rater reliability for concerns and moderate reliability for cues were obtained. Cues and/or concerns were present in over half of the consultations. Cues were more frequent than concerns, with the majority of cues being verbal hints to hidden concerns or non-verbal cues. Intensity of expressions, limitations in vocabulary, commonality of statements, and complexity of the setting complicated the use of VR-CoDES. Child-specific cues; use of the imperative, cues about past experiences, and use of onomatopoeia were observed. Children with cancer express concerns during medical consultations. VR-CoDES is a reliable tool for coding concerns in pediatric data sets. For future applications in pediatric settings an appendix should be developed to incorporate the child-specific traits. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.

  6. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Directory of Open Access Journals (Sweden)

    Jianmin Fu

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, the intergenic regions trnQ_rps16, trnV_ndhC, and psbD_trnT, and the intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses, together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  7. MicroRNA-encoding long non-coding RNAs

    Directory of Open Access Journals (Sweden)

    Zhu Xiaopeng

    2008-05-01

    Background: Recent analysis of mouse transcriptional data has revealed the existence of ~34,000 messenger-like non-coding RNAs (ml-ncRNAs). Whereas the functional properties of these ml-ncRNAs are beginning to be unravelled, no functional information is available for the large majority of these transcripts. Results: A few ml-ncRNAs have been shown to have genomic loci that overlap with microRNA loci, leading us to suspect that a fraction of ml-ncRNAs may encode microRNAs. We therefore developed an algorithm (PriMir) for specifically detecting potential microRNA-encoding transcripts in the entire set of 34,030 mouse full-length ml-ncRNAs. In combination with mouse-rat sequence conservation, this algorithm detected 97 (80 of them were novel) strong miRNA-encoding candidates, and for 52 of these we obtained experimental evidence for the existence of their corresponding mature microRNA by microarray and stem-loop RT-PCR. Sequence analysis of the microRNA-encoding RNAs revealed an internal motif, whose presence correlates strongly (R² = 0.9, P-value = 2.2 × 10^-16) with the occurrence of stem-loops with characteristics of known pre-miRNAs, indicating the presence of a larger number of microRNA-encoding RNAs (from 300 up to 800) in the ml-ncRNA population. Conclusion: Our work highlights a unique group of ml-ncRNAs and offers clues to their functions.

  8. The full-length form of the Drosophila amyloid precursor protein is involved in memory formation.

    Science.gov (United States)

    Bourdet, Isabelle; Preat, Thomas; Goguel, Valérie

    2015-01-21

    The APP plays a central role in AD, a pathology that first manifests as a memory decline. Understanding the role of APP in normal cognition is fundamental in understanding the progression of AD, and mammalian studies have pointed to a role of secreted APPα in memory. In Drosophila, we recently showed that APPL, the fly APP ortholog, is required for associative memory. In the present study, we aimed to characterize which form of APPL is involved in this process. We show that expression of a secreted-APPL form in the mushroom bodies, the center for olfactory memory, is able to rescue the memory deficit caused by APPL partial loss of function. We next assessed the impact on memory of the Drosophila α-secretase kuzbanian (KUZ), the enzyme initiating the nonamyloidogenic pathway that produces secreted APPLα. Strikingly, KUZ overexpression not only failed to rescue the memory deficit caused by APPL loss of function, it exacerbated this deficit. We further show that in addition to an increase in secreted-APPL forms, KUZ overexpression caused a decrease of membrane-bound full-length species that could explain the memory deficit. Indeed, we observed that transient expression of a constitutive membrane-bound mutant APPL form is sufficient to rescue the memory deficit caused by APPL reduction, revealing for the first time a role of full-length APPL in memory formation. Our data demonstrate that, in addition to secreted APPL, the noncleaved form is involved in memory, raising the possibility that secreted and full-length APPL act together in memory processes. Copyright © 2015 the authors 0270-6474/15/351043-09$15.00/0.

  9. Molecular cloning and characterization of the full-length cDNA encoding the tree shrew (tupaia belangeri) CD28

    Science.gov (United States)

    Huang, Xiaoyan; Yan, Yan; Wang, Sha; Wang, Qinying; Shi, Jian; Shao, Zhanshe; Dai, Jiejie

    2017-11-01

    CD28 is one of the most important co-stimulatory molecules expressed by naive and primed T cells. The tree shrew (Tupaia belangeri), an animal model for analyzing mechanisms of human diseases that is receiving extensive attention, requires essential research tools, in particular cellular markers and monoclonal antibodies for immunological studies. However, little has been known about tree shrew CD28 (tsCD28) until now. In this study, a 663 bp full-length CD28 cDNA, encoding a polypeptide of 220 amino acids, was cloned from tree shrew spleen lymphocytes. The nucleotide sequence of tsCD28 showed 85%, 76%, and 75% similarity with human, rat, and mouse, respectively, which indicates that the tree shrew is more closely related to human than rodents are. The open reading frame (ORF) sequence of the tsCD28 gene was predicted to correspond to the signal sequence, immunoglobulin variable-like (IgV) domain, transmembrane domain and cytoplasmic tail, respectively. We also compared its molecular characteristics with those of other mammals using software such as Clustal W 2.0. Our results showed that tsCD28 contained many features conserved in CD28 genes from other mammals, including a conserved signal peptide and glycosylation sites, and several residues responsible for binding to the CD28R, and the tsCD28 amino acid sequence was found to have a close genetic relationship with human and monkey. The crystal structure and surface charge revealed that the surface charges of most regions of the tree shrew CD28 molecule are similar to those of human. However, compared with human CD28 (hCD28), in some areas the surface positive charge of tsCD28 was lower, which may affect antibody binding. The present study is the first report of cloning and characterization of CD28 in tree shrew. 
This study provides a theoretical basis for the further study the structure and function of tree shrew CD28 and utilize tree shrew as an effective

  10. Photoluminescence Enhancement of Poly(3-methylthiophene) Nanowires upon Length Variable DNA Hybridization

    Directory of Open Access Journals (Sweden)

    Jingyuan Huang

    2018-01-01

    The use of low-dimensional inorganic or organic nanomaterials has advantages for DNA and protein recognition due to their sensitivity, accuracy, and physical size matching. In this research, poly(3-methylthiophene) (P3MT) nanowires (NWs) are electrochemically prepared with dopant, followed by functionalization with a probe DNA (pDNA) sequence through electrostatic interaction. pDNA sequences of various lengths (10-, 20- and 30-mer) are conjugated to the P3MT NWs and then hybridized with their complementary target DNA (tDNA) sequences. The nanoscale photoluminescence (PL) properties of the P3MT NWs are studied throughout the whole process in the solid state. In addition, the correlation between the PL enhancement and double-helix DNA of various lengths is demonstrated.

  11. Pharmacological efficacy of anti-IL-1β scFv, Fab and full-length antibodies in treatment of rheumatoid arthritis.

    Science.gov (United States)

    Qi, Jianying; Ye, Xianlong; Ren, Guiping; Kan, Fangming; Zhang, Yu; Guo, Mo; Zhang, Zhiyi; Li, Deshan

    2014-02-01

    Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that mainly causes the synovial joint inflammation and cartilage destruction. Interleukin-1β (IL-1β) is an important proinflammatory cytokine involved in the pathogenesis of RA. In this study, we constructed and expressed anti-IL-1β-full-length antibody in CHO-K1-SV, anti-IL-1β-Fab and anti-IL-1β-scFv in Rosetta. We compared the therapeutic efficacy of three anti-IL-1β antibodies for CIA mice. Mice with CIA were subcutaneously injected with humanized anti-IL-1β-scFv, anti-IL-1β-Fab or anti-IL-1β-full-length antibody. The effects of treatment were determined by arthritis severity score, autoreactive humoral, cellular immune responses, histological lesion and cytokines production. Compared with anti-IL-1β-scFv treatments, anti-IL-1β-Fab and anti-IL-1β-full-length antibody therapy resulted in more significant effect in alleviating the severity of arthritis by preventing bone damage and cartilage destruction, reducing humoral and cellular immune responses, and down-regulating the expression of IL-1β, IL-6, IL-2, IFN-γ, TNF-α and MMP-3 in inflammatory tissue. The therapeutic effects of anti-IL-1β-Fab and anti-IL-1β-full-length antibodies on CIA mice had no significant difference. However, production of anti-IL-1β-full-length antibody in eukaryotic system is, in general, time-consuming and more expensive than that of anti-IL-1β-Fab in prokaryotic systems. In conclusion, as a small molecule antibody, anti-IL-1β-Fab is an ideal candidate for RA therapy. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Sequence Selection and Performance in DS/CDMA Systems

    Directory of Open Access Journals (Sweden)

    Jefferson Santos Ambrosio

    2016-03-01

    Full Text Available In this work, key concepts of code division multiple access (CDMA) communication systems are discussed. The impact of sequence selection on the performance and capacity of direct sequence CDMA (DS/CDMA) systems was investigated under AWGN and increasing system loading, as well as under multiple-antenna channels.

  13. The small FOXP1 isoform predominantly expressed in activated B cell-like diffuse large B-cell lymphoma and full-length FOXP1 exert similar oncogenic and transcriptional activity in human B cells.

    Science.gov (United States)

    van Keimpema, Martine; Grüneberg, Leonie J; Schilder-Tol, Esther J M; Oud, Monique E C M; Beuling, Esther A; Hensbergen, Paul J; de Jong, Johann; Pals, Steven T; Spaargaren, Marcel

    2017-03-01

    The forkhead transcription factor FOXP1 is generally regarded as an oncogene in activated B cell-like diffuse large B-cell lymphoma. Previous studies have suggested that a small isoform of FOXP1, rather than full-length FOXP1, may possess this oncogenic activity. Corroborating those studies, we herein show that activated B cell-like diffuse large B-cell lymphoma cell lines and primary activated B cell-like diffuse large B-cell lymphoma cells predominantly express a small FOXP1 isoform, and that the 5'-end of the Foxp1 gene is a common insertion site in murine lymphomas in leukemia virus- and transposon-mediated insertional mutagenesis screens. By combined mass spectrometry, (quantitative) reverse transcription polymerase chain reaction/sequencing, and small interfering ribonucleic acid-mediated gene silencing, we determined that the small FOXP1 isoform predominantly expressed in activated B cell-like diffuse large B-cell lymphoma lacks the N-terminal 100 amino acids of full-length FOXP1. Aberrant overexpression of this FOXP1 isoform (ΔN100) in primary human B cells revealed its oncogenic capacity; it repressed apoptosis and plasma cell differentiation. However, no difference in potency was found between this small FOXP1 isoform and full-length FOXP1. Furthermore, overexpression of full-length FOXP1 or this small FOXP1 isoform in primary B cells and diffuse large B-cell lymphoma cell lines resulted in similar gene regulation. Taken together, our data indicate that this small FOXP1 isoform and full-length FOXP1 have comparable oncogenic and transcriptional activity in human B cells, suggesting that aberrant expression or overexpression of FOXP1, irrespective of the specific isoform, contributes to lymphomagenesis. These novel insights further enhance the value of FOXP1 for the diagnostics, prognostics, and treatment of diffuse large B-cell lymphoma patients. Copyright© Ferrata Storti Foundation.

  14. Some Families of Asymmetric Quantum MDS Codes Constructed from Constacyclic Codes

    Science.gov (United States)

    Huang, Yuanyuan; Chen, Jianzhang; Feng, Chunhui; Chen, Riqing

    2018-02-01

    Quantum maximal-distance-separable (MDS) codes that satisfy quantum Singleton bound with different lengths have been constructed by some researchers. In this paper, seven families of asymmetric quantum MDS codes are constructed by using constacyclic codes. We weaken the case of Hermitian-dual containing codes that can be applied to construct asymmetric quantum MDS codes with parameters [[n,k,dz/dx

  15. HIV1 V3 loop hypermutability is enhanced by the guanine usage bias in the part of env gene coding for it.

    Science.gov (United States)

    Khrustalev, Vladislav Victorovich

    2009-01-01

    Guanine is the most mutable nucleotide in HIV genes because of frequently occurring G to A transitions, which are caused by cytosine deamination in viral DNA minus strands catalyzed by APOBEC enzymes. Distribution of guanine between three codon positions should influence the probability for G to A mutation to be nonsynonymous (to occur in first or second codon position). We discovered that nucleotide sequences of env genes coding for third variable regions (V3 loops) of gp120 from HIV1 and HIV2 have different kinds of guanine usage biases. In the HIV1 reference strain and 100 additionally analyzed HIV1 strains the guanine usage bias in V3 loop coding regions (2G>1G>3G) should lead to elevated nonsynonymous G to A transitions occurrence rates. In the HIV2 reference strain and 100 other HIV2 strains guanine usage bias in V3 loop coding regions (3G>2G>1G) should protect V3 loops from hypermutability. According to the HIV1 and HIV2 V3 alignment, insertion of the sequence enriched with 2G (21 codons in length) occurred during the evolution of HIV1 predecessor, while insertion of the different sequence enriched with 3G (19 codons in length) occurred during the evolution of HIV2 predecessor. The higher is the level of 3G in the V3 coding region, the lower should be the immune escaping mutation occurrence rates. This hypothesis was tested in this study by comparing the guanine usage in V3 loop coding regions from HIV1 fast and slow progressors. All calculations have been performed by our algorithms "VVK In length", "VVK Dinucleotides" and "VVK Consensus" (www.barkovsky.hotmail.ru).
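
The guanine usage bias described above (for example 2G>1G>3G) can be illustrated with a short sketch. This is not the authors' "VVK" software; the function names and the toy sequence are hypothetical, and the code only assumes an in-frame coding sequence whose codons are read from the first base.

```python
def guanine_by_codon_position(cds: str) -> dict:
    """Fraction of codons carrying G at codon positions 1, 2 and 3.

    Assumes `cds` is in frame; a trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    counts = [0, 0, 0]
    for codon in codons:
        for pos, base in enumerate(codon):
            if base == "G":
                counts[pos] += 1
    n = len(codons)
    return {"1G": counts[0] / n, "2G": counts[1] / n, "3G": counts[2] / n}


def bias_order(usage: dict) -> str:
    """Rank the three positions by guanine usage, highest first."""
    return ">".join(sorted(usage, key=usage.get, reverse=True))


# Hypothetical toy sequence (codons AGA, AGA, GGA), not real HIV data.
usage = guanine_by_codon_position("AGAAGAGGA")
print(bias_order(usage))
```

A G in the first or second codon position is the case in which a G to A transition is most likely to be nonsynonymous, which is why the ranking of the three positions matters for the hypothesis in the abstract.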

  16. Natural type 3/type 2 intertypic vaccine-related poliovirus recombinants with the first crossover sites within the VP1 capsid coding region.

    Directory of Open Access Journals (Sweden)

    Yong Zhang

    Full Text Available BACKGROUND: Ten uncommon natural type 3/type 2 intertypic poliovirus recombinants were isolated from stool specimens from nine acute flaccid paralysis case patients and one healthy vaccinee in China from 2001 to 2008. PRINCIPAL FINDINGS: Complete genomic sequences revealed their vaccine-related genomic features and showed that their first crossover sites were randomly distributed in the 3' end of the VP1 coding region. The length of donor Sabin 2 sequences ranged from 55 to 136 nucleotides, which is the longest donor sequence reported in the literature for this type of poliovirus recombination. The recombination resulted in the introduction of Sabin 2 neutralizing antigenic site 3a (NAg3a) into a Sabin 3 genomic background in the VP1 coding region, which may have altered some of the type 3-specific antigenic properties but had not conferred any type 2-specific characteristics. NAg3a of the Sabin 3 strain seems atypical; other wild-type poliovirus isolates that have circulated in recent years have sequences of NAg3a more like the Sabin 2 strain. CONCLUSIONS: Ten natural type 3/type 2 intertypic VP1 capsid-recombinant polioviruses, in which the first crossover sites were found to be in the VP1 coding region, were isolated and characterized. In spite of the complete replacement of NAg3a by type 2-specific amino acids, the serotypes of the recombinants were not altered, and they were totally neutralized by polyclonal type 3 antisera but not at all by type 2 antisera. It is possible that recent type 3 wild poliovirus isolates may be recombinants having NAg3a sequences derived from another strain between 1967 and 1980, and the type 3/type 2 recombination events in the 3' end of the VP1 coding region may result in a higher fitness.

  17. Utility of QR codes in biological collections

    Directory of Open Access Journals (Sweden)

    Mauricio Diazgranados

    2013-07-01

    Full Text Available The popularity of QR codes for encoding information such as URIs has increased exponentially in step with the technological advances and availability of smartphones, digital tablets, and other electronic devices. We propose using QR codes on specimens in biological collections to facilitate linking vouchers’ electronic information with their associated collections. QR codes can efficiently provide such links for connecting collections, photographs, maps, ecosystem notes, citations, and even GenBank sequences. QR codes have numerous advantages over barcodes, including their small size, superior security mechanisms, increased complexity and quantity of information, and low implementation cost. The scope of this paper is to initiate an academic discussion about using QR codes on specimens in biological collections.

  18. The Complete Chloroplast and Mitochondrial Genome Sequences of Boea hygrometrica: Insights into the Evolution of Plant Organellar Genomes

    Science.gov (United States)

    Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun

    2012-01-01

    The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of the resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined, with lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome has fewer genes (65) with a coding fraction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979

  19. Isolation and sequencing of a cDNA coding for the human DF3 breast carcinoma-associated antigen

    International Nuclear Information System (INIS)

    Siddiqui, J.; Abe, M.; Hayes, D.; Shani, E.; Yunis, E.; Kufe, D.

    1988-01-01

    The murine monoclonal antibody (mAb) DF3 reacts with a high molecular weight glycoprotein detectable in human breast carcinomas. DF3 antigen expression correlates with human breast tumor differentiation, and the detection of a cross-reactive species in human milk has suggested that this antigen might be useful as a marker of differentiated mammary epithelium. To further characterize DF3 antigen expression, the authors have isolated a cDNA clone from a λgt11 library by screening with mAb DF3. The results demonstrate that this 309-base-pair cDNA, designated pDF9.3, codes for the DF3 epitope. Southern blot analyses of EcoRI-digested DNAs from six human tumor cell lines with 32P-labeled pDF9.3 have revealed a restriction fragment length polymorphism. Variations in size of the alleles detected by pDF9.3 were also identified in Pst I, but not in HindIII, DNA digests. Furthermore, hybridization of 32P-labeled pDF9.3 with total cellular RNA from each of these cell lines demonstrated either one or two transcripts that varied from 4.1 to 7.1 kilobases in size. The presence of differently sized transcripts detected by pDF9.3 was also found to correspond with the polymorphic expression of DF3 glycoproteins. Nucleotide sequence analysis of pDF9.3 has revealed a highly conserved (G + C)-rich 60-base-pair tandem repeat. These findings suggest that the variation in size of alleles coding for the polymorphic DF3 glycoprotein may represent different numbers of repeats.

  20. On Coding Non-Contiguous Letter Combinations

    Directory of Open Access Journals (Sweden)

    Frédéric Dandurand

    2011-06-01

    Full Text Available Starting from the hypothesis that printed word identification initially involves the parallel mapping of visual features onto location-specific letter identities, we analyze the type of information that would be involved in optimally mapping this location-specific orthographic code onto a location-invariant lexical code. We assume that some intermediate level of coding exists between individual letters and whole words, and that this involves the representation of letter combinations. We then investigate the nature of this intermediate level of coding given the constraints of optimality. This intermediate level of coding is expected to compress data while retaining as much information as possible about word identity. Information conveyed by letters is a function of how much they constrain word identity and how visible they are. Optimization of this coding is a combination of minimizing resources (using the most compact representations) and maximizing information. We show that in a large proportion of cases, non-contiguous letter sequences contain more information than contiguous sequences, while at the same time requiring less precise coding. Moreover, we found that the best predictor of human performance in orthographic priming experiments was within-word ranking of conditional probabilities, rather than average conditional probabilities. We conclude that from an optimality perspective, readers learn to select certain contiguous and non-contiguous letter combinations as information that provides the best cue to word identity.

  1. Fractal MapReduce decomposition of sequence alignment

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2012-05-01

    Full Text Available Abstract Background: The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. Results: In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique that has been found to provide an "alignment-free" solution to sequence analysis and comparison, that is, a solution that does not require dynamic programming, relying instead on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units, with no resort to dynamic programming. Conclusions: The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. Availability: Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm".
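
The iterated-map idea behind the Chaos Game Representation can be sketched in a few lines. This is a minimal illustration and not the USM library referenced in the record: the corner assignment for A, C, G and T, the starting point (0.5, 0.5) and both function names are assumptions. Each base moves the current point halfway toward its corner, so two sequences ending in the same k bases land within 2^-k of each other, and the distance between end points bounds the shared suffix length without dynamic programming.

```python
import math

# Corner assignment is an assumption (one common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_point(seq, start=(0.5, 0.5)):
    """Iterate the CGR map: move halfway toward the corner of each base."""
    x, y = start
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
    return x, y


def shared_suffix_bound(seq_a, seq_b):
    """Upper bound on the length of the suffix shared by two sequences.

    The bound is never smaller than the true shared suffix length,
    but it can overestimate it.
    """
    ax, ay = cgr_point(seq_a)
    bx, by = cgr_point(seq_b)
    d = max(abs(ax - bx), abs(ay - by))
    if d == 0.0:  # identical end points
        return min(len(seq_a), len(seq_b))
    return math.floor(-math.log2(d))


# Hypothetical toy sequences sharing the suffix "TACG".
print(cgr_point("AC"))
print(shared_suffix_bound("GGTACG", "TTTACG"))
```

Because each coordinate pair is computed independently per sequence position, the map step distributes naturally, which is the property the record relies on for its MapReduce decomposition.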

  2. Cocrystallization studies of full-length recombinant butyrylcholinesterase (BChE) with cocaine

    Energy Technology Data Exchange (ETDEWEB)

    Asojo, Oluwatoyin Ajibola; Asojo, Oluyomi Adebola; Ngamelue, Michelle N.; Homma, Kohei; Lockridge, Oksana (Nebraska-Med)

    2011-09-16

    Human butyrylcholinesterase (BChE; EC 3.1.1.8) is a 340 kDa tetrameric glycoprotein that is present in human serum at about 5 mg l⁻¹ and has well documented therapeutic effects on cocaine toxicity. BChE holds promise as a therapeutic that reduces and finally eliminates the rewarding effects of cocaine, thus weaning an addict from the drug. There have been extensive computational studies of cocaine hydrolysis by BChE. Since there are no reported structures of BChE with cocaine or any of the hydrolysis products, full-length monomeric recombinant wild-type BChE was cocrystallized with cocaine. The refined 3 Å resolution structure appears to retain the hydrolysis product benzoic acid in sufficient proximity to form a hydrogen bond to the active-site Ser198.

  3. Vector Network Coding

    OpenAIRE

    Ebrahimi, Javad; Fragouli, Christina

    2010-01-01

    We develop new algebraic algorithms for scalar and vector network coding. In vector network coding, the source multicasts information by transmitting vectors of length L, while intermediate nodes process and combine their incoming packets by multiplying them with L X L coding matrices that play a similar role as coding coefficients in scalar coding. Our algorithms for scalar network jointly optimize the employed field size while selecting the coding coefficients. Similarly, for vector co...
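
The operation performed at an intermediate node in vector network coding can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of the paper: the field is taken to be GF(2), and the function names and example matrices are hypothetical. Each incoming packet is a vector of length L, each is multiplied by its own L x L coding matrix, and the results are added.

```python
def mat_vec_gf2(matrix, vector):
    """Multiply an L x L binary matrix by a length-L binary vector over GF(2)."""
    return [sum(m & v for m, v in zip(row, vector)) % 2 for row in matrix]


def combine(packets, coding_matrices):
    """Output packet of a node: sum over inputs of (coding matrix x packet)."""
    length = len(packets[0])
    out = [0] * length
    for matrix, packet in zip(coding_matrices, packets):
        product = mat_vec_gf2(matrix, packet)
        out = [a ^ b for a, b in zip(out, product)]  # addition in GF(2) is XOR
    return out


# Hypothetical example with L = 2 and two incoming packets.
identity = [[1, 0], [0, 1]]
mixing = [[0, 1], [1, 1]]
print(combine([[1, 0], [0, 1]], [identity, mixing]))
```

With L = 1 each matrix collapses to a single coefficient, which recovers the scalar coding case the abstract compares against.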

  4. Molecular Cloning and Characterization of Full-Length cDNA of Calmodulin Gene from Pacific Oyster Crassostrea gigas.

    Science.gov (United States)

    Li, Xing-Xia; Yu, Wen-Chao; Cai, Zhong-Qiang; He, Cheng; Wei, Na; Wang, Xiao-Tong; Yue, Xi-Qing

    2016-01-01

    The shell of the pearl oyster (Pinctada fucata) mainly comprises aragonite whereas that of the Pacific oyster (Crassostrea gigas) is mainly calcite, suggesting different mechanisms of shell formation in these two mollusks. Calmodulin (CaM) is an important gene for regulating the uptake, transport, and secretion of calcium during the process of shell formation in pearl oyster. It is therefore of interest to characterize CaM in oysters, which could facilitate the understanding of the different shell formation mechanisms among mollusks. We cloned the full-length cDNA of Pacific oyster CaM (cgCaM) and found that the cgCaM ORF encoded a peptide of 113 amino acids containing three EF-hand calcium-binding domains. Its expression level was highest in the mantle, hinting that the cgCaM gene is probably involved in shell formation of Pacific oyster, and the common ancestor of Gastropoda and Bivalvia may have possessed at least three CaM genes. We also found that the numbers of some EF-hand family members in highly calcified species were higher than those in lowly calcified species, and that the numbers of these motifs in the oyster genome were the highest among the mollusk species with a whole genome sequence, further hinting at a correlation between CaM and biomineralization.

  5. Sequence determinants of human microsatellite variability

    Directory of Open Access Journals (Sweden)

    Jakobsson Mattias

    2009-12-01

    Full Text Available Abstract Background: Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results: Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions: These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
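
The calibration step described in the Results can be sketched as simple arithmetic. This is a hedged illustration rather than the authors' pipeline: the function names and the example numbers are hypothetical, the reference fragment length and reference repeat count stand in for values read from the RefSeq sequence, and heterozygosity is computed as the plain expected value 1 - sum(p_i^2) with no sample-size correction.

```python
from collections import Counter


def repeat_number(fragment_len, ref_fragment_len, ref_repeats, unit_size):
    """Infer repeat number from a PCR fragment length.

    Assumes, as the abstract does, that length differences from the
    reference fragment are entirely due to whole or partial repeat units.
    """
    return ref_repeats + (fragment_len - ref_fragment_len) / unit_size


def expected_heterozygosity(alleles):
    """Expected heterozygosity 1 - sum(p_i^2) from a list of allele calls."""
    counts = Counter(alleles)
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical tetranucleotide locus: reference fragment 200 bp with 12 repeats.
print(repeat_number(208, 200, 12, 4))
print(expected_heterozygosity([10, 10, 12, 12]))
```

Working in repeat numbers instead of raw fragment lengths makes loci with different flanking sequence lengths directly comparable, which is what allows repeat count to be correlated with heterozygosity across loci.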

  6. Complete mitochondrial DNA sequence of the Eastern keelback mullet Liza affinis.

    Science.gov (United States)

    Gong, Xiaoling; Zhu, Wenjia; Bao, Baolong

    2016-05-01

    Eastern keelback mullet (Liza affinis) inhabits inlet waters and estuaries of rivers. In this paper, we initially determined the complete mitochondrial genome of Liza affinis. The entire mtDNA sequence is 16,831 bp in length, including 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes and 1 putative control region. Its order and numbers of genes are similar to most bony fishes.

  7. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    Science.gov (United States)

    Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

    2018-01-04

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  8. Failure Mode and Effects Analysis (FMEA) of the solid state full length rod control system

    International Nuclear Information System (INIS)

    Shopsky, W.E.

    1977-01-01

    The Full Length Rod Control System (FLRCS) controls the power to the rod drive mechanisms for rod movement in response to signals received from the Reactor Control System or from signals generated through Reactor Operator action. Rod movement is used to control reactivity of the reactor during plant operation. The Full Length Rod Control System is designed to perform its reactivity control function in conjunction with the Reactor Control and Protection System, to maintain the reactor core within design safety limits. By the use of a Failure Mode and Effects Analysis, it is shown that the FLRCS will perform its reactivity control functions considering the loss of single active components. That is, sufficient fault-limiting control circuits are provided, which block control rod movement and/or indicate the presence of a fault condition at the Control Board. Reactor operator action or automatic reactor trip will thus mitigate the consequences of potential failure of the FLRCS. The analysis also qualitatively demonstrates the reliability of the FLRCS to perform its intended function.

  9. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

    Science.gov (United States)

    Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia

    2017-08-09

    It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, obtaining full-length transcripts remains a huge challenge because of difficulties in short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). In total, we obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not yet been annotated within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events, respectively, are detected within this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of the rabbit genome.

  10. Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

    Science.gov (United States)

    Casals, Ferran; Anglada, Roger; Bonet, Núria; Rasal, Raquel; van der Gaag, Kristiaan J; Hoogenboom, Jerry; Solé-Morata, Neus; Comas, David; Calafell, Francesc

    2017-09-01

    We have genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs in Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based on massively parallel sequencing, we have used simple R scripts to uncover the sequence variation in the repeat region. Thus, we have found, across 58 STRs, 541 length-based alleles, which, after considering repeat-sequence variation, became 804 different alleles. All loci in both populations were in Hardy-Weinberg equilibrium. F_ST between both populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics were quite large; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect become 1-(2.3×10^-70) and 1-(5.9×10^-73) for Roma and Catalans, respectively, and the chances of excluding a false father in a trio are 1-(2.6×10^-20) and 1-(2.0×10^-21). Copyright © 2017 Elsevier B.V. All rights reserved.
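
The distinction between length-based and sequence-based alleles can be shown with a small sketch. The authors report using R scripts; the Python below is a substitute written for illustration only, and the function name and the toy repeat-region sequences are hypothetical. Alleles that a length-based assay would merge are separated once the repeat-region sequence itself is compared.

```python
def count_alleles(repeat_sequences):
    """Return (number of length-based alleles, number of sequence-based alleles)."""
    length_based = {len(seq) for seq in repeat_sequences}
    sequence_based = set(repeat_sequences)
    return len(length_based), len(sequence_based)


# Hypothetical reads from one STR locus: two share a length but differ at one base.
reads = ["TCTATCTATCTA", "TCTATCTGTCTA", "TCTATCTA"]
print(count_alleles(reads))
```

Applied per locus and summed, this kind of comparison is what turns a count of length-based alleles into the larger count of sequence-distinct alleles reported in the abstract.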

  11. Performance of FSO-OFDM based on BCH code

    Directory of Open Access Journals (Sweden)

    Jiao Xiao-lu

    2016-01-01

    Full Text Available In contrast to the traditional OOK (on-off keying) system, the FSO-OFDM system can resist atmospheric scattering and effectively improve the spectrum utilization rate. Due to the instability of the atmospheric channel, the system is affected by various factors, resulting in a high BER. The BCH code has a good error-correcting ability, particularly for short and medium code lengths, and its performance is close to the theoretical value. It can not only detect burst errors but also correct random errors. Therefore, the BCH code is applied to the system to reduce the system BER. Finally, a semi-physical simulation was conducted with MATLAB. The simulation results show that at a BER of 10^-2, OFDM outperforms OOK by 4 dB. In different weather conditions (extension rain, advection fog, dust days), at a BER of 10^-5, BCH (255,191) channel coding outperforms the uncoded system by 4~5 dB. All in all, OFDM technology and the BCH code can reduce the system BER.

  12. Complete sequence of Tvv1, a family of Ty 1 copia-like retrotransposons of Vitis vinifera L., reconstituted by chromosome walking.

    Science.gov (United States)

    Pelsy, F.; Merdinoglu, D.

    2002-09-01

    A chromosome-walking strategy was used to sequence and characterize retrotransposons in the grapevine genome. The reconstitution of a family of retroelements, named Tvv1, was achieved in six successive steps. These elements share a single, highly conserved open reading frame 4,153 nucleotides long, putatively encoding the gag, pro, int, rt and rh proteins. Comparison of the Tvv1 open reading frame coding potential with those of Drosophila copia and tobacco Tnt1 revealed that Tvv1 is closely related to Ty 1 copia-like retrotransposons. A highly variable untranslated leader region, upstream of the open reading frame, allowed us to differentiate Tvv1 variants, which represent a family of at least 28 copies of varying sizes. This internal region is flanked by two long terminal repeats in direct orientation, sized between 149 and 157 bp. Among elements theoretically sized from 4,970 to 5,550 bp, we describe the full-length sequence of a reference element, Tvv1-1, 5,343 nucleotides long. The full-length sequence of Tvv1-1 shows 53.3% identity to pea PDR1. In addition, both elements contain long terminal repeats of nearly the same size in which the U5 region could be entirely absent. Therefore, we assume that Tvv1 and PDR1 could constitute a particular class of short-LTR retroelements.

  13. A Novel Strategy to Engineer Pre-Vascularized Full-Length Dental Pulp-like Tissue Constructs.

    Science.gov (United States)

    Athirasala, Avathamsa; Lins, Fernanda; Tahayeri, Anthony; Hinds, Monica; Smith, Anthony J; Sedgley, Christine; Ferracane, Jack; Bertassoni, Luiz E

    2017-06-12

    The requirement for immediate vascularization of engineered dental pulp poses a major hurdle towards successful implementation of pulp regeneration as an effective therapeutic strategy for root canal therapy, especially in adult teeth. Here, we demonstrate a novel strategy to engineer pre-vascularized, cell-laden hydrogel pulp-like tissue constructs in full-length root canals for dental pulp regeneration. We utilized gelatin methacryloyl (GelMA) hydrogels with tunable physical and mechanical properties to determine the microenvironmental conditions (microstructure, degradation, swelling and elastic modulus) that enhanced viability, spreading and proliferation of encapsulated odontoblast-like cells (OD21), and the formation of endothelial monolayers by endothelial colony forming cells (ECFCs). GelMA hydrogels with higher polymer concentration (15% w/v) and stiffness enhanced OD21 cell viability, spreading and proliferation, as well as endothelial cell spreading and monolayer formation. We then fabricated pre-vascularized, full-length, dental pulp-like tissue constructs by dispensing OD21 cell-laden GelMA hydrogel prepolymer in root canals of extracted teeth and fabricating 500 µm channels throughout the root canals. ECFCs seeded into the microchannels successfully formed monolayers and underwent angiogenic sprouting within 7 days in culture. In summary, the proposed approach is a simple and effective strategy for engineering of pre-vascularized dental pulp constructs offering potentially beneficial translational outcomes.

  14. The complete chloroplast genome sequence of strawberry (Fragaria × ananassa Duch.) and comparison with related species of Rosaceae

    Directory of Open Access Journals (Sweden)

    Hui Cheng

    2017-10-01

    Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries, namely F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ with a percentage of variable sites greater than 1

  15. Purification and characterization of recombinant full-length and protease domain of murine MMP-9 expressed in Drosophila S2 cells

    DEFF Research Database (Denmark)

    Rasch, Morten G; Lund, Ida K; Illemann, Martin

    2010-01-01

    Matrix metalloproteinase-9 (MMP-9) is a 92-kDa soluble pro-enzyme implicated in pathological events including cancer invasion. It is therefore an attractive target for therapeutic intervention studies in mouse models. Development of inhibitors requires sufficient amounts of correctly folded murine MMP-9. Constructs encoding zymogens of full-length murine MMP-9 and a version lacking the O-glycosylated linker region and hemopexin domains were therefore generated and expressed in stably transfected Drosophila S2 insect cells. After 7 days of induction the expression levels of the full-length and truncated versions were 5 mg/l and 2 mg/l, respectively. The products were >95% pure after gelatin Sepharose chromatography and possessed proteolytic activity when analyzed by gelatin zymography. Using the purified full-length murine MMP-9 we raised polyclonal antibodies by immunizations of rabbits...

  16. The sequence coding and search system: An approach for constructing and analyzing event sequences at commercial nuclear power plants

    International Nuclear Information System (INIS)

    Mays, G.T.

    1989-04-01

    The US Nuclear Regulatory Commission (NRC) has recognized the importance of the collection, assessment, and feedback of operating experience data from commercial nuclear power plants and has centralized these activities in the Office for Analysis and Evaluation of Operational Data (AEOD). Such data are essential for performing safety and reliability analyses, especially analyses of trends and patterns that identify undesirable changes in plant performance at the earliest opportunity, so that corrective measures can be implemented to preclude the occurrence of a more serious event. One of NRC's principal tools for collecting and evaluating operating experience data is the Sequence Coding and Search System (SCSS). The SCSS consists of a methodology for structuring event sequences and the requisite computer system to store and search the data. The source information for SCSS is the Licensee Event Report (LER), which is a legally required document. This paper describes the objective of SCSS, the information it contains, and the format and approach for constructing SCSS event sequences. Examples are presented demonstrating the use of SCSS to support the analysis of LER data. The SCSS contains over 30,000 LERs describing events from 1980 through the present. Insights gained from working with a complex data system from the initial developmental stage to the point of a mature operating system are highlighted.

  17. Low-Complexity Multiple Description Coding of Video Based on 3D Block Transforms

    Directory of Open Access Journals (Sweden)

    Andrey Norkin

    2007-02-01

    The paper presents a multiple description (MD) video coder based on three-dimensional (3D) transforms. Two balanced descriptions are created from a video sequence. In the encoder, the video sequence is represented as a coarse sequence approximation (shaper), which is included in both descriptions, and a residual sequence (details), which is split between the two descriptions. The shaper is obtained by block-wise pruned 3D-DCT. The residual sequence is coded by 3D-DCT or a hybrid LOT+DCT 3D transform. The coding scheme is targeted to mobile devices. It has low computational complexity and improved robustness of transmission over unreliable networks. The coder is able to work at very low redundancies. The coding scheme is simple, yet it outperforms some MD coders based on motion-compensated prediction, especially in the low-redundancy region. The margin is up to 3 dB for reconstruction from one description.

  18. Assessment and optimization of Theileria parva sporozoite full-length p67 antigen expression in mammalian cells

    Science.gov (United States)

    Delivery of various forms of recombinant Theileria parva sporozoite antigen (p67) has been shown to elicit antibody responses in cattle capable of providing protection against East Coast fever, the clinical disease caused by T. parva. Previous formulations of full-length and shorter recombinant vers...

  19. Genome-wide identification of aquaporin encoding genes in Brassica oleracea and their phylogenetic sequence comparison to Brassica crops and Arabidopsis

    Science.gov (United States)

    Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.

    2015-01-01

    Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Despite their importance as crop plants, little is known about AQP gene and protein function in cabbage (Brassica oleracea) and other Brassica species. The recent releases of the genome sequences of B. oleracea and Brassica rapa allow comparative genomic studies in these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes of four plant AQP subfamilies were identified. Their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome added to the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea, B. rapa allowed us to follow AQP evolution in closely related species and to systematically classify and (re-) name these isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis and their expression was analyzed in different organs. The two selectivity filters, gene structure and coding sequences were highly conserved within each AQP subfamily while sequence variations in some introns and untranslated regions were frequent. These data suggest a similar substrate selectivity and function of Brassica AQPs compared to Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of

  20. Jointly Decoded Raptor Codes: Analysis and Design for the BIAWGN Channel

    Directory of Open Access Journals (Sweden)

    Venkiah Auguste

    2009-01-01

    We are interested in the analysis and optimization of Raptor codes under a joint decoding framework, that is, when the precode and the fountain code exchange soft information iteratively. We develop an analytical asymptotic convergence analysis of the joint decoder, derive an optimization method for the design of efficient output degree distributions, and show that the new optimized distributions outperform the existing ones, both at long and moderate lengths. We also show that jointly decoded Raptor codes are robust to channel variation: they perform reasonably well over a wide range of channel capacities. This robustness property was already known for the erasure channel but not for the Gaussian channel. Finally, we discuss some finite length code design issues. Contrary to what is commonly believed, we show by simulations that using a relatively low rate for the precode, we can greatly improve the error floor performance of the Raptor code.

  1. Gyrokinetic Vlasov code including full three-dimensional geometry of experiments

    International Nuclear Information System (INIS)

    Nunami, Masanori; Watanabe, Tomohiko; Sugama, Hideo

    2010-03-01

    A new gyrokinetic Vlasov simulation code, GKV-X, is developed for investigating the turbulent transport in magnetic confinement devices with non-axisymmetric configurations. Effects of the magnetic surface shapes in a three-dimensional equilibrium obtained from the VMEC code are accurately incorporated. Linear simulations of the ion temperature gradient instabilities and the zonal flows in the Large Helical Device (LHD) configuration are carried out by the GKV-X code for a benchmark test against the GKV code. The frequency, the growth rate, and the mode structure of the ion temperature gradient instability are influenced by the VMEC geometrical data such as the metric tensor components of the Boozer coordinates for high poloidal wave numbers, while the difference between the zonal flow responses obtained by the GKV and GKV-X codes is found to be small in the core LHD region. (author)

  2. A Note on Sequence Prediction over Large Alphabets

    Directory of Open Access Journals (Sweden)

    Travis Gagie

    2012-02-01

    Building on results from data compression, we prove nearly tight bounds on how well sequences of length n can be predicted in terms of the size σ of the alphabet and the length k of the context considered when making predictions. We compare the performance achievable by an adaptive predictor with no advance knowledge of the sequence to the performance achievable by the optimal static predictor using a table listing the frequency of each (k + 1)-tuple in the sequence. We show that, if the elements of the sequence are chosen uniformly at random, then an adaptive predictor can compete in the expected case if k ≤ log_σ n – 3 – ε, for a constant ε > 0, but not if k ≥ log_σ n.

  3. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

    Science.gov (United States)

    Hua, Wei; Wang, Jiasong; Zhao, Jian

    2014-01-01

    Based on the study of the Ramanujan sum and the Ramanujan coefficient, this paper introduces the concepts of the discrete Ramanujan transform and spectrum. Using the Voss numerical representation, one maps a symbolic DNA strand to a numerical DNA sequence and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that the discrete Fourier power spectrum of a protein coding sequence has the important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of the discrete Fourier transform. The analysis is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion, where N is the length of the sequence. The results presented in this paper show that, for the protein coding regions, the property of 3-base periodicity appears only as a prominent spike of the discrete Ramanujan spectrum at period 3. The signal-to-noise ratio for the discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested, and the histograms and tables from the computational results illustrate the reliability of our method. In addition, our theoretical analysis concludes that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that using the discrete Ramanujan spectrum to classify different DNA sequences is a fast and effective method.
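    The Fourier baseline mentioned in the abstract above (Voss indicator sequences, then a signal-to-noise test at frequency N/3) can be sketched as follows. This is an illustrative sketch only, not the authors' Ramanujan-transform method: the function names are hypothetical, and the signal-to-noise ratio is assumed here to be the total power at N/3 divided by the mean total power over all nonzero frequencies, which is one common convention.

```python
import cmath

def voss_indicators(seq):
    """Voss representation: one 0/1 indicator sequence per nucleotide."""
    seq = seq.upper()
    return {base: [1 if c == base else 0 for c in seq] for base in "ACGT"}

def power_at(x, k):
    """Squared magnitude of the k-th DFT coefficient of the sequence x."""
    n = len(x)
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
    return abs(coeff) ** 2

def snr_period3(seq):
    """Power at frequency N/3 relative to the mean power over k = 1..N-1.

    Assumption: this peak-over-average definition is used for illustration;
    the abstract does not spell out its exact formula.
    """
    n = len(seq)
    if n % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    indicators = list(voss_indicators(seq).values())
    total = [sum(power_at(x, k) for x in indicators) for k in range(1, n)]
    mean_power = sum(total) / len(total)
    return total[n // 3 - 1] / mean_power
```

    A sequence with a strong codon-like period of 3 yields a ratio well above 1, while a sequence with period 2 yields a ratio near 0; real coding and noncoding regions fall between these extremes.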

  4. A comparative phylogenetic analysis of full-length mariner elements ...

    Indian Academy of Sciences (India)

    Unknown

    recent study showing non-occurrence of inter-subfamily excisions because of .... length shown in our figure is greater because of the gaps introduced to maintain an ... to test the feasibility of transforming silkmoths with a foreign gene of ...

  5. Modified Three-Dimensional Multicarrier Optical Prime Codes

    Directory of Open Access Journals (Sweden)

    Rajesh Yadav

    2016-01-01

    We propose a mathematical model for novel three-dimensional multicarrier optical codes in terms of wavelength/time/space based on the prime sequence algorithm. The proposed model has been extensively simulated in MATLAB for prime numbers (P) to analyze the performance of the code in terms of autocorrelation and cross-correlation. The simulated outcome agrees with the mathematical model and gives better autocorrelation and cross-correlation results than other methods available in the literature. The proposed 3D optical codes are more efficient in terms of cardinality, offer improved security, and support quality of service.

  6. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    Directory of Open Access Journals (Sweden)

    Moore JE

    2006-01-01

    Background: At present, six accessible 16S rDNA sequences from Taylorella equigenitalis (T. equigenitalis) are available, and they differ at a few nucleotide positions. It is therefore important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximately full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results: Anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis were clarified. Cloning, sequencing and comparison of the approximately full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France demonstrated nucleotide sequence differences at six loci in the 1,469-nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion: High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501, where substitutions and deletions were noted.

  7. EuMicroSatdb: A database for microsatellites in the sequenced genomes of eukaryotes

    Directory of Open Access Journals (Sweden)

    Grover Atul

    2007-07-01

    Background: Microsatellites have immense utility as molecular markers in different fields like genome characterization and mapping, phylogeny and evolutionary biology. Existing microsatellite databases are of limited utility for experimental and computational biologists with regard to their content and information output. EuMicroSatdb (Eukaryotic MicroSatellite database, http://ipu.ac.in/usbt/EuMicroSatdb.htm) is a web-based relational database for easy and efficient positional mining of microsatellites from sequenced eukaryotic genomes. Description: A user-friendly web interface has been developed for microsatellite data retrieval using Active Server Pages (ASP). The backend database codes for data extraction and assembly have been written using Perl-based scripts and C++. Precise, need-based microsatellite data retrieval is possible using different input parameters such as microsatellite type (simple perfect or compound perfect), repeat unit length (mono- to hexa-nucleotide), repeat number, microsatellite length and chromosomal location in the genome. Furthermore, information about clustering of different microsatellites in the genome can also be retrieved. Finally, to facilitate primer design for PCR amplification of any desired microsatellite locus, 200 bp upstream and downstream sequences are provided. Conclusion: The database allows easy systematic retrieval of comprehensive information about simple and compound microsatellites, microsatellite clusters and their locus coordinates in 31 sequenced eukaryotic genomes. The information content of the database is useful in different areas of research like gene tagging, genome mapping, population genetics, germplasm characterization and in understanding microsatellite dynamics in eukaryotic genomes.
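    The retrieval parameters listed in the abstract above (repeat unit length from mono- to hexa-nucleotide, repeat number, locus position) map naturally onto a small pattern search. The sketch below is an assumption-laden illustration and not the database's own Perl or C++ code: the function name, its defaults and the use of a regular expression with a backreference are all hypothetical choices, and it handles only simple perfect microsatellites.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # ([ACGT]{n}) captures one unit; \1{m,} requires at least m further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # skip units that are themselves repeats of a shorter motif, e.g. "ATAT"
            if any(unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)
```

    Start positions are 0-based offsets into the input string, so flanking sequence for primer design could be taken by slicing around each hit.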

  8. Design and performance analysis for several new classes of codes for optical synchronous CDMA and for arbitrary-medium time-hopping synchronous CDMA communication systems

    Science.gov (United States)

    Kostic, Zoran; Titlebaum, Edward L.

    1994-08-01

    New families of spread-spectrum codes are constructed, that are applicable to optical synchronous code-division multiple-access (CDMA) communications as well as to arbitrary-medium time-hopping synchronous CDMA communications. Proposed constructions are based on the mappings from integer sequences into binary sequences. We use the concept of number theoretic quadratic congruences and a subset of Reed-Solomon codes similar to the one utilized in the Welch-Costas frequency-hop (FH) patterns. The properties of the codes are as good as or better than the properties of existing codes for synchronous CDMA communications: Both the number of code-sequences within a single code family and the number of code families with good properties are significantly increased when compared to the known code designs. Possible applications are presented. To evaluate the performance of the proposed codes, a new class of hit arrays called cyclical hit arrays is recalled, which give insight into the previously unknown properties of the few classes of number theoretic FH patterns. Cyclical hit arrays and the proposed mappings are used to determine the exact probability distribution functions of random variables that represent interference between users of a time-hopping or optical CDMA system. Expressions for the bit error probability in multi-user CDMA systems are derived as a function of the number of simultaneous CDMA system users, the length of signature sequences and the threshold of a matched filter detector. The performance results are compared with the results for some previously known codes.

  9. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine

    Science.gov (United States)

    Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing

    2017-09-01

    Dynamically monitoring the behavior of a program is widely used to discriminate between benign programs and malware. The judgment is usually based on dynamic characteristics of a program, such as its API call sequence or API call frequency. The key innovation of this paper is to consider the full Native API sequence and to use a support vector machine to detect shellcode. We also use a Markov chain to extract and digitize Native API sequence features. Our experimental results show that the method proposed in this paper has high accuracy and a low detection rate.
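    One plausible way to digitize an API call sequence with a Markov chain, as the abstract above describes, is to estimate first-order transition probabilities and flatten them into a fixed-length vector. The sketch below is a hypothetical illustration: the abstract does not give the authors' exact feature definition, the function name and vocabulary ordering are assumptions, and the classifier step is omitted.

```python
from collections import Counter

def transition_features(calls, vocabulary):
    """Flatten first-order Markov transition probabilities into a feature vector.

    The vector has len(vocabulary) ** 2 entries, ordered by source call and
    then by destination call; sources that never occur contribute zeros.
    """
    pair_counts = Counter(zip(calls, calls[1:]))   # counts of (source, destination)
    out_counts = Counter(calls[:-1])               # how often each call has a successor
    features = []
    for src in vocabulary:
        for dst in vocabulary:
            total = out_counts[src]
            features.append(pair_counts[(src, dst)] / total if total else 0.0)
    return features
```

    A vector built this way has the same length for every trace, which is what a support vector machine needs as input.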

  10. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae)

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    The plant chloroplast (cp) genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp) in length, including a pair of short inverted repeat regions (IRa and IRb) of 139 bp each, separated by a small single copy (SSC) region of 54,323 bp and a large single copy (LSC) region of 66,735 bp. It contains 114 genes: 68 protein-coding genes, 35 tRNA genes, four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSRs) have been detected in the A. nephrolepis cp genome. Large IR sequences (1186 bp) are located at the 42-kb inversion points. The A. nephrolepis cp genome is highly similar to that of Abies koreana, a closely related taxon. Pairwise comparison between the two cp genomes revealed 140 polymorphic sites. The complete cp genome sequence of A. nephrolepis has significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for the development of DNA markers for easy identification and classification.

  11. Spreading Sequence System for Full Connectivity Relay Network

    Science.gov (United States)

    Kwon, Hyuck M. (Inventor); Yang, Jie (Inventor); Pham, Khanh D. (Inventor)

    2018-01-01

    Fully connected uplink and downlink relay network systems use pseudo-noise (PN) spreading and despreading sequences chosen to maximize the signal-to-interference-plus-noise ratio. The relay network systems comprise one or more transmitting units, relays, and receiving units connected via a communication network. The transmitting units, relays, and receiving units each may include a computer for performing the methods and steps described herein and transceivers for transmitting and/or receiving signals. The computer encodes and/or decodes communication signals via optimum adaptive PN sequences found by employing Cholesky decompositions and singular value decompositions (SVD). The PN sequences employ channel state information (CSI) to compute the optimal sequences more effectively and more securely.

  12. First complete genome sequence of canine bocavirus 2 in mainland China

    Directory of Open Access Journals (Sweden)

    S.-L. Zhai

    2017-07-01

    We obtained the first full-length genome sequence of canine bocavirus 2 (CBoV2) from the faeces of a healthy dog in Guangzhou city, Guangdong province, mainland China. The genome of GZHD15 consisted of 5059 nucleotides. Sequence analysis suggested that GZHD15 was close to a previously circulated Hong Kong isolate.

  13. The Complete Chloroplast Genome Sequences of the Medicinal Plant Forsythia suspensa (Oleaceae)

    Directory of Open Access Journals (Sweden)

    Wenbin Wang

    2017-10-01

    Forsythia suspensa is an important medicinal plant and is traditionally applied for the treatment of inflammation, pyrexia, gonorrhea, diabetes, and so on. However, there is limited sequence and genomic information available for F. suspensa. Here, we produced the complete chloroplast genome of F. suspensa using Illumina sequencing technology. F. suspensa is the first sequenced member within the genus Forsythia (Oleaceae). The gene order and organization of the chloroplast genome of F. suspensa are similar to those of other Oleaceae chloroplast genomes. The F. suspensa chloroplast genome is 156,404 bp in length and exhibits a conserved quadripartite structure with a large single-copy (LSC; 87,159 bp) region and a small single-copy (SSC; 17,811 bp) region interspersed between inverted repeat (IRa/b; 25,717 bp) regions. A total of 114 unique genes were annotated, including 80 protein-coding genes, 30 tRNA, and four rRNA. The low GC content (37.8%) and codon usage bias for A- or T-ending codons may largely affect gene codon usage. Sequence analysis identified a total of 26 forward repeats, 23 palindrome repeats with lengths >30 bp (identity > 90%), and 54 simple sequence repeats (SSRs) with an average rate of 0.35 SSRs/kb. We predicted 52 RNA editing sites in the chloroplast of F. suspensa, all for C-to-U transitions. IR expansion or contraction and the divergent regions were analyzed among several species, including the F. suspensa reported in this study. Phylogenetic analysis based on the whole plastome revealed that F. suspensa, as a member of the Oleaceae family, diverged relatively early from Lamiales. This study will contribute to strengthening medicinal resource conservation, molecular phylogenetic, and genetic engineering research investigations of this species.
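    The figures quoted in the abstract above can be cross-checked with simple arithmetic: a quadripartite chloroplast genome has one LSC, one SSC and two IR copies, and the SSR rate is the SSR count per kilobase of genome. The helper names below are hypothetical and the sketch only restates numbers already given in the abstract.

```python
def quadripartite_length(lsc_bp, ssc_bp, ir_bp):
    """Total genome length: LSC + SSC + two inverted repeat copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp

def ssr_density_per_kb(ssr_count, genome_bp):
    """Average number of simple sequence repeats per 1,000 bp."""
    return ssr_count / (genome_bp / 1000)

# Values reported for the F. suspensa chloroplast genome
total_bp = quadripartite_length(87_159, 17_811, 25_717)
density = ssr_density_per_kb(54, total_bp)
```

    With the reported region sizes the total comes to 156,404 bp, and 54 SSRs over that length round to 0.35 SSRs/kb, consistent with the abstract.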

  14. Mitochondrial genome sequence of Egyptian swift Rock Pigeon (Columba livia breed Egyptian swift).

    Science.gov (United States)

    Li, Chun-Hong; Shi, Wei; Shi, Wan-Yu

    2015-06-01

    The Egyptian swift Rock Pigeon is a breed of fancy pigeon developed over many years of selective breeding. In this work, we report the complete mitochondrial genome sequence of Egyptian swift Rock Pigeon. The total length of the mitogenome was 17,239 bp and its overall base composition was estimated to be 30.2% for A, 24.0% for T, 31.9% for C and 13.9% for G, indicating an A-T (54.2%)-rich feature in the mitogenome. It contained the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a non-coding control region (D-loop region). The complete mitochondrial genome sequence of Egyptian swift Rock Pigeon would serve as an important data set of the germplasm resources for further study.

  15. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    Science.gov (United States)

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  16. Security problems for a pseudorandom sequence generator based on the Chen chaotic system

    Science.gov (United States)

    Özkaynak, Fatih; Yavuz, Sırma

    2013-09-01

    Recently, a novel pseudorandom number generator scheme based on the Chen chaotic system was proposed. In this study, we analyze the security weaknesses of the proposed generator. By applying a brute force attack on a reduced key space, we show that 66% of the generated pseudorandom number sequences can be revealed. Executable C# code is given for the proposed attack. The computational complexity of this attack is O(n), where n is the sequence length. Both mathematical proofs and experimental results are presented to support the proposed attack.

  17. Sequencing illustrates the transcriptional response of Legionella pneumophila during infection and identifies seventy novel small non-coding RNAs.

    LENUS (Irish Health Repository)

    Weissenmayer, Barbara A

    2011-01-01

    Second generation sequencing has prompted a number of groups to re-interrogate the transcriptomes of several bacterial and archaeal species. One of the central findings has been the identification of complex networks of small non-coding RNAs that play central roles in transcriptional regulation in all growth conditions and for the pathogen's interaction with and survival within host cells. Legionella pneumophila is a gram-negative facultative intracellular human pathogen with a distinct biphasic lifestyle. One of its primary environmental hosts is the free-living amoeba Acanthamoeba castellanii, and its infection by L. pneumophila mimics that seen in human macrophages. Here we present an analysis of strand-specific sequencing of the transcriptional response of L. pneumophila during exponential and post-exponential broth growth and during the replicative and transmissive phases of infection inside A. castellanii. We extend previous microarray-based studies and uncover evidence of a complex regulatory architecture underpinned by numerous non-coding RNAs. Over seventy new non-coding RNAs could be identified; many of them appear to be strain specific and in configurations not previously reported. We discover a family of non-coding RNAs preferentially expressed during infection conditions and identify a second copy of 6S RNA in L. pneumophila. We show that the newly discovered putative 6S RNA as well as a number of other non-coding RNAs show evidence for antisense transcription. The nature and extent of the non-coding RNAs and their expression patterns suggest that these may well play central roles in the regulation of Legionella spp. specific traits and offer clues as to how L. pneumophila adapts to its intracellular niche. The expression profiles outlined in the study have been deposited into GenBank's Gene Expression Omnibus (GEO) database under the series accession GSE27232.

  18. Optimized Method for Generating and Acquiring GPS Gold Codes

    Directory of Open Access Journals (Sweden)

    Khaled Rouabah

    2015-01-01

    Full Text Available We propose a simpler and faster Gold code generator, which can be efficiently initialized to any desired code with minimum delay. Its principle consists of generating only one sequence (code number 1) from which all the other signal codes can be produced. This is realized by simply shifting this sequence by different delays that are judiciously determined using the characteristics of the bicorrelation function. This is in contrast to the classical Linear Feedback Shift Register (LFSR)-based Gold code generator, which requires, in addition to the shift process, a significant number of logic XOR gates and a phase selector to change the code. The presence of all these XOR gates in the classical LFSR-based generator consumes additional time in the generation and acquisition processes. In addition to its simplicity and speed, the proposed architecture, owing to the total absence of XOR gates, uses fewer resources than the conventional Gold generator and can thus be produced at lower cost. Digital Signal Processing (DSP) implementations have shown that the proposed architecture offers a solution for acquiring Global Positioning System (GPS) satellite signals optimally and in parallel.
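
    For comparison, the classical LFSR-based generator that this abstract contrasts against can be sketched in a few lines. This is not the authors' shift-based design. It assumes the standard GPS C/A construction from the public GPS interface specification: two 10-stage registers, G1 and G2, with satellite-specific G2 phase-selector taps (2 and 6 for PRN 1). The function name `ca_code` is illustrative.

```python
def ca_code(tap_a, tap_b, length=1023):
    """Generate GPS C/A chips with the classical two-LFSR Gold code scheme.

    tap_a, tap_b: 1-based G2 phase-selector stages (assumed from the GPS
    interface specification, e.g. 2 and 6 for PRN 1).
    """
    g1 = [1] * 10  # both registers start in the all-ones state
    g2 = [1] * 10
    chips = []
    for _ in range(length):
        # Phase selector: XOR of two G2 stages picks the satellite's code.
        g2_out = g2[tap_a - 1] ^ g2[tap_b - 1]
        chips.append(g1[9] ^ g2_out)
        # G1 feedback: stages 3 and 10 (polynomial 1 + x^3 + x^10).
        g1_fb = g1[2] ^ g1[9]
        # G2 feedback: stages 2, 3, 6, 8, 9, 10.
        g2_fb = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]
        g1 = [g1_fb] + g1[:9]
        g2 = [g2_fb] + g2[:9]
    return chips


prn1 = ca_code(2, 6)
print("".join(map(str, prn1[:10])))  # first ten chips of PRN 1
```

    Every chip costs several XOR operations plus the phase selector, which is the overhead the abstract says the proposed shift-only generator avoids.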

  19. VALIDATION OF FULL CORE GEOMETRY MODEL OF THE NODAL3 CODE IN THE PWR TRANSIENT BENCHMARK PROBLEMS

    Directory of Open Access Journals (Sweden)

    Tagor Malem Sembiring

    2015-10-01

    Full Text Available ABSTRACT VALIDATION OF FULL CORE GEOMETRY MODEL OF THE NODAL3 CODE IN THE PWR TRANSIENT BENCHMARK PROBLEMS. The coupled neutronic and thermal-hydraulic (T/H) code, NODAL3, has been validated in some PWR static benchmark cases and the NEACRP PWR transient benchmark cases. However, the NODAL3 code has not yet been validated in the transient benchmark cases of a control rod assembly (CR) ejection at the peripheral core using a full core geometry model, the C1 and C2 cases. Through this research work, the accuracy of the NODAL3 code for the ejection of one CR or of an unsymmetrical group of CRs can be validated. The calculations by the NODAL3 code have been carried out by the adiabatic method (AM) and the improved quasistatic method (IQS). All transient parameters calculated by the NODAL3 code were compared with the reference results from the PANTHER code. The maximum relative difference of 16% occurs in the calculated time of maximum power when using the IQS method, while the relative difference of the AM method is 4% for the C2 case. The calculation results of the NODAL3 code show no systematic difference, which means that the neutronic and T/H modules adopted in the code can be considered correct. Therefore, all calculation results obtained using the NODAL3 code are in very good agreement with the reference results. Keywords: nodal method, coupled neutronic and thermal-hydraulic code, PWR, transient case, control rod ejection.   ABSTRACT (translated from Indonesian) VALIDATION OF THE FULL CORE GEOMETRY MODEL OF THE NODAL3 CODE PACKAGE IN PWR TIME-DEPENDENT BENCHMARK PROBLEMS. The coupled neutronic and thermal-hydraulic (T/H) code package, NODAL3, has been validated with several PWR static benchmark cases and the NEACRP PWR time-dependent benchmark cases. However, the NODAL3 code package has not yet been validated in time-dependent benchmark cases arising from the withdrawal of a control rod assembly (CR) at the core periphery using a full core geometry model, namely cases C1 and C2. With this research, the accuracy of the NODAL3 code package

  20. Full-Length High-Temperature Severe Fuel Damage Test No. 5: Final safety analysis

    International Nuclear Information System (INIS)

    Lanning, D.D.; Lombardo, N.J.; Panisko, F.E.

    1993-09-01

    This report presents the final safety analysis for the preparation, conduct, and post-test discharge operation for the Full-Length High Temperature Experiment-5 (FLHT-5) to be conducted in the L-24 position of the National Research Universal (NRU) Reactor at Chalk River Nuclear Laboratories (CRNL), Ontario, Canada. The test is sponsored by an international group organized by the US Nuclear Regulatory Commission. The test is designed and conducted by staff from Pacific Northwest Laboratory with CRNL staff support. The test will study the consequences of loss-of-coolant and the progression of severe fuel damage

  1. Purification and characterization of recombinant full-length and protease domain of murine MMP-9 expressed in Drosophila S2 cells

    DEFF Research Database (Denmark)

    Rasch, Morten G; Lund, Ida K.; Illemann, Martin

    2010-01-01

    -length and truncated versions were 5 mg/l and 2 mg/l, respectively. The products were >95% pure after gelatin Sepharose chromatography and possessed proteolytic activity when analyzed by gelatin zymography. Using the purified full-length murine MMP-9 we raised polyclonal antibodies by immunizations of rabbits...

  2. The determinants of IPO firm prospectus length in Africa

    Directory of Open Access Journals (Sweden)

    Bruce Hearn

    2013-04-01

    Full Text Available This paper studies the differential impact on IPO firm listing prospectus length of increasing proportions of foreign directors from civil as opposed to common law societies and social elites. Using a unique hand-collected and comprehensive sample of 165 IPO firms from across 18 African countries, the evidence suggests that increasing proportions of directors from civil code law countries are associated with shorter prospectuses, while the opposite is true for their common law counterparts. Furthermore, increasing proportions of directors drawn from elevated social positions in indigenous society are related to increasing prospectus length in North Africa while being insignificant in sub-Saharan Africa (SSA).

  3. The evolutionary rates of HCV estimated with subtype 1a and 1b sequences over the ORF length and in different genomic regions.

    Directory of Open Access Journals (Sweden)

    Manqiong Yuan

    Full Text Available Considerable progress has been made in HCV evolutionary analysis since the software BEAST was released. However, prior information, especially the prior evolutionary rate, which plays a critical role in BEAST analysis, is always difficult to ascertain due to various uncertainties. Providing a proper prior HCV evolutionary rate is thus of great importance. 176 full-length sequences of HCV subtype 1a and 144 of 1b were assembled by taking into consideration the balance of the sampling dates and the even dispersion in phylogenetic trees. According to the HCV genomic organization and biological functions, each dataset was partitioned into nine genomic regions and two routinely amplified regions. A uniform prior rate was applied to the BEAST analysis for each region and also the entire ORF. All the obtained posterior rates for 1a are of a magnitude of 10^-3 substitutions/site/year and in a bell-shaped distribution. Significantly lower rates were estimated for 1b and some of the rate distribution curves resulted in a one-sided truncation, particularly under the exponential model. This indicates that some of the rates for subtype 1b are less accurate, so they were adjusted by including more sequences to improve the temporal structure. Among the various HCV subtypes and genomic regions, the evolutionary patterns are dissimilar. Therefore, an applied estimation of the HCV epidemic history requires the proper selection of the rate priors, which should match the actual dataset so that they fit the subtype, the genomic region and even the length. By referencing the findings here, future evolutionary analysis of the HCV subtype 1a and 1b datasets may become more accurate and hence prove useful for tracing their patterns.

  4. Rate-adaptive BCH codes for distributed source coding

    DEFF Research Database (Denmark)

    Salmistraro, Matteo; Larsen, Knud J.; Forchhammer, Søren

    2013-01-01

    This paper considers Bose-Chaudhuri-Hocquenghem (BCH) codes for distributed source coding. A feedback channel is employed to adapt the rate of the code during the decoding process. The focus is on codes with short block lengths for independently coding a binary source X and decoding it given its correlated side information Y. The proposed codes have been analyzed in a high-correlation scenario, where the marginal probability of each symbol, Xi in X, given Y is highly skewed (unbalanced). Rate-adaptive BCH codes are presented and applied to distributed source coding. Adaptive and fixed checking strategies for improving the reliability of the decoded result are analyzed, and methods for estimating the performance are proposed. In the analysis, noiseless feedback and noiseless communication are assumed. Simulation results show that rate-adaptive BCH codes achieve better performance than low...
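
    The basic mechanism behind coding X independently and decoding it with side information Y can be illustrated with syndrome coding. The sketch below is an assumption-laden toy, not the paper's scheme. It uses the (7,4) Hamming code, the simplest single-error-correcting BCH code, has no rate adaptation or feedback channel, and assumes X and Y differ in at most one bit per block. All function names are illustrative.

```python
# Parity-check matrix of the (7,4) Hamming code: column j (1-based)
# holds the binary expansion of j, least significant bit in row 0.
H = [[(j >> bit) & 1 for j in range(1, 8)] for bit in range(3)]


def syndrome(word):
    """Return the 3-bit syndrome H * word over GF(2)."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)


def encode(x):
    """Encoder: send only the syndrome of x (3 bits instead of 7)."""
    return syndrome(x)


def decode(s_x, y):
    """Decoder: recover x from its syndrome and the side information y.

    Assumes x and y differ in at most one position.
    """
    s_y = syndrome(y)
    diff = [a ^ b for a, b in zip(s_x, s_y)]  # syndrome of e = x XOR y
    position = diff[0] + 2 * diff[1] + 4 * diff[2]  # 0 means no difference
    x_hat = list(y)
    if position:
        x_hat[position - 1] ^= 1
    return x_hat


x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 1, 1, 1, 0, 1]  # side information: x with one bit flipped
print(decode(encode(x), y) == x)
```

    A rate-adaptive scheme would, roughly speaking, let the decoder request further syndrome bits over the feedback channel when the bits received so far are insufficient; that step is omitted here.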

  5. Validation of full core geometry model of the NODAL3 code in the PWR transient Benchmark problems

    International Nuclear Information System (INIS)

    T-M Sembiring; S-Pinem; P-H Liem

    2015-01-01

    The coupled neutronic and thermal-hydraulic (T/H) code, NODAL3, has been validated in some PWR static benchmark cases and the NEACRP PWR transient benchmark cases. However, the NODAL3 code has not yet been validated in the transient benchmark cases of a control rod assembly (CR) ejection at the peripheral core using a full core geometry model, the C1 and C2 cases. Through this research work, the accuracy of the NODAL3 code for the ejection of one CR or of an unsymmetrical group of CRs can be validated. The calculations by the NODAL3 code have been carried out by the adiabatic method (AM) and the improved quasistatic method (IQS). All transient parameters calculated by the NODAL3 code were compared with the reference results from the PANTHER code. The maximum relative difference of 16 % occurs in the calculated time of maximum power when using the IQS method, while the relative difference of the AM method is 4 % for the C2 case. The calculation results of the NODAL3 code show no systematic difference, which means that the neutronic and T/H modules adopted in the code can be considered correct. Therefore, all calculation results obtained using the NODAL3 code are in very good agreement with the reference results. (author)

  6. Lyso-myristoyl phosphatidylcholine micelles sustain the activity of Dengue non-structural (NS) protein 3 protease domain fused with the full-length NS2B.

    Science.gov (United States)

    Huang, Qiwei; Li, Qingxin; Joy, Joma; Chen, Angela Shuyi; Ruiz-Carrillo, David; Hill, Jeffrey; Lescar, Julien; Kang, Congbao

    2013-12-01

    Dengue virus (DENV), a member of the flavivirus genus, affects 50-100 million people in tropical and sub-tropical regions. The DENV protease domain is located at the N-terminus of the NS3 protease and requires for its enzymatic activity a hydrophilic segment of the NS2B that acts as a cofactor. The protease is an important antiviral drug target because it plays a crucial role in virus replication by cleaving the genome-coded polypeptide into mature functional proteins. Currently, there are no drugs to inhibit DENV protease activity. Most structural and functional studies have been conducted using protein constructs containing the NS3 protease domain connected to a soluble segment of the NS2B membrane protein via a nine-residue linker. For in vitro structural and functional studies, it would be useful to produce a natural form of the DENV protease containing the NS3 protease domain and the full-length NS2B protein. Herein, we describe the expression and purification of a natural form of DENV protease (NS2BFL-NS3pro) containing the full-length NS2B protein and the protease domain of NS3 (NS3pro). The protease was expressed and purified in detergent micelles necessary for its folding. Our results show that this purified protein was active in detergent micelles such as lyso-myristoyl phosphatidylcholine (LMPC). These findings should facilitate further structural and functional studies of the protease and will facilitate drug discovery targeting DENV. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
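
    The scaffold N50 quoted above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the scaffold lengths are made-up toy values, not data from the olive assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly size
            return length
    raise ValueError("n50() needs at least one length")


toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths in kb
print(n50(toy_scaffolds))

# Assembly completeness as reported: 1.31 Gb assembled of an estimated 1.38 Gb.
print(round(1.31 / 1.38 * 100))
```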

  8. Visual artificial grammar learning by rhesus macaques (Macaca mulatta): exploring the role of grammar complexity and sequence length.

    Science.gov (United States)

    Heimbauer, Lisa A; Conway, Christopher M; Christiansen, Morten H; Beran, Michael J; Owren, Michael J

    2018-03-01

    Humans and nonhuman primates can learn about the organization of stimuli in the environment using implicit sequential pattern learning capabilities. However, most previous artificial grammar learning studies with nonhuman primates have involved relatively simple grammars and short input sequences. The goal in the current experiments was to assess the learning capabilities of monkeys on an artificial grammar-learning task that was more complex than most others previously used with nonhumans. Three experiments were conducted using a joystick-based, symmetrical-response serial reaction time task in which two monkeys were exposed to grammar-generated sequences at sequence lengths of four in Experiment 1, six in Experiment 2, and eight in Experiment 3. Over time, the monkeys came to respond faster to the sequences generated from the artificial grammar compared to random versions. In a subsequent generalization phase, subjects generalized their knowledge to novel sequences, responding significantly faster to novel instances of sequences produced using the familiar grammar compared to those constructed using an unfamiliar grammar. These results reveal that rhesus monkeys can learn and generalize the statistical structure inherent in an artificial grammar that is as complex as some used with humans, for sequences up to eight items long. These findings are discussed in relation to whether or not rhesus macaques and other primate species possess implicit sequence learning abilities that are similar to those that humans draw upon to learn natural language grammar.

  9. An Infinite Sequence of Full AFL-Structures, Each of Which Possesses an Infinite Hierarchy

    NARCIS (Netherlands)

    Asveld, P.R.J.

    1999-01-01

    We investigate different sets of operations on languages which result in corresponding algebraic structures, viz. in different types of full AFL's (full Abstract Family of Languages). By iterating control on ETOL-systems we show that there exists an infinite sequence $\mathcal{C}_m$ ($m \geq 1$) of

  10. An Infinite Sequence of Full AFL-structures, Each of Which Possesses an Infinite Hierarchy

    NARCIS (Netherlands)

    Asveld, P.R.J.; Martin-Vide, C.; Mitrana, V.

    2001-01-01

    We investigate different sets of operations on languages which result in corresponding algebraic structures, viz. in different types of full AFL's (full Abstract Family of Languages). By iterating control on ETOL-systems we show that there exists an infinite sequence $\mathcal{C}_m$ ($m \geq 1$) of

  11. The sequence coding and search system: an approach for constructing and analyzing event sequences at commercial nuclear power plants

    International Nuclear Information System (INIS)

    Mays, G.T.

    1990-01-01

    The U.S. Nuclear Regulatory Commission (NRC) has recognized the importance of the collection, assessment, and feedback of operating experience data from commercial nuclear power plants and has centralized these activities in the Office for Analysis and Evaluation of Operational Data (AEOD). Such data is essential for performing safety and reliability analyses, especially analyses of trends and patterns to identify undesirable changes in plant performance at the earliest opportunity to implement corrective measures to preclude the occurrence of a more serious event. One of NRC's principal tools for collecting and evaluating operating experience data is the Sequence Coding and Search System (SCSS). The SCSS consists of a methodology for structuring event sequences and the requisite computer system to store and search the data. The source information for SCSS is the Licensee Event Report (LER), which is a legally required document. This paper describes the objectives of SCSS, the information it contains, and the format and approach for constructing SCSS event sequences. Examples are presented demonstrating the use of SCSS to support the analysis of LER data. The SCSS contains over 30,000 LERs describing events from 1980 through the present. Insights gained from working with a complex data system from the initial developmental stage to the point of a mature operating system are highlighted. Considerable experience has been gained in the areas of evolving and changing data requirements, staffing requirements, quality control and quality assurance procedures for addressing consistency, software/hardware considerations for developing and maintaining a complex system, documentation requirements, and end-user needs. Two other approaches for constructing and evaluating event sequences are examined, including the Accident Sequence Precursor (ASP) Program, where sequences having the potential for core damage are identified and analyzed, and the Significant Event Compilation Tree

  12. Circular codes revisited: a statistical approach.

    Science.gov (United States)

    Gonzalez, D L; Giannerini, S; Rosa, R

    2011-04-21

    In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has provoked great interest and has undergone rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes with particular emphasis on the problem of retrieval and maintenance of the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach in order to try to answer different questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C(3) maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability but, still, there exists a great variability among sequences. Second, we focus on this code and explore the role played by the proportion of the bases by means of a hierarchy of permutation tests. The results show the existence of a sort of optimization mechanism such that coding sequences are tailored so as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes with reading frame synchronization. Copyright © 2011 Elsevier Ltd. All rights reserved.
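
    To make the notion of coverage per reading frame concrete, the sketch below counts the fraction of trinucleotides in each of the three frames that belong to a circular code. The set `X0` is the 20-trinucleotide code commonly attributed to Arquès and Michel (1996); it is not listed in this abstract and is included here as an assumption. The simple fraction used is an illustrative stand-in, since the paper's exact definition of covering capability is not given above.

```python
# Assumed: the self-complementary circular code X0 of Arquès and Michel (1996).
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon):
    """Return the reverse complement of a DNA string."""
    return codon.translate(COMPLEMENT)[::-1]


def frame_coverage(seq, frame, code=X0):
    """Fraction of complete trinucleotides in the given frame that lie in code."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon in code for codon in codons) / len(codons)


toy = "GAAGAGGTC"  # hypothetical sequence, not real data
for frame in range(3):
    print(frame, frame_coverage(toy, frame))
```

    A frame whose coverage is clearly higher than the other two is the kind of signal that ties circular codes to reading-frame synchronization.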

  13. Construction of occluded recombinant baculoviruses containing the full-length cry1Ab and cry1Ac genes from Bacillus thuringiensis

    Directory of Open Access Journals (Sweden)

    B.M. Ribeiro

    1998-06-01

    Full Text Available The administration of baculoviruses to insects for bioassay purposes is carried out, in most cases, by contamination of food surfaces with a known amount of occlusion bodies (OBs). Since per os infection is the natural route of infection, occluded recombinant viruses containing crystal protein genes (cry1Ab and cry1Ac) from Bacillus thuringiensis were constructed for comparison with the baculovirus prototype Autographa californica nucleopolyhedrovirus (AcNPV). The transfer vector pAcUW2B was used for construction of occluded recombinant viruses. The transfer vector containing the crystal protein genes was cotransfected with linearized DNA from a non-occluded recombinant virus. The isolation of recombinant viruses was greatly facilitated by the reduction of background "wild type" virus and the increased proportion of recombinant viruses. Since the recombinant viruses containing full-length and truncated forms of the crystal protein genes did not seem to improve the pathogenicity of the recombinant viruses when compared with the wild type AcNPV, and in order to compare expression levels of the full-length crystal proteins produced by non-occluded and occluded recombinant viruses, the full-length cry1Ab and cry1Ac genes were chosen for construction of occluded recombinant viruses. The recombinant viruses containing full-length and truncated forms of the crystal protein genes did not seem to improve their pathogenicity, but the size of the larvae infected with the recombinant viruses was significantly smaller than that of larvae infected with the wild type virus.

  14. High performance mixed optical CDMA system using ZCC code and multiband OFDM

    Directory of Open Access Journals (Sweden)

    Nawawi N. M.

    2017-01-01

    Full Text Available In this paper, we propose a high-performance network design based on a mixed optical Code Division Multiple Access (CDMA) system using Zero Cross Correlation (ZCC) code and multiband Orthogonal Frequency Division Multiplexing (OFDM), called catenated OFDM. In addition, we investigate the related changing parameters, such as effective power, number of users, number of bands, code length and code weight. We then theoretically analyze the system performance comprehensively while considering up to five OFDM bands. The feasibility of the proposed system architecture is verified via numerical analysis. The results demonstrate that the developed modulation solution can significantly increase the total number of users, improving it by up to 80% for five catenated bands compared with a traditional optical CDMA system, with the code length equal to 80, transmitted at 622 Mbps. It is also demonstrated that the BER performance strongly depends on the code weight, especially with a smaller number of users. As the code weight increases, the BER performance improves.
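
    The defining property of a ZCC code family is that any two distinct users' code words never share a lit chip, so their in-phase cross-correlation is zero. The sketch below builds the most trivial such family, in which each user owns a disjoint block of `weight` chips, and checks the property. It only illustrates the property under that assumption; the construction used in the paper is not described in the abstract and may differ.

```python
def zcc_codes(users, weight):
    """Build a toy ZCC family: user u lights chips u*weight .. (u+1)*weight - 1."""
    length = users * weight  # code length grows with both users and weight
    return [
        [1 if u * weight <= chip < (u + 1) * weight else 0 for chip in range(length)]
        for u in range(users)
    ]


def cross_correlation(a, b):
    """In-phase correlation between two unipolar (0/1) code words."""
    return sum(x * y for x, y in zip(a, b))


codes = zcc_codes(users=4, weight=2)
for word in codes:
    print(word)
print(cross_correlation(codes[0], codes[1]))  # distinct users
print(cross_correlation(codes[0], codes[0]))  # same user: equals the code weight
```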

  15. Vector Network Coding Algorithms

    OpenAIRE

    Ebrahimi, Javad; Fragouli, Christina

    2010-01-01

    We develop new algebraic algorithms for scalar and vector network coding. In vector network coding, the source multicasts information by transmitting vectors of length L, while intermediate nodes process and combine their incoming packets by multiplying them with L x L coding matrices that play a similar role as coding coefficients in scalar coding. Our algorithms for scalar network coding jointly optimize the employed field size while selecting the coding coefficients. Similarly, for vector coding, our algori...
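
    The role of the L x L coding matrices can be shown with a toy example over GF(2). An intermediate node combines two incoming length-L packets as z = A1·x1 + A2·x2, and a sink that already holds x1 recovers x2 by inverting A2. The matrices, packets and function names below are hypothetical choices for illustration, and the sketch does not reproduce the paper's algorithms for selecting coding matrices.

```python
def mat_vec(matrix, vector):
    """Multiply a binary matrix by a binary vector over GF(2)."""
    return [sum(a & b for a, b in zip(row, vector)) % 2 for row in matrix]


def vec_add(x, y):
    """Add two binary vectors over GF(2), i.e. bitwise XOR."""
    return [a ^ b for a, b in zip(x, y)]


def gf2_inverse(matrix):
    """Invert a square binary matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical 2 x 2 coding matrices and source packets (L = 2).
A1 = [[1, 1], [0, 1]]
A2 = [[0, 1], [1, 1]]
x1, x2 = [1, 0], [1, 1]

z = vec_add(mat_vec(A1, x1), mat_vec(A2, x2))  # packet sent by the coding node
recovered = mat_vec(gf2_inverse(A2), vec_add(z, mat_vec(A1, x1)))
print(recovered == x2)
```

    Decoding succeeds only when the matrix the sink must invert is nonsingular, which is why the choice of coding matrices matters.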

  16. Complete plastid genome sequence of Daucus carota: Implications for biotechnology and phylogeny of angiosperms

    Directory of Open Access Journals (Sweden)

    Ruhlman Tracey

    2006-08-01

    Full Text Available Abstract Background Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. Results The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analyses of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. Conclusion The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of

  17. Modeling coding-sequence evolution within the context of residue solvent accessibility.

    Science.gov (United States)

    Scherrer, Michael P; Meyer, Austin G; Wilke, Claus O

    2012-09-12

    Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprising nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transversion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by just their mean ω.

  18. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Abstract Background The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N^2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
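
    The saving comes from replacing the N x N distance matrix with one short vector per sequence, holding its distances to a small set of reference (seed) sequences. The sketch below illustrates that idea only. The k-mer Jaccard distance and the choice of the first few sequences as seeds are simplifying assumptions, not the procedure implemented in the distributed source code.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Jaccard distance between k-mer sets (assumed stand-in for a real distance)."""
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1 - len(set_a & set_b) / len(union)


def embed(sequences, n_seeds):
    """Map every sequence to its vector of distances to the seed sequences.

    Costs N * n_seeds distance evaluations instead of N * (N - 1) / 2.
    """
    seeds = sequences[:n_seeds]  # assumption: first sequences act as seeds
    return [[kmer_distance(seq, seed) for seed in seeds] for seq in sequences]


toy = ["ACGTACGTAC", "ACGTACGTTC", "TTTTGGGGCC", "TTTTGGGACC", "ACGTTTGGCC"]
for vector in embed(toy, n_seeds=2):
    print([round(d, 2) for d in vector])
```

    The resulting vectors can then be clustered with any method that works on points in a low-dimensional space, which is what makes guide-tree construction tractable for very large N.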

  19. Serotype identification and VP1 coding sequence analysis of foot-and-mouth disease virus from outbreaks in Eastern and Northern Uganda in 2008/9

    DEFF Research Database (Denmark)

    Kasambula, L.; Belsham, Graham; Siegismund, H. R.

    2012-01-01

    regions, and the presence of FMDV RNA in these samples was determined using a standard diagnostic RT-PCR assay. From the total of 27 positive samples, the VP1 coding region was amplified and sequenced. Each of these sequences showed >99% identity to each other, and just five distinct sequences were...

  20. Exome sequencing and genetic testing for MODY.

    Directory of Open Access Journals (Sweden)

    Stefan Johansson

    Full Text Available Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive. The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results. We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism. On average, we obtained 45 X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively); thus exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes. Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.

  1. Measurement of Telomere Length in Colorectal Cancers for Improved Molecular Diagnosis

    Directory of Open Access Journals (Sweden)

    Eric Le Balc’h

    2017-08-01

    All tumors share the reactivation of a telomere maintenance mechanism that allows unlimited proliferation. On the other hand, genetic instability found in some tumors can result from the loss of telomeres. Here, we measured telomere length in colorectal cancers (CRCs) using TRF (Telomere Restriction Fragment) analysis. Telomeric DNA content was also quantified as the ratio of total telomeric (TTAGGG) sequences over that of the invariable Alu sequences. In most of the 125 CRCs analyzed, there was a significant diminution in telomere length compared with that in control healthy tissue. Only 34 tumors exhibited no telomere erosion and, in some cases, a slight telomere lengthening. Telomere length did not correlate with age, gender, tumor stage, tumor localization or stage of tumor differentiation. In addition, while telomere length did not correlate with the presence of a mutation in BRAF (V-raf murine sarcoma viral oncogene homolog B), PIK3CA (phosphatidylinositol 3-kinase catalytic subunit), or MSI status, it was significantly associated with the occurrence of a mutation in KRAS. Interestingly, we found that the shorter the telomeres in healthy tissue of a patient, the larger the increase in telomere length in the tumor. Our study points to the existence of two types of CRCs based on telomere length and reveals that telomere length in healthy tissue might influence telomere maintenance mechanisms in the tumor.

  2. Applications of statistical physics and information theory to the analysis of DNA sequences

    Science.gov (United States)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
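
    As a rough illustration of the mutual information function mentioned in this abstract, the following Python sketch estimates I(k), the mutual information between nucleotides k positions apart, from observed pair frequencies. The function names and the toy sequence model (one G-rich codon position, two uniform positions) are assumptions made for this example only; they are not the thesis's pseudo-exon model or its data.

```python
import math
import random
from collections import Counter


def mutual_information(seq, k):
    """Plug-in estimate of the mutual information (in bits) between symbols k apart."""
    pairs = Counter(zip(seq, seq[k:]))        # joint counts of (s[i], s[i+k])
    n = sum(pairs.values())
    first, second = Counter(), Counter()      # marginal counts at each end of the pair
    for (a, b), c in pairs.items():
        first[a] += c
        second[b] += c
    # sum over pairs of p_ab * log2(p_ab / (p_a * p_b))
    return sum(
        (c / n) * math.log2(c * n / (first[a] * second[b]))
        for (a, b), c in pairs.items()
    )


def toy_coding_sequence(n, seed=0):
    """Hypothetical toy model: codon position 0 is G-rich, positions 1 and 2 are uniform."""
    rng = random.Random(seed)
    biased, uniform = [0.1, 0.1, 0.7, 0.1], [0.25] * 4
    return "".join(
        rng.choices("ACGT", weights=biased if i % 3 == 0 else uniform)[0]
        for i in range(n)
    )


if __name__ == "__main__":
    seq = toy_coding_sequence(30000)
    for k in range(1, 10):
        print(k, round(mutual_information(seq, k), 4))
```

    In this toy model the estimate is larger when k is a multiple of three than otherwise, which is one simple way a period-three pattern can arise from codon-position-dependent composition.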

  3. New quantum codes constructed from quaternary BCH codes

    Science.gov (United States)

    Xu, Gen; Li, Ruihu; Guo, Luobin; Ma, Yuena

    2016-10-01

    In this paper, we first study the construction of new quantum error-correcting codes (QECCs) from three classes of quaternary imprimitive BCH codes. As a result, the improved maximal designed distance of these narrow-sense imprimitive Hermitian dual-containing quaternary BCH codes is determined to be much larger than the result given according to Aly et al. (IEEE Trans Inf Theory 53:1183-1188, 2007) for each different code length. Thus, new families of QECCs are obtained, and the constructed QECCs have larger distance than those in the previous literature. Secondly, we apply a combinatorial construction to the imprimitive BCH codes with their corresponding primitive counterpart and construct many new linear quantum codes with good parameters, some of which have parameters exceeding the finite Gilbert-Varshamov bound for linear quantum codes.

  4. Factors affecting the duration of nestling period and fledging order in Tengmalm's owl (Aegolius funereus): effect of wing length and hatching sequence.

    Directory of Open Access Journals (Sweden)

    Marek Kouba

    In altricial birds, the nestling period is an important part of the breeding phase because the juveniles may spend quite a long time in the nest, with associated high energy costs for the parents. The length of the nestling period can be variable and its duration may be influenced by both biotic and abiotic factors; however, studies of this have mostly been undertaken on passerine birds. We studied individual duration of nestling period of 98 Tengmalm's owl chicks (Aegolius funereus) at 27 nests during five breeding seasons using a camera and chip system and radio-telemetry. We found the nestlings stayed in the nest box for 27 - 38 days from hatching (mean ± SD, 32.4 ± 2.2 days). The individual duration of nestling period was negatively related to wing length, but no formally significant effect was found for body weight, sex, prey availability and/or weather conditions. The fledging sequence of individual nestlings was primarily related to hatching order; no relationship with wing length and/or other factors was found in this case. We suggest the length of wing is the most important measure of body condition and individual quality in Tengmalm's owl young determining the duration of the nestling period. Other differences from passerines (e.g., the lack of effect of weather or prey availability on nestling period) are considered likely to be due to different life-history traits, in particular different food habits and nesting sites and greater risk of nest predation among passerines.

  5. Phylogenetic analyses of the polyprotein coding sequences of serotype O foot-and-mouth disease viruses in East Africa: evidence for interserotypic recombination

    DEFF Research Database (Denmark)

    Balinda, Sheila; Siegismund, Hans; Muwanika, Vincent

    2010-01-01

    from both serotypes A and O. Conclusions Sequences of the VP1 coding region from recent serotype O FMDVs from Kenya and Uganda are all representatives of a specific East African lineage (topotype EA-2), a probable indication that hardly any FMD introductions of this serotype have occurred from outside...... the region in the recent past. Furthermore, evidence for interserotypic recombination, within the non-structural protein coding regions, between FMDVs of serotypes A and O has been obtained. In addition to characterization using the VP1 coding region, analyses involving the non-structural protein coding...

  6. Targeted sequencing of large genomic regions with CATCH-Seq.

    Directory of Open Access Journals (Sweden)

    Kenneth Day

    Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides used as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we introduce a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates that enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq) procedure can be used to capture both coding and non-coding regions of a gene, and resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost-effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.

  7. Monomorphism in humans and sequence differences among higher primates for a sequence tagged site (STS) in homeo box cluster 2 as assayed by denaturing gradient electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Ruano, G.; Ruddle, F.H.; Kidd, K.K. (Yale Univ., New Haven, CT (United States)); Gray, M.R. (Tufts Univ., Boston, MA (United States)); Miki, Tetsuro (Osaka Univ. (Japan)); Ferguson-Smith, A.C. (Inst. of Animal Physiology and Genetics Research, Cambridge (United Kingdom))

    1990-03-11

    The human homeo box cluster 2 (HOX2) contains genes coding for DNA binding proteins involved in developmental control and is highly conserved between mouse and man. The authors have applied in concert the Polymerase Chain Reaction (PCR) and Denaturing Gradient Electrophoresis (DGE) to amplify defined primate HOX2 segments and to detect sequence differences among them. They have sequenced a PstI fragment 4 kb upstream from HOX 2.2 and synthesized primers delimiting both halves of a 630 bp segment within it. PCR on various unrelated humans and SC-PCR on chimpanzee, gorilla, orangutan and gibbon yielded products of the same length for each primer pair.

  8. Retrieval and Representation of Nucleotide Sequence of ...

    African Journals Online (AJOL)

    Nigerian Journal of Basic and Applied Science (March, 2013), 21(1): 27-32 ... Full Length Research Article ... The present study highlights data retrieval and representation. .... the end of information and the start of the sequence on the next ...

  9. The Fibonacci-Padovan sequences in finite groups

    Directory of Open Access Journals (Sweden)

    Sait Tas

    2014-11-01

    Full Text Available The Fibonacci-Padovan sequence modulo m was studied. Also, the Fibonacci-Padovan orbits of -generator finite groups such that was examined. The Fibonacci-Padovan lengths of the groups , and for , where Z is integer, were then obtained.

  10. Secretory production of tetrameric native full-length streptavidin with thermostability using Streptomyces lividans as a host.

    Science.gov (United States)

    Noda, Shuhei; Matsumoto, Takuya; Tanaka, Tsutomu; Kondo, Akihiko

    2015-01-13

    Streptavidin is a tetrameric protein derived from Streptomyces avidinii, and has tight and specific biotin binding affinity. Applications of the streptavidin-biotin system have been widely studied. Streptavidin is generally produced using protein expression in Escherichia coli. In the present study, the secretory production of streptavidin was carried out using Streptomyces lividans as a host. We used the gene encoding native full-length streptavidin, whereas the core region is generally used for streptavidin production in E. coli. Tetrameric streptavidin composed of native full-length streptavidin monomers was successfully secreted in the culture supernatant of S. lividans transformants, and had specific biotin binding affinity as strong as streptavidin produced by E. coli. The amount of streptavidin (Sav) produced using S. lividans was about 9 times higher than that produced using E. coli. Surprisingly, streptavidin produced by S. lividans exhibited affinity to biotin after boiling, despite the fact that tetrameric streptavidin is known to lose its biotin binding ability after brief boiling. We successfully produced a large amount of tetrameric streptavidin as a secretory-form protein with unique thermotolerance.

  11. Universal global imprints of genome growth and evolution--equivalent length and cumulative mutation density.

    Directory of Open Access Journals (Sweden)

    Hong-Da Chen

    BACKGROUND: Segmental duplication is widely held to be an important mode of genome growth and evolution. Yet how this would affect the global structure of genomes has been little discussed. METHODS/PRINCIPAL FINDINGS: Here, we show that equivalent length, or L(e), a quantity determined by the variance of the fluctuating part of the distribution of the k-mer frequencies in a genome, characterizes the latter's global structure. We computed the L(e)s of 865 complete chromosomes and found that they have nearly universal but (k-dependent) values. The differences among the L(e) of a chromosome and those of its coding and non-coding parts were found to be slight. CONCLUSIONS: We verified that these non-trivial results are natural consequences of a genome growth model characterized by random segmental duplication and random point mutation, but not of any model whose dominant growth mechanism is not segmental duplication. Our study also indicates that genomes have a nearly universal cumulative "point" mutation density of about 0.73 mutations per site that is compatible with the relatively low mutation rates of (1-5) x 10^-3/site/Mya previously determined by sequence comparison for the human and E. coli genomes.

  12. New MDS or near MDS self-dual codes over finite fields

    OpenAIRE

    Tong, Hongxi; Wang, Xiaoqing

    2016-01-01

    The study of MDS self-dual codes has attracted a lot of attention in recent years. There are many papers on determining the existence of $q$-ary MDS self-dual codes for various lengths. For some lengths, even lengths $< q$, no $q$-ary MDS self-dual codes exist. We generalize MDS Euclidean self-dual codes to near MDS Euclidean self-dual codes and near MDS isodual codes. We obtain many new near MDS isodual codes from extended negacyclic duadic codes and we obtain many new M...

  13. Common-Message Broadcast Channels with Feedback in the Nonasymptotic Regime: Full Feedback

    DEFF Research Database (Denmark)

    Trillingsgaard, Kasper Fløe; Yang, Wei; Durisi, Giuseppe

    2018-01-01

    We investigate the maximum coding rate achievable on a two-user broadcast channel for the case where a common message is transmitted with feedback using either fixed-blocklength codes or variable-length codes. For the fixed-blocklength-code setup, we establish nonasymptotic converse and achievability bounds. An asymptotic analysis of these bounds reveals that feedback improves the second-order term compared to the no-feedback case. In particular, for a certain class of anti-symmetric broadcast channels, we show that the dispersion is halved. For the variable-length-code setup, we demonstrate...

  14. Design of Long Period Pseudo-Random Sequences from the Addition of m-Sequences over 𝔽_p

    Directory of Open Access Journals (Sweden)

    Ren Jian

    2004-01-01

    Pseudo-random sequences with good correlation properties and large linear span are widely used in code division multiple access (CDMA) communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balance statistics, and low cross-correlation property are constructed from the addition of m-sequences with pairwise-prime linear spans (AMPLS). Using m-sequences as building blocks, the proposed method proved to be an efficient and flexible approach to construct long period pseudo-random sequences with desirable properties from short period sequences. Applying the proposed method to 𝔽_2, a signal set $\left((2^n-1)(2^m-1),\ (2^n+1)(2^m+1),\ (2^{(n+1)/2}+1)(2^{(m+1)/2}+1)\right)$ is constructed.
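
    The idea of building a long-period sequence by adding short m-sequences can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the feedback polynomials x^2 + x + 1 and x^3 + x + 1 (both primitive over the binary field), the initial states, and the helper names `lfsr` and `least_period` are choices made here, not taken from the paper.

```python
def lfsr(taps, state, length):
    """Binary linear recurrence s[t+n] = XOR of s[t+i] for i in taps, n = len(state)."""
    s = list(state)
    n = len(state)
    while len(s) < length:
        t = len(s) - n
        bit = 0
        for i in taps:
            bit ^= s[t + i]
        s.append(bit)
    return s[:length]


def least_period(seq):
    """Smallest shift p under which the observed window repeats itself."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return len(seq)


# Degrees 2 and 3 are coprime, so the periods 2^2 - 1 = 3 and 2^3 - 1 = 7 are coprime too.
WINDOW = 2 * 3 * 7                       # two full periods of the summed sequence
a = lfsr((0, 1), (1, 0), WINDOW)         # x^2 + x + 1: s[t+2] = s[t] + s[t+1]
b = lfsr((0, 1), (1, 0, 0), WINDOW)      # x^3 + x + 1: s[t+3] = s[t] + s[t+1]
c = [x ^ y for x, y in zip(a, b)]        # termwise addition over the binary field

if __name__ == "__main__":
    print(least_period(a), least_period(b), least_period(c))
```

    Because the two component periods are coprime, the summed sequence has period equal to their product, which is the mechanism the abstract relies on to obtain long periods from short building blocks.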

  15. Lossless quantum data compression and variable-length coding

    International Nuclear Information System (INIS)

    Bostroem, Kim; Felbinger, Timo

    2002-01-01

    In order to compress quantum messages without loss of information it is necessary to allow the length of the encoded messages to vary. We develop a general framework for variable-length quantum messages in close analogy to the classical case and show that lossless compression is only possible if the message to be compressed is known to the sender. The lossless compression of an ensemble of messages is bounded from below by its von Neumann entropy. We show that it is possible to reduce the number of qubits passing through a quantum channel even below the von Neumann entropy by adding a classical side channel. We give an explicit communication protocol that realizes lossless and instantaneous quantum data compression and apply it to a simple example. This protocol can be used for both online quantum communication and storage of quantum data.

  16. Human microcephaly protein RTTN interacts with STIL and is required to build full-length centrioles.

    Science.gov (United States)

    Chen, Hsin-Yi; Wu, Chien-Ting; Tang, Chieh-Ju C; Lin, Yi-Nan; Wang, Won-Jing; Tang, Tang K

    2017-08-15

    Mutations in many centriolar protein-encoding genes cause primary microcephaly. Using super-resolution and electron microscopy, we find that the human microcephaly protein, RTTN, is recruited to the proximal end of the procentriole at early S phase, and is located at the inner luminal walls of centrioles. Further studies demonstrate that RTTN directly interacts with STIL and acts downstream of STIL-mediated centriole assembly. CRISPR/Cas9-mediated RTTN gene knockout in p53-deficient cells induces amplification of primitive procentriole bodies that lack the distal-half centriolar proteins, POC5 and POC1B. Additional analyses show that RTTN serves as an upstream effector of CEP295, which mediates the loading of POC1B and POC5 to the distal-half centrioles. Interestingly, the naturally occurring microcephaly-associated mutant, RTTN (A578P), shows a low affinity for STIL binding and blocks centriole assembly. These findings reveal that RTTN contributes to building full-length centrioles and illuminate the molecular mechanism through which the RTTN (A578P) mutation causes primary microcephaly. Mutations in many centriolar protein-encoding genes cause primary microcephaly. Here the authors show that human microcephaly protein RTTN directly interacts with STIL and acts downstream of STIL-mediated centriole assembly, contributing to building full-length centrioles.

  17. HeurAA: accurate and fast detection of genetic variations with a novel heuristic amplicon aligner program for next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Lőrinc S Pongor

    Next generation sequencing (NGS) of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels) that constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels, and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene-segments from different samples. HeurAA is written in C and Perl for Linux operating systems; the code and the documentation are available for research applications at http://sourceforge.net/projects/heuraa/

  18. Full Core Criticality Modeling of Gas-Cooled Fast Reactor Using the SCALE6.0 and MCNP5 Code Packages

    International Nuclear Information System (INIS)

    Matijevic, M.; Jecmenica, R.; Pevec, D.; Trontl, K.

    2012-01-01

    The Gas-Cooled Fast Reactor (GFR) is one of the reactor concepts selected by the Generation IV International Forum (GIF) for the next generation of innovative nuclear energy systems. It was selected among a group of more than 100 prototypes and its commercial availability is expected by 2030. The GFR shares the goals of the other GIF advanced reactor types: economy, safety, proliferation resistance, availability and sustainability. Several GFR fuel design concepts such as plates, rod pins and pebbles are currently being investigated in order to meet the high temperature constraints characteristic of a GFR working environment. In the previous study we compared the fuel depletion results for a heterogeneous GFR fuel assembly (FA), obtained with the TRITON6 sequence of the SCALE6.0 code system, with the MCNPX-CINDER90 and TRIPOLI-4-D codes. The present work is a continuation of the neutronic criticality analysis of heterogeneous FA and full core configurations of a GFR concept using the 3-D Monte Carlo codes KENO-VI/SCALE6.0 and MCNP5. The FA is based on a hexagonal mesh of fuel rods (uranium and plutonium carbide fuel, silicon carbide clad, helium gas coolant) with axial reflector thickness being varied for the purpose of optimization. Three reflector materials were analysed: zirconium carbide (ZrC), silicon carbide (SiC) and natural uranium. ZrC has been selected as a reflector material, having the best contribution to the neutron economy and to the reactivity of the core. The core safety parameters were also analysed: a negative temperature coefficient of reactivity was verified for the heavy metal fuel and coolant density loss. Criticality calculations of different FA active heights were performed and the reflector thickness was also adjusted. Finally, GFR full core criticality calculations using different active fuel rod heights and fixed ZrC reflector height were done to find the optimal height of the core. The Shannon entropy of the GFR core fission distribution was proved to be

  19. Microtomography-based comparison of reciprocating single-file F2 ProTaper technique versus rotary full sequence.

    Science.gov (United States)

    Paqué, Frank; Zehnder, Matthias; De-Deus, Gustavo

    2011-10-01

    A preparation technique with only 1 single instrument was proposed on the basis of the reciprocating movement of the F2 ProTaper instrument. The present study was designed to quantitatively assess canal preparation outcomes achieved by this technique. Twenty-five extracted human mandibular first molars with 2 separate mesial root canals were selected. Canals were randomly assigned to 1 of the 2 experimental groups: group 1, rotary conventional preparation by using ProTaper, and group 2, reciprocating instrumentation with 1 single ProTaper F2 instrument. Specimens were scanned initially and after root canal preparation with an isotropic resolution of 20 μm by using a micro-computed tomography system. The following parameters were assessed: changes in dentin volume, percentage of shaped canal walls, and degree of canal transportation. In addition, the time required to reach working length with the F2 instrument was recorded. Preoperatively, there were no differences regarding root canal curvature and volume between experimental groups. Overall, instrumentation led to enlarged canal shapes with no evidence of preparation errors. There were no statistical differences between the 2 preparation techniques in the anatomical parameters assessed (P > .01), except for a significantly higher canal transportation caused by the reciprocating file in the coronal canal third. On the other hand, preparation was faster by using the single-file technique (P ... ProTaper technique and conventional ProTaper full-sequence rotary approach were similar. However, the single-file F2 ProTaper technique was markedly faster in reaching working length. Copyright © 2011 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.

  20. Chord length distribution for a compound capsule

    International Nuclear Information System (INIS)

    Pitřík, Pavel

    2017-01-01

    Chord length distribution is a factor important in the calculation of ionisation chamber responses. This article describes Monte Carlo calculations of the chord length distribution for a non-convex compound capsule. A Monte Carlo code was set up for generation of random chords and calculation of their lengths based on the input number of generations and cavity dimensions. The code was written in JavaScript and can be executed in the majority of HTML viewers. The plot of occurrence of chords of different lengths has 3 peaks. It was found that the compound capsule cavity cannot be simply replaced with a spherical cavity of a triangular design. Furthermore, the compound capsule cavity is directionally dependent, which must be taken into account in calculations involving non-isotropic fields of primary particles in the beam, unless equilibrium of the secondary charged particles is attained. (orig.)
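
    A Monte Carlo chord-length calculation of the general kind described here can be sketched as follows. This is not the article's code: the article treats a non-convex compound capsule whose dimensions are not given in the abstract, so the sketch swaps in a plain sphere under a uniform isotropic field, and the function name and parameters are hypothetical. For a sphere the result can be checked against the known mean chord length 4R/3.

```python
import math
import random


def sphere_chord_lengths(radius, n, seed=0):
    """Sample n chord lengths of a sphere crossed by uniformly distributed parallel rays.

    By symmetry the ray direction can be fixed; the ray's offset from the centre is
    then uniform over a disk of the sphere's radius, i.e. offset^2 = radius^2 * u
    with u uniform on [0, 1), and the chord is 2 * sqrt(radius^2 - offset^2).
    """
    rng = random.Random(seed)
    return [2.0 * radius * math.sqrt(1.0 - rng.random()) for _ in range(n)]


def histogram(values, bins, upper):
    """Simple occurrence counts of values in equal-width bins on [0, upper]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v / upper * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    chords = sphere_chord_lengths(radius=1.0, n=100000)
    print("mean chord:", sum(chords) / len(chords))
    print("histogram:", histogram(chords, bins=10, upper=2.0))
```

    A compound or non-convex cavity would need explicit ray-surface intersection tests and can yield several chord segments per ray, which is where a multi-peaked distribution such as the one reported could come from.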