WorldWideScience

Sample records for classifying coding dna

  1. Classifying Coding DNA with Nucleotide Statistics

    Directory of Open Access Journals (Sweden)

    Nicolas Carels

    2009-10-01

In this report, we compared the success rate of classification of coding sequences (CDS) vs. introns by Codon Structure Factor (CSF) and by a method that we called Universal Feature Method (UFM). UFM is based on the scoring of purine bias (Rrr) and stop codon frequency. We show that the success rate of CDS/intron classification by UFM is higher than by CSF. UFM classifies ORFs as coding or non-coding through a score based on (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine (C), Guanine (G), and Adenine (A) probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, (iv) the probabilities of G in the 1st and 2nd positions of triplets, and (v) the distance of their GC3 vs. GC2 levels to the regression line of the universal correlation. More than 80% of CDSs (true positives) of Homo sapiens (>250 bp), Drosophila melanogaster (>250 bp), and Arabidopsis thaliana (>200 bp) are successfully classified with a false positive rate lower than or equal to 5%. The method releases coding sequences in their coding strand and coding frame, which allows their automatic translation into protein sequences with 95% confidence. The method is a natural consequence of the compositional bias of nucleotides in coding sequences.
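The two signals at the heart of this kind of score lend themselves to a compact sketch (illustrative Python; the helper names are mine, and the full UFM score also uses the GC3/GC2 regression term not shown here):

```python
# Sketch of two signals UFM-style classifiers combine (hypothetical helper
# names, not the authors' implementation): stop-codon frequency and purine
# bias across the three codon positions of a candidate ORF.

STOPS = {"TAA", "TAG", "TGA"}
PURINES = set("AG")

def codons(seq):
    """Split a sequence into in-frame triplets, dropping any tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def stop_codon_freq(seq):
    """Fraction of in-frame triplets that are stop codons.

    Real coding frames have (near) zero internal stops, so a low value
    is evidence for the coding hypothesis."""
    cs = codons(seq)
    return sum(c in STOPS for c in cs) / len(cs)

def purine_bias(seq):
    """Per-position purine (A/G) frequencies over codon positions 1-3.

    Coding sequences tend to be purine-rich in codon position 1, the
    bias UFM captures with its Rrr score."""
    cs = codons(seq)
    return tuple(sum(c[p] in PURINES for c in cs) / len(cs) for p in range(3))

coding_like = "ATGGCTGAAGCTAAAGGTGCTGAAGCT"  # toy in-frame snippet
print(stop_codon_freq(coding_like))  # 0.0: no internal stops
print(purine_bias(coding_like))
```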

  2. DNA: Polymer and molecular code

    Science.gov (United States)

    Shivashankar, G. V.

    1999-10-01

The thesis work focuses upon two aspects of DNA: the polymer and the molecular code. Our approach was to bring single-molecule micromanipulation methods to the study of DNA. It included a home-built optical microscope combined with an atomic force microscope and an optical tweezer. This combined approach led to a novel method to graft a single DNA molecule onto a force cantilever using the optical tweezer and local heating. With this method, a force-versus-extension assay of double-stranded DNA was realized. The resolution was about 10 pN. To improve on this force measurement resolution, a simple light backscattering technique was developed and used to probe the DNA polymer flexibility and its fluctuations. It combined the optical tweezer to trap a DNA-tethered bead and laser backscattering to detect the bead's Brownian fluctuations. With this technique the resolution was about 0.1 pN with a millisecond access time, and the whole entropic part of the DNA force-extension curve was measured. With this experimental strategy, we measured the polymerization of the protein RecA on an isolated double-stranded DNA. We observed the progressive decoration of RecA on the λ DNA molecule, which results in the extension of λ DNA due to unwinding of the double helix. The dynamics of polymerization, the resulting change in the DNA entropic elasticity, and the role of ATP hydrolysis were the main parts of the study. A simple model for RecA assembly on DNA was proposed. This work presents a first step in the study of genetic recombination. Recently we have started a study of equilibrium binding which utilizes fluorescence polarization methods to probe the polymerization of RecA on single-stranded DNA. In addition to the study of material properties of DNA and DNA-RecA, we have developed experiments for which the code of the DNA is central. We studied one aspect of DNA as a molecular code, using different techniques. In particular the programmatic use of template specificity makes

  3. Clinical strains of acinetobacter classified by DNA-DNA hybridization

    International Nuclear Information System (INIS)

    Tjernberg, I.; Ursing, J.

    1989-01-01

A collection of Acinetobacter strains consisting of 168 consecutive clinical strains and 30 type and reference strains was studied by DNA-DNA hybridization and a few phenotypic tests. The field strains could be allotted to 13 DNA groups. By means of reference strains ten of these could be identified with groups described by Bouvet and Grimont (1986), while three groups were new; they were given the numbers 13-15. The type strain of A. radioresistens - recently described by Nishimura et al. (1988) - was shown to be a member of DNA group 12, which comprised 31 clinical isolates. Of the 19 strains of A. junii, eight showed hemolytic activity on sheep and human blood agar and an additional four strains on human blood agar only. Strains of this species have previously been regarded as non-hemolytic. Reciprocal DNA pairing data for the reference strains of the DNA groups were treated by UPGMA clustering. The reference strains for A. calcoaceticus, A. baumannii and DNA groups 3 and 13 formed a cluster with about 70% relatedness within the cluster. Other DNA groups joined at levels below 60%. (author)
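The UPGMA treatment of reciprocal pairing data can be illustrated with a minimal hand-rolled sketch (Python; the strain labels and relatedness percentages below are toy values, not the published matrix):

```python
# Minimal UPGMA sketch (illustrative, not the authors' software): convert
# percent DNA-DNA relatedness to distance (100 - %) and repeatedly merge
# the closest pair of clusters, averaging distances weighted by cluster
# size, until a single tree remains.

def upgma(names, pair_dist):
    """names: leaf labels; pair_dist: {frozenset({a, b}): distance}.
    Returns a nested tuple (left, right, merge_height)."""
    dist = dict(pair_dist)
    size = {n: 1 for n in names}
    tree = {n: n for n in names}
    while len(size) > 1:
        pair = min(dist, key=dist.get)      # closest pair of clusters
        a, b = pair
        new = (a, b)                        # label for the merged cluster
        # size-weighted average distance from the merged cluster to the rest
        for c in list(size):
            if c == a or c == b:
                continue
            dac = dist.pop(frozenset({a, c}))
            dbc = dist.pop(frozenset({b, c}))
            dist[frozenset({new, c})] = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
        tree[new] = (tree.pop(a), tree.pop(b), round(dist.pop(pair), 2))
        size[new] = size.pop(a) + size.pop(b)
    return tree.popitem()[1]

# Toy relatedness matrix (percent) for five hypothetical reference strains.
relatedness = {("Acal", "Abau"): 70, ("Acal", "G3"): 70, ("Acal", "G13"): 68,
               ("Abau", "G3"): 72, ("Abau", "G13"): 69, ("G3", "G13"): 71,
               ("Acal", "G14"): 55, ("Abau", "G14"): 52, ("G3", "G14"): 50,
               ("G13", "G14"): 53}
d = {frozenset(k): 100 - v for k, v in relatedness.items()}
tree = upgma(["Acal", "Abau", "G3", "G13", "G14"], d)
print(tree)  # "G14" joins last, at the mean of its leaf distances
```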

  4. Clinical strains of acinetobacter classified by DNA-DNA hybridization

    Energy Technology Data Exchange (ETDEWEB)

    Tjernberg, I; Ursing, J [Department of Medical Microbiology, University of Lund, Malmoe General Hospital, Malmoe (Sweden)

    1989-01-01

A collection of Acinetobacter strains consisting of 168 consecutive clinical strains and 30 type and reference strains was studied by DNA-DNA hybridization and a few phenotypic tests. The field strains could be allotted to 13 DNA groups. By means of reference strains ten of these could be identified with groups described by Bouvet and Grimont (1986), while three groups were new; they were given the numbers 13-15. The type strain of A. radioresistens - recently described by Nishimura et al. (1988) - was shown to be a member of DNA group 12, which comprised 31 clinical isolates. Of the 19 strains of A. junii, eight showed hemolytic activity on sheep and human blood agar and an additional four strains on human blood agar only. Strains of this species have previously been regarded as non-hemolytic. Reciprocal DNA pairing data for the reference strains of the DNA groups were treated by UPGMA clustering. The reference strains for A. calcoaceticus, A. baumannii and DNA groups 3 and 13 formed a cluster with about 70% relatedness within the cluster. Other DNA groups joined at levels below 60%. (author)

  5. IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction

    Directory of Open Access Journals (Sweden)

    Kiran Sree Pokkuluri

    2014-01-01

Protein coding and promoter region predictions are very important challenges of bioinformatics (Attwood and Teresa, 2000). The identification of these regions plays a crucial role in understanding genes. Many novel computational and mathematical methods have been introduced, and existing methods are being refined, for predicting both of the regions separately; still there is scope for improvement. We propose a classifier that is built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) to predict both regions with a single classifier. The proposed classifier is trained and tested with Fickett and Tung (1992) datasets for protein coding region prediction for DNA sequences of lengths 54, 108, and 162, and with MMCRI datasets for DNA sequences of lengths 252 and 354. The proposed classifier is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and non-promoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model can predict both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively.

  6. Identifying aggressive prostate cancer foci using a DNA methylation classifier.

    Science.gov (United States)

    Mundbjerg, Kamilla; Chopra, Sameer; Alemozaffar, Mehrdad; Duymich, Christopher; Lakshminarasimhan, Ranjani; Nichols, Peter W; Aron, Manju; Siegmund, Kimberly D; Ukimura, Osamu; Aron, Monish; Stern, Mariana; Gill, Parkash; Carpten, John D; Ørntoft, Torben F; Sørensen, Karina D; Weisenberger, Daniel J; Jones, Peter A; Duddalwar, Vinay; Gill, Inderbir; Liang, Gangning

    2017-01-12

    Slow-growing prostate cancer (PC) can be aggressive in a subset of cases. Therefore, prognostic tools to guide clinical decision-making and avoid overtreatment of indolent PC and undertreatment of aggressive disease are urgently needed. PC has a propensity to be multifocal with several different cancerous foci per gland. Here, we have taken advantage of the multifocal propensity of PC and categorized aggressiveness of individual PC foci based on DNA methylation patterns in primary PC foci and matched lymph node metastases. In a set of 14 patients, we demonstrate that over half of the cases have multiple epigenetically distinct subclones and determine the primary subclone from which the metastatic lesion(s) originated. Furthermore, we develop an aggressiveness classifier consisting of 25 DNA methylation probes to determine aggressive and non-aggressive subclones. Upon validation of the classifier in an independent cohort, the predicted aggressive tumors are significantly associated with the presence of lymph node metastases and invasive tumor stages. Overall, this study provides molecular-based support for determining PC aggressiveness with the potential to impact clinical decision-making, such as targeted biopsy approaches for early diagnosis and active surveillance, in addition to focal therapy.
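A probe-panel classifier of this kind can be pictured with a generic sketch (the published 25-probe set and its trained cutoff are not reproduced here; the threshold and function name below are hypothetical):

```python
# Generic sketch of a methylation probe-panel rule (illustrative only,
# not the paper's validated classifier): summarize a tumor focus by its
# beta values over a fixed probe panel and call it aggressive when the
# panel score crosses a trained threshold.

def classify_focus(betas, threshold=0.5):
    """betas: methylation beta values (0..1), one per panel probe."""
    score = sum(betas) / len(betas)
    label = "aggressive" if score >= threshold else "non-aggressive"
    return label, round(score, 3)

print(classify_focus([0.8, 0.7, 0.9, 0.6]))  # ('aggressive', 0.75)
print(classify_focus([0.2, 0.1, 0.3, 0.2]))  # ('non-aggressive', 0.2)
```

In the study itself the panel is applied per focus, so one gland can contain both aggressive and non-aggressive subclones.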

  7. Superimposed Code Theorectic Analysis of DNA Codes and DNA Computing

    Science.gov (United States)

    2010-03-01

This research builds on the fact that the hybridization that occurs between a DNA strand and its Watson-Crick (WC) complement can be used to perform mathematical computation. Abbreviations: ssDNA, single-stranded DNA; WC, Watson-Crick; A, Adenine; C, Cytosine; G, Guanine; T, Thymine. A duplex formed between a strand and its reverse complement is a Watson-Crick (WC) duplex. Note that non-WC duplexes can form, and such a formation is called a cross-hybridization.

  8. On the statistical assessment of classifiers using DNA microarray data

    Directory of Open Access Journals (Sweden)

    Carella M

    2006-08-01

Abstract Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037), respectively. Moreover, the error rate
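The permutation test that attaches a p-value to a cross-validated error rate can be sketched as follows (illustrative Python with a toy 1-D nearest-centroid classifier and leave-one-out validation, not the paper's WVA/RLS/SVM pipelines; the p-value uses the standard add-one estimator):

```python
# Sketch of a label-permutation test for classifier significance: shuffle
# the class labels many times, re-estimate the cross-validated error each
# time, and report the fraction of shuffles that do at least as well as
# the true labels.
import random

def loo_error(xs, ys):
    """Leave-one-out error of a 1-D nearest-centroid classifier."""
    errs = 0
    for i in range(len(xs)):
        tr = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        c0 = [x for x, y in tr if y == 0]
        c1 = [x for x, y in tr if y == 1]
        m0, m1 = sum(c0) / len(c0), sum(c1) / len(c1)
        pred = 0 if abs(xs[i] - m0) < abs(xs[i] - m1) else 1
        errs += pred != ys[i]
    return errs / len(xs)

def permutation_p(xs, ys, n_perm=500, seed=0):
    rng = random.Random(seed)
    observed = loo_error(xs, ys)
    hits = 0
    for _ in range(n_perm):
        perm = ys[:]
        rng.shuffle(perm)
        if loo_error(xs, perm) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy "expression" values: class 1 is shifted upward, so the classifier
# separates the classes perfectly and shuffled labels rarely match that.
xs = [0.1, 0.3, 0.2, 0.4, 1.1, 1.3, 1.2, 1.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
obs, p = permutation_p(xs, ys)
print(obs, p)
```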

  9. Flexibility of the genetic code with respect to DNA structure

    DEFF Research Database (Denmark)

    Baisnée, P. F.; Baldi, Pierre; Brunak, Søren

    2001-01-01

Motivation. The primary function of DNA is to carry genetic information through the genetic code. DNA, however, contains a variety of other signals related, for instance, to reading frame, codon bias, pairwise codon bias, splice sites and transcription regulation, nucleosome positioning and DNA structure. Here we study the relationship between the genetic code and DNA structure and address two questions. First, to which degree does the degeneracy of the genetic code and the acceptable amino acid substitution patterns allow for the superimposition of DNA structural signals to protein coding sequences? Second, is the origin or evolution of the genetic code likely to have been constrained by DNA structure? Results. We develop an index for code flexibility with respect to DNA structure. Using five different di- or tri-nucleotide models of sequence-dependent DNA structure, we show ...

  10. Indications for spine surgery: validation of an administrative coding algorithm to classify degenerative diagnoses

    Science.gov (United States)

    Lurie, Jon D.; Tosteson, Anna N.A.; Deyo, Richard A.; Tosteson, Tor; Weinstein, James; Mirza, Sohail K.

    2014-01-01

Study Design Retrospective analysis of Medicare claims linked to a multi-center clinical trial. Objective The Spine Patient Outcomes Research Trial (SPORT) provided a unique opportunity to examine the validity of a claims-based algorithm for grouping patients by surgical indication. SPORT enrolled patients for lumbar disc herniation, spinal stenosis, and degenerative spondylolisthesis. We compared the surgical indication derived from Medicare claims to that provided by SPORT surgeons, the “gold standard”. Summary of Background Data Administrative data are frequently used to report procedure rates, surgical safety outcomes, and costs in the management of spinal surgery. However, the accuracy of using diagnosis codes to classify patients by surgical indication has not been examined. Methods Medicare claims were linked to beneficiaries enrolled in SPORT. The sensitivity and specificity of three claims-based approaches to grouping patients by surgical indication were examined: 1) using the first listed diagnosis; 2) using all diagnoses independently; and 3) using a diagnosis hierarchy based on the support for fusion surgery. Results Medicare claims were obtained from 376 SPORT participants, including 21 with disc herniation, 183 with spinal stenosis, and 172 with degenerative spondylolisthesis. The hierarchical coding algorithm was the most accurate approach for classifying patients by surgical indication, with sensitivities of 76.2%, 88.1%, and 84.3% for the disc herniation, spinal stenosis, and degenerative spondylolisthesis cohorts, respectively. The specificity was 98.3% for disc herniation, 83.2% for spinal stenosis, and 90.7% for degenerative spondylolisthesis. Misclassifications were primarily due to codes attributing more complex pathology to the case. Conclusion Standardized approaches for using claims data to accurately group patients by surgical indication are of widespread interest. We found that a hierarchical coding approach correctly classified over 90
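A diagnosis-hierarchy rule of the kind evaluated here reduces to a precedence scan over the claim's codes (a sketch; the category order and ICD-9 code sets below are illustrative, not the validated algorithm):

```python
# Sketch of hierarchical claims-based grouping (illustrative code sets,
# not the study's algorithm): scan all diagnosis codes on a claim and
# assign the highest-precedence surgical indication that matches.

# Hypothetical code sets, ordered from most to least support for fusion.
HIERARCHY = [
    ("degenerative spondylolisthesis", {"738.4"}),
    ("spinal stenosis", {"724.02"}),
    ("disc herniation", {"722.10"}),
]

def classify_claim(dx_codes):
    for label, codes in HIERARCHY:
        if codes & set(dx_codes):
            return label
    return "unclassified"

# A claim listing both herniation and spondylolisthesis codes resolves
# to the higher-precedence indication.
print(classify_claim(["722.10", "738.4"]))  # degenerative spondylolisthesis
print(classify_claim(["724.5"]))            # unclassified
```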

  11. nRC: non-coding RNA Classifier based on structural features.

    Science.gov (United States)

    Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso

    2017-01-01

Non-coding RNAs (ncRNAs) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools to provide biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on features extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores, using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. The nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of the nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.

  12. On DNA codes from a family of chain rings

    Directory of Open Access Journals (Sweden)

    Elif Segah Oztas

    2017-01-01

In this work, we focus on reversible cyclic codes which correspond to reversible DNA codes or reversible-complement DNA codes over a family of finite chain rings, in an effort to extend what was done by Yildiz and Siap in [20]. The ring family that we have considered are of size $2^{2^k}$, $k=1,2, \cdots$ and we match each ring element with a DNA $2^{k-1}$-mer. We use the so-called $u^2$-adic digit system to solve the reversibility problem and we characterize cyclic codes that correspond to reversible-complement DNA codes. We then conclude our study with some examples.
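The reversible-complement property being characterized can be stated concretely (a small sketch of the property itself, not of the $u^2$-adic construction): the Watson-Crick reverse complement of every codeword must again be a codeword.

```python
# Sketch of the reversible-complement property for DNA codes: a code is
# reversible-complement (in the sense checked here) when it is closed
# under Watson-Crick reverse complementation.

COMP = str.maketrans("ACGT", "TGCA")

def revcomp(word):
    """Watson-Crick reverse complement of a DNA word."""
    return word.translate(COMP)[::-1]

def is_reversible_complement(code):
    return all(revcomp(w) in code for w in code)

# Toy code closed under reverse complement ("ACGT" is its own revcomp).
print(is_reversible_complement({"AATT", "TTAA", "ACGT"}))   # True
print(is_reversible_complement({"AAAA", "CCCC"}))           # False
```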

  13. DNA barcode goes two-dimensions: DNA QR code web server.

    Science.gov (United States)

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
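The compression-efficiency comparison rests on a simple fact: a four-letter DNA alphabet needs only 2 bits per base. A minimal packing sketch (my own illustration, not the web server's code):

```python
# Sketch of 2-bit packing for DNA sequences: 4 bases per byte instead of
# 1 ASCII byte per base, a 4x reduction before any barcode symbology is
# even applied.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq):
    """Return (length, packed bytes); length is kept so that leading
    'A's (encoded as zero bits) survive the round trip."""
    bits = 0
    for b in seq:
        bits = (bits << 2) | CODE[b]
    return len(seq), bits.to_bytes((2 * len(seq) + 7) // 8, "big")

def unpack(n, data):
    bits = int.from_bytes(data, "big")
    # extract bases from least-significant end, then restore order
    return "".join(BASES[(bits >> (2 * i)) & 3] for i in range(n))[::-1]

n, data = pack("ACGTTGCA")
print(len(data))        # 2 bytes for 8 bases, vs 8 bytes of ASCII
print(unpack(n, data))  # ACGTTGCA
```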

  14. DNA barcode goes two-dimensions: DNA QR code web server.

    Directory of Open Access Journals (Sweden)

    Chang Liu

The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  15. DNA Barcoding through Quaternary LDPC Codes.

    Science.gov (United States)

    Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

    2015-01-01

For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10⁻² per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10⁻⁹ at the expense of a rate of read losses just in the order of 10⁻⁶.
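The general principle behind error-correcting sample barcodes can be sketched without the LDPC machinery (illustrative Python; this is minimum-distance decoding over a toy barcode set, not the paper's construction):

```python
# Sketch of error-correcting barcode demultiplexing: decode an observed
# read tag to the nearest designed barcode, and reject the read (a "read
# loss") when the nearest barcode is ambiguous or too many mismatches away.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode(tag, barcodes, max_dist=1):
    ranked = sorted(barcodes, key=lambda bc: hamming(tag, bc))
    best = ranked[0]
    d = hamming(tag, best)
    if d > max_dist or (len(ranked) > 1 and hamming(tag, ranked[1]) == d):
        return None  # read loss: ambiguous or too distant
    return best

# Toy barcode set with large pairwise Hamming distance, so any single
# mismatch is corrected unambiguously.
BARCODES = ["AAAA", "CCCC", "GGGG", "TTTT"]
print(decode("AAGA", BARCODES))  # AAAA (one mismatch corrected)
print(decode("AACC", BARCODES))  # None (two mismatches -> rejected)
```

The design problem the paper addresses is choosing a large barcode set whose pairwise distances stay high while barcodes stay short; LDPC codes give that trade-off at scale.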

  16. DNA Barcoding through Quaternary LDPC Codes.

    Directory of Open Access Journals (Sweden)

    Elizabeth Tapia

For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10⁻² per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10⁻⁹ at the expense of a rate of read losses just in the order of 10⁻⁶.

  17. DNA Bar-Coding for Phytoplasma Identification

    DEFF Research Database (Denmark)

    Makarova, Olga; Contaldo, Nicoletta; Paltrinieri, Samanta

    2013-01-01

Phytoplasma identification has proved difficult due to their inability to be maintained in vitro. DNA barcoding is an identification method based on comparison of a short DNA sequence with known sequences from a database. A DNA barcoding tool has been developed for phytoplasma identification. While other sequence-based methods may be well adapted to identification of particular strains of phytoplasmas, often they cannot be used for the simultaneous identification of phytoplasmas from different groups. The phytoplasma DNA barcoding protocol in this chapter, based on the tuf and 16SrRNA genes, can be used to identify the following phytoplasma groups: 16SrI, 16SrII, 16SrIII, 16SrIV, 16SrV, 16SrVI, 16SrVII, 16SrIX, 16SrX, 16SrXI, 16SrXII, 16SrXV, 16SrXX, 16SrXXI.

  18. On fuzzy semantic similarity measure for DNA coding.

    Science.gov (United States)

    Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin

    2016-02-01

A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons' clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides' fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
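A classic member of the coding-measure family that FSSM is compared against maps each base to binary indicator signals and measures the period-3 spectral power that codon structure induces (shown for illustration; this is the standard DFT baseline, not the fuzzy FSSM scheme):

```python
# Sketch of a period-3 coding measure: for each base, build the 0/1
# indicator sequence, take its DFT component at frequency N/3, and sum
# the powers. Codon structure gives coding regions a strong period-3
# component that shuffled sequences of the same composition lack.
import cmath

def period3_power(seq):
    n = len(seq)
    w = cmath.exp(-2j * cmath.pi / 3)  # e^{-2*pi*i*k*(N/3)/N} = e^{-2*pi*i*k/3}
    total = 0.0
    for base in "ACGT":
        x = sum(w ** k for k, b in enumerate(seq) if b == base)
        total += abs(x) ** 2
    return total / n  # normalize by sequence length

coding_like = "ATGGCC" * 12      # strong codon periodicity
shuffled = "AGCTGCATGCCA" * 6    # same length, weaker period-3 structure
print(period3_power(coding_like) > period3_power(shuffled))  # True
```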

  19. Multi-Probe Based Artificial DNA Encoding and Matching Classifier for Hyperspectral Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    Ke Wu

    2016-08-01

In recent years, a novel matching classification strategy inspired by artificial deoxyribonucleic acid (DNA) technology has been proposed for hyperspectral remote sensing imagery. Such a method can describe brightness and shape information of a spectrum by encoding the spectral curve into a DNA strand, providing a more comprehensive way for spectral similarity comparison. However, it suffers from two problems: data volume is amplified when all of the bands participate in the encoding procedure, and full-band comparison degrades the importance of bands carrying key information. In this paper, a new multi-probe based artificial DNA encoding and matching (MADEM) method is proposed. In this method, spectral signatures are first transformed into DNA code words with a spectral feature encoding operation. After that, multiple probes for interesting classes are extracted to represent the specific fragments of DNA strands. During the course of spectral matching, the different probes are compared to obtain the similarity of different types of land covers. By computing the absolute vector distance (AVD) between different probes of an unclassified spectrum and the typical DNA code words from the database, the class property of each pixel is set as the minimum distance class. The main benefit of this strategy is that the risk of redundant bands can be deeply reduced and critical spectral discrepancies can be enlarged. Two hyperspectral image datasets were tested. Compared with the other classification methods, the overall accuracy can be improved from 1.22% to 10.09% and 1.19% to 15.87%, respectively. Furthermore, the kappa coefficient can be improved from 2.05% to 15.29% and 1.35% to 19.59%, respectively. This demonstrates that the proposed algorithm outperformed other traditional classification methods.
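The gist of DNA-style spectral encoding with probe-window matching can be sketched in a few lines (a simplified illustration of the idea; the quantization rule, class references, and probe windows below are hypothetical, not the MADEM algorithm):

```python
# Sketch of DNA-style spectral matching: quantize each band-to-band
# reflectance change into one of four letters, keep only a few "probe"
# windows per class, and label a pixel by its probe distance to each
# class code word.

LETTERS = "ACGT"

def encode(spectrum, step=0.1):
    """Encode band-to-band differences into a 4-letter word."""
    word = []
    for prev, cur in zip(spectrum, spectrum[1:]):
        d = cur - prev
        sym = 0 if d < -step else 1 if d < 0 else 2 if d <= step else 3
        word.append(LETTERS[sym])
    return "".join(word)

def probe_distance(word, ref, probes):
    """Compare only the probe windows (start, end) of two code words."""
    return sum(a != b for s, e in probes for a, b in zip(word[s:e], ref[s:e]))

# Hypothetical class references and probe windows over a 6-band spectrum.
classes = {"water": encode([0.9, 0.7, 0.5, 0.4, 0.3, 0.2]),
           "grass": encode([0.2, 0.3, 0.2, 0.8, 0.9, 0.9])}
probes = [(0, 2), (3, 5)]

pixel = encode([0.85, 0.66, 0.48, 0.40, 0.31, 0.22])
best = min(classes, key=lambda c: probe_distance(pixel, classes[c], probes))
print(best)  # water
```

Restricting the comparison to probe windows is what keeps redundant bands from diluting the match, which is the point the abstract makes about full-band comparison.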

  20. Privacy rules for DNA databanks. Protecting coded 'future diaries'.

    Science.gov (United States)

    Annas, G J

    1993-11-17

    In privacy terms, genetic information is like medical information. But the information contained in the DNA molecule itself is more sensitive because it contains an individual's probabilistic "future diary," is written in a code that has only partially been broken, and contains information about an individual's parents, siblings, and children. Current rules for protecting the privacy of medical information cannot protect either genetic information or identifiable DNA samples stored in DNA databanks. A review of the legal and public policy rationales for protecting genetic privacy suggests that specific enforceable privacy rules for DNA databanks are needed. Four preliminary rules are proposed to govern the creation of DNA databanks, the collection of DNA samples for storage, limits on the use of information derived from the samples, and continuing obligations to those whose DNA samples are in the databanks.

  1. Superimposed Code Theoretic Analysis of Deoxyribonucleic Acid (DNA) Codes and DNA Computing

    Science.gov (United States)

    2010-01-01

The hybridization that occurs between a DNA strand and its Watson-Crick complement can be used to perform mathematical computation; this research addresses how. Abbreviations: dsDNA, double-stranded DNA; MOSAIC, Mobile Stream Processing Cluster; PCR, Polymerase Chain Reaction; RAM, Random Access Memory; ssDNA, single-stranded DNA; WC, Watson-Crick; A, Adenine; C, Cytosine; G, Guanine; T, Thymine. Strands written plainly are 5′→3′ and strands with strikethrough are 3′→5′. A dsDNA duplex formed between a strand and its reverse complement is called a Watson-Crick (WC) duplex.

  2. Using supervised machine learning to code policy issues: Can classifiers generalize across contexts?

    NARCIS (Netherlands)

    Burscher, B.; Vliegenthart, R.; de Vreese, C.H.

    2015-01-01

    Content analysis of political communication usually covers large amounts of material and makes the study of dynamics in issue salience a costly enterprise. In this article, we present a supervised machine learning approach for the automatic coding of policy issues, which we apply to news articles

  3. Quantitative Profiling of Peptides from RNAs classified as non-coding

    Science.gov (United States)

    Prabakaran, Sudhakaran; Hemberg, Martin; Chauhan, Ruchi; Winter, Dominic; Tweedie-Cullen, Ry Y.; Dittrich, Christian; Hong, Elizabeth; Gunawardena, Jeremy; Steen, Hanno; Kreiman, Gabriel; Steen, Judith A.

    2014-01-01

    Only a small fraction of the mammalian genome codes for messenger RNAs destined to be translated into proteins, and it is generally assumed that a large portion of transcribed sequences - including introns and several classes of non-coding RNAs (ncRNAs) - do not give rise to peptide products. A systematic examination of translation and physiological regulation of ncRNAs has not been conducted. Here, we use computational methods to identify the products of non-canonical translation in mouse neurons by analyzing unannotated transcripts in combination with proteomic data. This study supports the existence of non-canonical translation products from both intragenic and extragenic genomic regions, including peptides derived from anti-sense transcripts and introns. Moreover, the studied novel translation products exhibit temporal regulation similar to that of proteins known to be involved in neuronal activity processes. These observations highlight a potentially large and complex set of biologically regulated translational events from transcripts formerly thought to lack coding potential. PMID:25403355

  4. Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.

    Science.gov (United States)

    Lin, Chin; Hsu, Chia-Jung; Lou, Yu-Sheng; Yeh, Shih-Jen; Lee, Chia-Cheng; Su, Sui-Lung; Chen, Hsiang-Cheng

    2017-11-06

    Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN). Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes. We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness. In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). 
Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically
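The embedding-plus-CNN pipeline this record compares against traditional NLP features can be illustrated with a toy, pure-Python sketch: embed tokens, slide a convolution filter over adjacent embeddings, apply ReLU and max-pooling, then a logistic output. The vocabulary, embedding values, filter and output weights below are all invented for illustration and are in no way the authors' trained model.

```python
import math

# Toy 2-dimensional word embeddings (illustrative values, not pretrained).
EMBED = {
    "fever":  [1.0, 0.2],
    "cough":  [0.8, 0.1],
    "ankle":  [0.1, 0.9],
    "sprain": [0.0, 1.0],
}

def conv_max_pool(tokens, filt, bias=0.0):
    """Slide a width-2 convolution filter over the embedded token
    sequence, apply ReLU, and max-pool to a single feature value."""
    feats = []
    for i in range(len(tokens) - 1):
        window = EMBED[tokens[i]] + EMBED[tokens[i + 1]]  # concatenate embeddings
        act = sum(w * x for w, x in zip(filt, window)) + bias
        feats.append(max(0.0, act))                        # ReLU
    return max(feats)                                      # max-pooling

def classify(tokens, filt, w_out, b_out):
    """Logistic output on the pooled feature: P(note belongs to the chapter)."""
    pooled = conv_max_pool(tokens, filt)
    return 1.0 / (1.0 + math.exp(-(w_out * pooled + b_out)))

# Hypothetical filter weighting the "infection-like" embedding dimension.
FILT = [1.0, 0.0, 1.0, 0.0]

p_inf = classify(["fever", "cough"], FILT, w_out=2.0, b_out=-2.0)
p_msk = classify(["ankle", "sprain"], FILT, w_out=2.0, b_out=-2.0)
```

In a real system the filter and output weights are learned by gradient descent over many filters of several widths; the sketch only shows the forward pass the abstract describes.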

  5. Do hip prosthesis related infection codes in administrative discharge registers correctly classify periprosthetic hip joint infection?

    DEFF Research Database (Denmark)

    Lange, Jeppe; Pedersen, Alma B; Troelsen, Anders

    2015-01-01

    PURPOSE: Administrative discharge registers could be a valuable and easily accessible single source for research data on periprosthetic hip joint infection. The aim of this study was to estimate the positive predictive value of the International Classification of Disease 10th revision (ICD-10...... in future single-source register-based studies, but preferably should be used in combination with alternate data sources to ensure higher validity....... decreased to 82% (95% CI: 72-89). CONCLUSIONS: Misclassification must be expected and taken into consideration when using administrative discharge registers for epidemiological research on periprosthetic hip joint infection. We believe that the periprosthetic hip joint infection diagnosis code can be of use

  6. DNA watermarks in non-coding regulatory sequences

    Directory of Open Access Journals (Sweden)

    Pyka Martin

    2009-07-01

    Full Text Available Abstract Background DNA watermarks can be applied to identify the unauthorized use of genetically modified organisms. It has been shown that coding regions can be used to encrypt information into living organisms by using the DNA-Crypt algorithm. Yet, if the sequence of interest presents a non-coding DNA sequence, either the function of a resulting functional RNA molecule or a regulatory sequence, such as a promoter, could be affected. For our studies we used the small cytoplasmic RNA 1 in yeast and the lac promoter region of Escherichia coli. Findings The lac promoter was deactivated by the integrated watermark. In addition, the RNA molecules displayed altered configurations after introducing a watermark, but surprisingly were functionally intact, which has been verified by analyzing the growth characteristics of both wild type and watermarked scR1 transformed yeast cells. In a third approach we introduced a second overlapping watermark into the lac promoter, which did not affect the promoter activity. Conclusion Even though the watermarked RNA and one of the watermarked promoters did not show any significant differences compared to the wild type RNA and wild type promoter region, respectively, it cannot be generalized that other RNA molecules or regulatory sequences behave accordingly. Therefore, we do not recommend integrating watermark sequences into regulatory regions.
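The general idea behind coding-region watermarks of the kind this record builds on can be sketched as follows: hide one bit per amino acid by choosing between two synonymous codons, so the encoded protein is unchanged. This is a minimal illustration of the principle only, not the actual DNA-Crypt algorithm, and the codon table below is an invented subset.

```python
# Two synonymous codons per amino acid (illustrative subset of the code).
SYN = {
    "L": ("CTT", "CTC"),
    "S": ("TCT", "TCC"),
    "R": ("CGT", "CGC"),
    "G": ("GGT", "GGC"),
}

def embed(protein, bits):
    """Pick codon SYN[aa][bit] for each residue; bit 0/1 selects the synonym."""
    return "".join(SYN[aa][b] for aa, b in zip(protein, bits))

def extract(dna):
    """Recover the bits by checking which synonym was used at each codon."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    table = {codon: bit for syns in SYN.values() for bit, codon in enumerate(syns)}
    return [table[c] for c in codons]

dna = embed("LSRG", [1, 0, 1, 1])
assert extract(dna) == [1, 0, 1, 1]   # protein sequence is unchanged by design
```

The record's point is that this trick is safe only where synonymous substitutions are truly neutral; in a promoter or a structured RNA, as the experiments show, the "synonymous" change can alter function or folding.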

  7. Minimal methylation classifier (MIMIC): A novel method for derivation and rapid diagnostic detection of disease-associated DNA methylation signatures.

    Science.gov (United States)

    Schwalbe, E C; Hicks, D; Rafiee, G; Bashton, M; Gohlke, H; Enshaei, A; Potluri, S; Matthiesen, J; Mather, M; Taleongpong, P; Chaston, R; Silmon, A; Curtis, A; Lindsey, J C; Crosier, S; Smith, A J; Goschzik, T; Doz, F; Rutkowski, S; Lannering, B; Pietsch, T; Bailey, S; Williamson, D; Clifford, S C

    2017-10-18

    Rapid and reliable detection of disease-associated DNA methylation patterns has major potential to advance molecular diagnostics and underpin research investigations. We describe the development and validation of minimal methylation classifier (MIMIC), combining CpG signature design from genome-wide datasets, multiplex-PCR and detection by single-base extension and MALDI-TOF mass spectrometry, in a novel method to assess multi-locus DNA methylation profiles within routine clinically-applicable assays. We illustrate the application of MIMIC to successfully identify the methylation-dependent diagnostic molecular subgroups of medulloblastoma (the most common malignant childhood brain tumour), using scant/low-quality samples remaining from the most recently completed pan-European medulloblastoma clinical trial, refractory to analysis by conventional genome-wide DNA methylation analysis. Using this approach, we identify critical DNA methylation patterns from previously inaccessible cohorts, and reveal novel survival differences between the medulloblastoma disease subgroups with significant potential for clinical exploitation.

  8. Prognostic Classifier Based on Genome-Wide DNA Methylation Profiling in Well-Differentiated Thyroid Tumors

    DEFF Research Database (Denmark)

    Bisarro Dos Reis, Mariana; Barros-Filho, Mateus Camargo; Marchi, Fábio Albuquerque

    2017-01-01

    Context: Even though the majority of well-differentiated thyroid carcinoma (WDTC) is indolent, a number of cases display an aggressive behavior. Cumulative evidence suggests that the deregulation of DNA methylation has the potential to point out molecular markers associated with worse prognosis. ...

  9. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA...
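The back-translation step in the three-stage procedure described above can be sketched in a few lines: given each coding DNA sequence and its (externally produced) gapped peptide alignment row, replace every aligned residue with its source codon and every gap with a codon-sized gap. This shows the general idea, not the RevTrans implementation itself.

```python
def backtranslate(dna, aligned_peptide):
    """Map a gapped peptide alignment row back onto its coding DNA."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    out, k = [], 0
    for aa in aligned_peptide:
        if aa == "-":
            out.append("---")           # peptide gap -> codon-sized DNA gap
        else:
            out.append(codons[k])       # next codon of the original DNA
            k += 1
    return "".join(out)

# Two toy coding sequences; the second lacks the middle residue.
row1 = backtranslate("ATGGCTAAA", "MAK")    # -> ATGGCTAAA
row2 = backtranslate("ATGAAA",    "M-K")    # -> ATG---AAA
```

Because gaps are inserted in whole-codon units, the resulting DNA alignment stays in frame, which is precisely the advantage of aligning at the amino-acid level first.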

  10. At the intersection of non-coding transcription, DNA repair, chromatin structure, and cellular senescence

    Directory of Open Access Journals (Sweden)

    Ryosuke eOhsawa

    2013-07-01

    Full Text Available It is well accepted that non-coding RNAs play a critical role in regulating gene expression. Recent paradigm-setting studies are now revealing that non-coding RNAs, other than microRNAs, also play intriguing roles in the maintenance of chromatin structure, in the DNA damage response, and in adult human stem cell aging. In this review, we will discuss the complex inter-dependent relationships among non-coding RNA transcription, maintenance of genomic stability, chromatin structure and adult stem cell senescence. DNA damage-induced non-coding RNAs transcribed in the vicinity of the DNA break regulate recruitment of the DNA damage machinery and DNA repair efficiency. We will discuss the correlation between non-coding RNAs and DNA damage repair efficiency and the potential role of changing chromatin structures around double-strand break sites. On the other hand, induction of non-coding RNA transcription from the repetitive Alu elements occurs during human stem cell aging and hinders efficient DNA repair causing entry into senescence. We will discuss how this fine balance between transcription and genomic instability may be regulated by the dramatic changes to chromatin structure that accompany cellular senescence.

  11. Expression of protein-coding genes embedded in ribosomal DNA

    DEFF Research Database (Denmark)

    Johansen, Steinar D; Haugen, Peik; Nielsen, Henrik

    2007-01-01

    Ribosomal DNA (rDNA) is a specialised chromosomal location that is dedicated to high-level transcription of ribosomal RNA genes. Interestingly, rDNAs are frequently interrupted by parasitic elements, some of which carry protein genes. These are non-LTR retrotransposons and group II introns that e...... in the nucleolus....

  12. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    Science.gov (United States)

    Liebovitch, Larry

    1998-03-01

    The longest-term correlations in living systems are the information stored in DNA, which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes a wrong base is inserted that sticks out, disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes that slide along the DNA can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking codes, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find
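The core question above - is the base at one position a Z4-linear (affine) function of the bases at other positions? - can be sketched directly. The study used Gaussian elimination modified for modulus 4; for illustration, this sketch brute-forces the coefficient vector instead, which is equivalent for small numbers of predictor positions. The base-to-digit coding (0,1,2,3) follows the abstract; the toy sequences are invented.

```python
from itertools import product

BASE = {"A": 0, "T": 1, "G": 2, "C": 3}

def is_linear_combination(words, target_pos, predictor_pos):
    """True if the base at target_pos equals the same affine function
    c0 + sum(ci * base_i) (mod 4) of the predictor positions in every word.
    Brute force over coefficients for illustration, rather than the
    modulus-4 Gaussian elimination used in the study."""
    rows = [[BASE[w[p]] for p in predictor_pos] + [BASE[w[target_pos]]]
            for w in words]
    for coeffs in product(range(4), repeat=len(predictor_pos) + 1):
        *cs, c0 = coeffs                     # predictor coefficients, constant
        if all((c0 + sum(c * x for c, x in zip(cs, row[:-1]))) % 4 == row[-1]
               for row in rows):
            return True
    return False

# Toy "sequences" in which position 2 is always position 0 plus 1 (mod 4):
words = ["AAT", "TTG", "GGC", "CCA"]   # A->T, T->G, G->C, C->A
assert is_linear_combination(words, 2, [0])
```

A negative case such as `is_linear_combination(["AAA", "AAT"], 2, [0])` returns `False`, since no single affine rule maps the same predictor value to two different targets.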

  13. Decoding Non-Coding DNA: Trash or Treasure?

    Indian Academy of Sciences (India)

    that proteins, being the effector molecules of a cell, would determine ... value paradox states that the amount of cellular DNA is not ... They are also involved in nu- ... fold/matrix attachment regions (S/MARs). .... tumor suppressor gene PTEN.

  14. Junk DNA and the long non-coding RNA twist in cancer genetics

    NARCIS (Netherlands)

    H. Ling (Hui); K. Vincent; M. Pichler; R. Fodde (Riccardo); I. Berindan-Neagoe (Ioana); F.J. Slack (Frank); G.A. Calin (George)

    2015-01-01

    textabstractThe central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs

  15. Cloning and expression of cDNA coding for bouganin.

    Science.gov (United States)

    den Hartog, Marcel T; Lubelli, Chiara; Boon, Louis; Heerkens, Sijmie; Ortiz Buijsse, Antonio P; de Boer, Mark; Stirpe, Fiorenzo

    2002-03-01

    Bouganin is a ribosome-inactivating protein that recently was isolated from Bougainvillea spectabilis Willd. In this work, the cloning and expression of the cDNA encoding for bouganin is described. From the cDNA, the amino-acid sequence was deduced, which correlated with the primary sequence data obtained by amino-acid sequencing on the native protein. Bouganin is synthesized as a pro-peptide consisting of 305 amino acids, the first 26 of which act as a leader signal while the 29 C-terminal amino acids are cleaved during processing of the molecule. The mature protein consists of 250 amino acids. Using the cDNA sequence encoding the mature protein of 250 amino acids, a recombinant protein was expressed, purified and characterized. The recombinant molecule had similar activity in a cell-free protein synthesis assay and had comparable toxicity on living cells as compared to the isolated native bouganin.

  16. Functional interrogation of non-coding DNA through CRISPR genome editing.

    Science.gov (United States)

    Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

    2017-05-15

    Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Complete cDNA sequence coding for human docking protein

    Energy Technology Data Exchange (ETDEWEB)

    Hortsch, M; Labeit, S; Meyer, D I

    1988-01-11

    Docking protein (DP, or SRP receptor) is a rough endoplasmic reticulum (ER)-associated protein essential for the targeting and translocation of nascent polypeptides across this membrane. It specifically interacts with a cytoplasmic ribonucleoprotein complex, the signal recognition particle (SRP). The nucleotide sequence of cDNA encoding the entire human DP and its deduced amino acid sequence are given.

  18. RNA-DNA sequence differences spell genetic code ambiguities

    DEFF Research Database (Denmark)

    Bentin, Thomas; Nielsen, Michael L

    2013-01-01

    A recent paper in Science by Li et al. 2011(1) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression but the study has been criticized. ...

  19. Non-coding RNAs and epigenome: de novo DNA methylation, allelic exclusion and X-inactivation

    Directory of Open Access Journals (Sweden)

    V. A. Halytskiy

    2013-12-01

    Full Text Available Non-coding RNAs are a widespread class of cellular RNAs. They participate in many important processes in cells – signaling, posttranscriptional silencing, protein biosynthesis, splicing, maintenance of genome stability, telomere lengthening and X-inactivation. Nevertheless, the activity of these RNAs is not restricted to the posttranscriptional sphere, but also covers processes that change or maintain epigenetic information. Non-coding RNAs can directly bind to DNA targets and cause their repression through recruitment of DNA methyltransferases as well as chromatin-modifying enzymes. Such events constitute the molecular mechanism of RNA-dependent DNA methylation. It is possible that the RNA-DNA interaction is a universal mechanism triggering DNA methylation de novo. Allelic exclusion can also be based on the described mechanism. This phenomenon takes place when a non-coding RNA, whose precursor is transcribed from one allele, triggers DNA methylation in all other alleles present in the cell. Note that miRNA-mediated transcriptional silencing resembles allelic exclusion, because both the miRNA gene and the genes targeted by this miRNA contain elements with the same sequences. It can be assumed that RNA-dependent DNA methylation and allelic exclusion originated to counteract the activity of mobile genetic elements. Probably, thinning and deregulation of the cellular non-coding RNA pattern allows reactivation of silent mobile genetic elements, resulting in genome instability that leads to ageing and carcinogenesis. In the course of X-inactivation, DNA methylation and the subsequent heterochromatinization of the X chromosome can be triggered by direct hybridization of the 5′-end of the large non-coding RNA Xist with DNA targets in remote regions of the X chromosome.

  20. Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

    Science.gov (United States)

    Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

    2018-01-01

    The genetic code is degenerate and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under active research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: open source. Homepage: http://www.gcat.bio/
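GCAT itself is Java-based; as a language-neutral illustration of one of the tests it offers, here is a minimal Python sketch of the standard comma-freeness check. A codon set X is comma-free if no codon of X appears at a shifted (frame 1 or 2) position inside the concatenation of any two codons of X, so frame shifts are immediately detectable.

```python
def is_comma_free(codons):
    """Standard comma-freeness test for a set of trinucleotide codons."""
    codons = set(codons)
    for x in codons:
        for y in codons:
            pair = x + y                    # 6-letter window across the junction
            if pair[1:4] in codons or pair[2:5] in codons:
                return False                # a codon is readable in a shifted frame
    return True

assert is_comma_free({"ACG", "TAC"})        # no shifted reading exists
assert not is_comma_free({"AAA"})           # AAAAAA contains AAA in every frame
```

Circular codes relax this condition to concatenations of arbitrary length read on a circle; GCAT implements that test as well, but the sketch above already conveys the flavour of the analysis.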

  1. Selfish DNA in protein-coding genes of Rickettsia.

    Science.gov (United States)

    Ogata, H; Audic, S; Barbe, V; Artiguenave, F; Fournier, P E; Raoult, D; Claverie, J M

    2000-10-13

    Rickettsia conorii, the aetiological agent of Mediterranean spotted fever, is an intracellular bacterium transmitted by ticks. Preliminary analyses of the nearly complete genome sequence of R. conorii have revealed 44 occurrences of a previously undescribed palindromic repeat (150 base pairs long) throughout the genome. Unexpectedly, this repeat was found inserted in-frame within 19 different R. conorii open reading frames likely to encode functional proteins. We found the same repeat in proteins of other Rickettsia species. The finding of a mobile element inserted in many unrelated genes suggests the potential role of selfish DNA in the creation of new protein sequences.

  2. Non coding RNA: sequence-specific guide for chromatin modification and DNA damage signaling

    Directory of Open Access Journals (Sweden)

    Sofia eFrancia

    2015-11-01

    Full Text Available Chromatin conformation shapes the environment in which our genome is transcribed into RNA. Transcription is a source of DNA damage, thus it often occurs concomitantly with DNA damage signaling. Growing amounts of evidence suggest that different types of RNAs can, independently from their protein-coding properties, directly affect chromatin conformation, transcription and splicing, as well as promote the activation of the DNA damage response (DDR) and DNA repair. Therefore, transcription paradoxically functions to both threaten and safeguard genome integrity. On the other hand, DNA damage signaling is known to modulate chromatin to suppress transcription of the surrounding genetic unit. It is thus intriguing to understand how transcription can modulate DDR signaling while, in turn, DDR signaling represses transcription of chromatin around the DNA lesion. An unexpected player in this field is the RNA interference (RNAi) machinery, which plays roles in transcription, splicing and chromatin modulation in several organisms. Non-coding RNAs (ncRNAs) and several protein factors involved in the RNAi pathway are well-known master regulators of chromatin, while only recent reports suggest that ncRNAs are involved in DDR signaling and homology-mediated DNA repair. Here, we discuss the experimental evidence supporting the idea that ncRNAs act at the genomic loci from which they are transcribed to modulate chromatin, DDR signaling and DNA repair.

  3. Differential DNA methylation profiles of coding and non-coding genes define hippocampal sclerosis in human temporal lobe epilepsy

    Science.gov (United States)

    Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.

    2015-01-01

    Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13, methylation-sensitive microRNA were identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID

  4. PHYLOGENETIC RELATIONSHIPS AMONG VIETNAMESE COCOA ACCESSIONS USING A NON-CODING REGION OF THE CHLOROPLAST DNA

    OpenAIRE

    Lam Thi, Viet Ha; D.T., Khang; Everaert, Helena; T.N, Dung; P.H.D, Phuoc; H.T., Toan; Dewettinck, Koen; Messens, Kathy

    2017-01-01

    Cocoa (Theobroma cacao L.) cultivation has increased in tropical areas around the world, including Vietnam, due to the high demand of cocoa beans for chocolate production. The genetic diversity of cocoa genotypes is recognized to be complex, however, their phylogenetic relationships need to be clarified. The present study aimed to classify the cocoa genotypes that are imported and cultivated in Vietnam based on a chloroplast DNA region. Sixty-three Vietnamese Cocoa accessions were collected f...

  5. A data mining approach for classifying DNA repair genes into ageing-related or non-ageing-related

    Directory of Open Access Journals (Sweden)

    Vasieva Olga

    2011-01-01

    Full Text Available Abstract Background The ageing of the worldwide population means there is a growing need for research on the biology of ageing. DNA damage is likely a key contributor to the ageing process and elucidating the role of different DNA repair systems in ageing is of great interest. In this paper we propose a data mining approach, based on classification methods (decision trees and Naive Bayes), for analysing data about human DNA repair genes. The goal is to build classification models that allow us to discriminate between ageing-related and non-ageing-related DNA repair genes, in order to better understand their different properties. Results The main patterns discovered by the classification methods are as follows: (a) the number of protein-protein interactions was a predictor of DNA repair proteins being ageing-related; (b) the use of predictor attributes based on protein-protein interactions considerably increased predictive accuracy of attributes based on Gene Ontology (GO) annotations; (c) GO terms related to "response to stimulus" seem reasonably good predictors of ageing-relatedness for DNA repair genes; (d) interaction with the XRCC5 (Ku80) protein is a strong predictor of ageing-relatedness for DNA repair genes; and (e) DNA repair genes with a high expression in T lymphocytes are more likely to be ageing-related. Conclusions The above patterns are broadly integrated in an analysis discussing relations between Ku, the non-homologous end joining DNA repair pathway, ageing and lymphocyte development. These patterns and their analysis support non-homologous end joining double strand break repair as central to the ageing-relatedness of DNA repair genes. Our work also showcases the use of protein interaction partners to improve accuracy in data mining methods and our approach could be applied to other ageing-related pathways.
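One of the two classifiers the record uses, Naive Bayes over boolean gene attributes, can be sketched in pure Python. The feature names below echo the patterns the paper reports (interaction with XRCC5, protein-protein interaction counts), but the training rows, priors and smoothing choices are invented for illustration, not the study's data.

```python
from collections import defaultdict

def train(rows):
    """rows: list of (features_dict, label). Returns raw counts for NB."""
    label_n = defaultdict(int)
    feat_n = defaultdict(int)           # (label, feature, value) -> count
    for feats, label in rows:
        label_n[label] += 1
        for f, v in feats.items():
            feat_n[(label, f, v)] += 1
    return label_n, feat_n

def predict(model, feats):
    """Pick the label maximizing prior * product of smoothed likelihoods."""
    label_n, feat_n = model
    total = sum(label_n.values())
    best, best_p = None, -1.0
    for label, n in label_n.items():
        p = n / total
        for f, v in feats.items():
            p *= (feat_n[(label, f, v)] + 1) / (n + 2)   # Laplace smoothing
        if p > best_p:
            best, best_p = label, p
    return best

# Hypothetical training data in the spirit of patterns (a) and (d) above.
rows = [
    ({"xrcc5_partner": True,  "many_ppi": True},  "ageing"),
    ({"xrcc5_partner": True,  "many_ppi": True},  "ageing"),
    ({"xrcc5_partner": False, "many_ppi": False}, "other"),
    ({"xrcc5_partner": False, "many_ppi": True},  "other"),
]
model = train(rows)
pred = predict(model, {"xrcc5_partner": True, "many_ppi": True})
```

The attraction of Naive Bayes in this setting is exactly what the paper exploits: the per-feature likelihood ratios double as interpretable evidence for which attributes predict ageing-relatedness.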

  6. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
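The two "linguistic" measures used above are straightforward to compute; here is a minimal sketch for n = 3 on invented toy sequences (the study's data set and n ranges are of course different). The Zipf analysis ranks n-tuples by frequency; the n-gram entropy is the Shannon entropy of that frequency distribution, lower entropy meaning larger redundancy.

```python
import math
from collections import Counter

def ngram_counts(seq, n=3):
    """Frequencies of all overlapping n-tuples in the sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def ngram_entropy(seq, n=3):
    """Shannon entropy (bits) of the n-tuple frequency distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def zipf_ranking(seq, n=3):
    """n-tuples sorted by descending frequency (rank 1 = most common)."""
    return [t for t, _ in ngram_counts(seq, n).most_common()]

periodic = "ATG" * 50                     # highly redundant, repeat-like toy sequence
mixed = "ATGCGTACGTTAGCCATGATCGAT" * 6    # more varied toy sequence
assert ngram_entropy(periodic) < ngram_entropy(mixed)
```

Plotting `log(frequency)` of `zipf_ranking` entries against `log(rank)` reproduces the Zipf-style analysis: a near-linear plot indicates power-law behavior of the kind the authors report for noncoding regions.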

  7. DNA methylation of miRNA coding sequences putatively associated with childhood obesity.

    Science.gov (United States)

    Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A

    2017-02-01

    Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.

  8. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    Science.gov (United States)

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. 
This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could

  9. HyDEn: A Hybrid Steganocryptographic Approach for Data Encryption Using Randomized Error-Correcting DNA Codes

    Directory of Open Access Journals (Sweden)

    Dan Tulpan

    2013-01-01

    Full Text Available This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and in the cyclic permutations applied to the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach.
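
    For readers unfamiliar with quaternary DNA coding, the sketch below illustrates the general flavour of mapping extended-ASCII characters to DNA code words with built-in error checking. It is a deliberately simplified stand-in, not HyDEn itself: it uses a single quaternary check digit (which detects, but cannot correct, single substitutions), whereas HyDEn uses randomized Hamming code word assignments and a private key:

```python
# Simplified sketch: one extended-ASCII byte -> four quaternary digits plus a
# quaternary check digit, each digit mapped to a DNA base.
BASES = "ACGT"

def encode_char(ch: str) -> str:
    b = ord(ch)                                    # extended ASCII value, 0..255
    digits = [(b >> s) & 3 for s in (6, 4, 2, 0)]  # four base-4 digits
    check = sum(digits) % 4                        # single quaternary check digit
    return "".join(BASES[d] for d in digits + [check])

def decode_char(dna: str) -> str:
    digits = [BASES.index(c) for c in dna]
    assert sum(digits[:4]) % 4 == digits[4], "check digit mismatch"
    b = 0
    for d in digits[:4]:
        b = (b << 2) | d
    return chr(b)

msg = "DNA!"
encoded = "".join(encode_char(c) for c in msg)
decoded = "".join(decode_char(encoded[i:i + 5]) for i in range(0, len(encoded), 5))
assert decoded == msg
```

    A true quaternary Hamming code, as used by HyDEn, would add enough parity digits to locate and correct a single erroneous base rather than merely detect it.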

  10. DNA strand breaks induced by electrons simulated with nanodosimetry Monte Carlo simulation code: NASIC

    International Nuclear Information System (INIS)

    Li, Junli; Qiu, Rui; Yan, Congchong; Xie, Wenzhang; Zeng, Zhi; Li, Chunyan; Wu, Zhen; Tung, Chuanjong

    2015-01-01

    The method of Monte Carlo simulation is a powerful tool for investigating the details of radiation-induced biological damage at the molecular level. In this paper, a Monte Carlo code called NASIC (Nanodosimetry Monte Carlo Simulation Code) was developed. It includes a physical module, a pre-chemical module, a chemical module, a geometric module and a DNA damage module. The physical module can simulate the physical tracks of low-energy electrons in liquid water event by event. More than one set of inelastic cross sections was calculated by applying the dielectric function method of Emfietzoglou's optical-data treatments, with different optical data sets and dispersion models. In the pre-chemical module, the ionised and excited water molecules undergo dissociation processes. In the chemical module, the radiolytic chemical species produced diffuse and react. In the geometric module, an atomic model of 46 chromatin fibres in a spherical nucleus of a human lymphocyte was established. In the DNA damage module, the direct damage induced by the energy depositions of the electrons and the indirect damage induced by the radiolytic chemical species were calculated. The parameters were adjusted so that the simulation results agreed with the experimental results. In this paper, the influence of the inelastic cross sections and of the vibrational excitation reaction on the parameters and on the DNA strand break yields was studied. Further work on NASIC is underway (authors)

  11. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier.

    Science.gov (United States)

    Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A R

    2016-11-05

    DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, an attempt has been made in this study to develop a computational approach for identifying a species by comparing its barcode with the barcode sequences of known species present in a reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and Random forest methodology was then employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches, and was found comparable with existing supervised-learning-based approaches in terms of species identification success rate when compared using real and simulated datasets. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists. Copyright © 2016 Elsevier B.V. All rights reserved.
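
    The k-mer feature mapping at the heart of this approach can be sketched in a few lines. The code below is an illustrative reimplementation, not the authors' SPIDBAR pipeline: it turns a barcode sequence into a fixed-length vector of relative k-mer frequencies, which could then be fed to any supervised learner, e.g. scikit-learn's RandomForestClassifier as in the study:

```python
from itertools import product

def kmer_vector(seq: str, k: int = 3) -> list:
    """Map a DNA barcode onto a 4**k vector of relative k-mer frequencies."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:          # skip windows with ambiguous bases such as 'N'
            vec[index[km]] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec

v = kmer_vector("ACGTACGTACGT", k=2)
assert len(v) == 16              # 4**2 possible 2-mers
assert abs(sum(v) - 1.0) < 1e-9  # frequencies are normalized
```

    Because every sequence maps to the same 4**k-dimensional space regardless of its length or alignment, barcodes of different species become directly comparable feature vectors.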

  12. Phylogenetic relationships among vietnamese cocoa accessions using a non-coding region of the chloroplast dna

    International Nuclear Information System (INIS)

    Ha, L.T.V.; Dung, T.N.; Phuoc, P.H.D.

    2017-01-01

    Cocoa cultivation has increased in tropical areas around the world, including Vietnam, due to the high demand for cocoa beans for chocolate production. The genetic diversity of cocoa genotypes is recognized to be complex; however, their phylogenetic relationships need to be clarified. The present study aimed to classify the cocoa genotypes that are imported and cultivated in Vietnam based on a chloroplast DNA region. Sixty-three Vietnamese cocoa accessions were collected from different regions in Southern Vietnam. Their phylogenetic relationships were identified using the universal primers c-B49317 and d-A49855 targeting the chloroplast DNA region. The amplified sequences lie within the trnL intron, a chloroplast genome region used to identify closely related terrestrial plant species. DNA sequences were determined and subjected to a phylogenetic analysis using the minimum evolution method. The genetic analysis showed clustering of the 63 cocoa accessions into three groups: the domestically cultivated Trinitario group, the indigenous cultivars, and the cultivars introduced from Peru. The sequencing data also indicated that the TD accessions and CT accessions are genetically closely related. Based on those results, the PA and NA accessions were established as the hybrid origins of the TD and CT accessions. The genetic relationships of some foreign accessions, including the UIT, SCA and IMC accessions, were also confirmed. The present study is the first report of phylogenetic relationships of Vietnamese cocoa collections. The cocoa program in Vietnam has been in development for thirty years. (author)

  13. Classifying Microorganisms

    DEFF Research Database (Denmark)

    Sommerlund, Julie

    2006-01-01

    This paper describes the coexistence of two systems for classifying organisms and species: a dominant genetic system and an older naturalist system. The former classifies species and traces their evolution on the basis of genetic characteristics, while the latter employs physiological characteristics. The coexistence of the classification systems does not lead to a conflict between them. Rather, the systems seem to co-exist in different configurations, through which they are complementary, contradictory and inclusive in different situations, sometimes simultaneously. The systems come...

  14. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

    Science.gov (United States)

    Kress, W John; Erickson, David L

    2007-06-06

    A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than those observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.

  15. denV gene of bacteriophage T4 codes for both pyrimidine dimer-DNA glycosylase and apyrimidinic endonuclease activities

    International Nuclear Information System (INIS)

    McMillan, S.; Edenberg, H.J.; Radany, E.H.; Friedberg, R.C.; Friedberg, E.C.

    1981-01-01

    Recent studies have shown that purified preparations of phage T4 UV DNA-incising activity (T4 UV endonuclease or endonuclease V of phage T4) contain a pyrimidine dimer-DNA glycosylase activity that catalyzes hydrolysis of the 5' glycosyl bond of dimerized pyrimidines in UV-irradiated DNA. Such enzyme preparations have also been shown to catalyze the hydrolysis of phosphodiester bonds in UV-irradiated DNA at neutral pH, presumably reflecting the action of an apurinic/apyrimidinic endonuclease at the apyrimidinic sites created by the pyrimidine dimer-DNA glycosylase. In this study we found that preparations of T4 UV DNA-incising activity contained apurinic/apyrimidinic endonuclease activity that nicked depurinated form I simian virus 40 DNA. Apurinic/apyrimidinic endonuclease activity was also found in extracts of Escherichia coli infected with T4 denV+ phage. Extracts of cells infected with T4 denV mutants contained significantly lower levels of apurinic/apyrimidinic endonuclease activity; these levels were no greater than the levels present in extracts of uninfected cells. Furthermore, the addition of depurinated DNA to reaction mixtures containing UV-irradiated DNA and the T4 enzyme resulted in competition for pyrimidine dimer-DNA glycosylase activity against the UV-irradiated DNA. On the basis of these results, we concluded that apurinic/apyrimidinic endonuclease activity is encoded by the denV gene of phage T4, the same gene that codes for pyrimidine dimer-DNA glycosylase activity.

  16. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    Directory of Open Access Journals (Sweden)

    Ai-bing Zhang

    Full Text Available Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  17. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    Science.gov (United States)

    Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

    2012-01-01

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
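
    As a rough illustration of how radial-basis-function (RBF) machinery can be applied to species assignment, the sketch below scores a query feature vector against reference vectors with a Gaussian (RBF) kernel and assigns the best-matching species. This is a minimal nearest-reference scheme for intuition only; the paper's DV-RBF and FJ-RBF methods are more elaborate, and the species names and vectors here are made up:

```python
from math import exp

def rbf_similarity(x, y, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq)

def assign_species(query, references, gamma: float = 1.0) -> str:
    """references: list of (species_name, feature_vector) pairs.

    Returns the species whose reference vector is most similar to the query
    under the RBF kernel."""
    return max(references, key=lambda r: rbf_similarity(query, r[1], gamma))[0]

# Hypothetical reference library of barcode-derived feature vectors:
refs = [("bat_sp1", [0.9, 0.1, 0.0]), ("fish_sp1", [0.1, 0.8, 0.1])]
assert assign_species([0.85, 0.15, 0.0], refs) == "bat_sp1"
```

    Because the kernel operates on fixed-length feature vectors rather than aligned sequences, such a scheme sidesteps the alignment problems that make non-coding ITS barcodes difficult for tree-based methods.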

  18. cDNA sequence of human transforming gene hst and identification of the coding sequence required for transforming activity

    International Nuclear Information System (INIS)

    Taira, M.; Yoshida, T.; Miyagawa, K.; Sakamoto, H.; Terada, M.; Sugimura, T.

    1987-01-01

    The hst gene was originally identified as a transforming gene in DNAs from human stomach cancers and from a noncancerous portion of stomach mucosa by DNA-mediated transfection assay using NIH3T3 cells. cDNA clones of hst were isolated from the cDNA library constructed from poly(A)+ RNA of a secondary transformant induced by the DNA from a stomach cancer. The sequence analysis of the hst cDNA revealed the presence of two open reading frames. When this cDNA was inserted into an expression vector containing the simian virus 40 promoter, it efficiently induced the transformation of NIH3T3 cells upon transfection. It was found that one of the reading frames, which coded for 206 amino acids, was responsible for the transforming activity

  19. Comparison of Geant4-DNA simulation of S-values with other Monte Carlo codes

    International Nuclear Information System (INIS)

    André, T.; Morini, F.; Karamitros, M.; Delorme, R.; Le Loirec, C.; Campos, L.; Champion, C.; Groetz, J.-E.; Fromm, M.; Bordage, M.-C.; Perrot, Y.; Barberet, Ph.

    2014-01-01

    Monte Carlo simulations of S-values have been carried out with the Geant4-DNA extension of the Geant4 toolkit. The S-values have been simulated for monoenergetic electrons with energies ranging from 0.1 keV up to 20 keV, in liquid water spheres (for four radii, chosen between 10 nm and 1 μm), and for electrons emitted by five isotopes of iodine (131, 132, 133, 134 and 135), in liquid water spheres of varying radius (from 15 μm up to 250 μm). The results have been compared to those obtained from other Monte Carlo codes and from other published data. The use of the Kolmogorov–Smirnov test confirmed the statistical compatibility of all simulation results
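
    The two-sample Kolmogorov–Smirnov statistic used to compare codes is simply the maximum vertical gap between two empirical distribution functions. A stdlib-only sketch, with hypothetical S-value samples standing in for real simulation output:

```python
from bisect import bisect_right

def ks_statistic(a, b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical gap
    between the two empirical cumulative distribution functions (ECDFs)."""
    a, b = sorted(a), sorted(b)

    def ecdf(xs, t):
        # fraction of observations in xs that are <= t (xs must be sorted)
        return bisect_right(xs, t) / len(xs)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in sorted(set(a) | set(b)))

# Hypothetical S-value samples (arbitrary units) from two different codes:
code_a = [1.02, 0.98, 1.05, 1.01, 0.99, 1.03]
code_b = [1.00, 1.04, 0.97, 1.02, 1.01, 0.98]
d_stat = ks_statistic(code_a, code_b)
assert 0.0 <= d_stat <= 1.0
```

    In practice one would compare `d_stat` against the KS critical value for the two sample sizes (e.g. via `scipy.stats.ks_2samp`) to decide whether the codes' results are statistically compatible.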

  20. Two human cDNA molecules coding for the Duchenne muscular dystrophy (DMD) locus are highly homologous

    Energy Technology Data Exchange (ETDEWEB)

    Rosenthal, A.; Speer, A.; Billwitz, H. (Zentralinstitut fuer Molekularbiologie, Berlin-Buch (Germany Democratic Republic)); Cross, G.S.; Forrest, S.M.; Davies, K.E. (Univ. of Oxford (England))

    1989-07-11

    Recently the complete sequence of the human fetal cDNA coding for the Duchenne muscular dystrophy (DMD) locus was reported and a 3,685 amino acid long, rod-shaped cytoskeletal protein (dystrophin) was predicted as the protein product. Independently, the authors have isolated and sequenced different DMD cDNA molecules from human adult and fetal muscle. The complete 12.5 kb long sequence of all their cDNA clones has now been determined and they report here the nucleotide (nt) and amino acid (aa) differences between the sequences of both groups. The cDNA sequence comprises the whole coding region but lacks the first 110 nt from the 5′-untranslated region and the last 1,417 nt of the 3′-untranslated region. They have found 11 nt differences (approximately 99.9% homology), of which 7 also occurred at the aa level.

  1. The dnaN gene codes for the beta subunit of DNA polymerase III holoenzyme of escherichia coli.

    Science.gov (United States)

    Burgers, P M; Kornberg, A; Sakakibara, Y

    1981-09-01

    An Escherichia coli mutant, dnaN59, stops DNA synthesis promptly upon a shift to a high temperature; the wild-type dnaN gene carried in a transducing phage encodes a polypeptide of about 41,000 daltons [Sakakibara, Y. & Mizukami, T. (1980) Mol. Gen. Genet. 178, 541-553; Yuasa, S. & Sakakibara, Y. (1980) Mol. Gen. Genet. 180, 267-273]. We now find that the product of the dnaN gene is the beta subunit of DNA polymerase III holoenzyme, the principal DNA synthetic multipolypeptide complex in E. coli. The conclusion is based on the following observations: (i) Extracts from dnaN59 cells were defective in phage phi X174 and G4 DNA synthesis after the mutant cells had been exposed to the increased temperature. (ii) The enzymatic defect was overcome by addition of purified beta subunit but not by other subunits of DNA polymerase III holoenzyme or by other replication proteins required for phi X174 DNA synthesis. (iii) Partially purified beta subunit from the dnaN mutant, unlike that from the wild type, was inactive in reconstituting the holoenzyme when mixed with the other purified subunits. (iv) Increased dosage of the dnaN gene provided by a plasmid carrying the gene raised cellular levels of the beta subunit 5- to 6-fold.

  2. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    OpenAIRE

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-01-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding t...

  3. Carbon classified?

    DEFF Research Database (Denmark)

    Lippert, Ingmar

    2012-01-01

    Using an actor-network theory (ANT) framework, the aim is to investigate the actors who bring together the elements needed to classify their carbon emission sources and unpack the heterogeneous relations drawn on. Based on an ethnographic study of corporate agents of ecological modernisation over a period of 13 months, this paper provides an exploration of three cases of enacting classification. Drawing on ANT, we problematise the silencing of a range of possible modalities of consumption facts and point to the ontological ethics involved in such performances. In a context of global warming...

  4. An Abundant Class of Non-coding DNA Can Prevent Stochastic Gene Silencing in the C. elegans Germline

    DEFF Research Database (Denmark)

    Frøkjær-Jensen, Christian; Jain, Nimit; Hansen, Loren

    2016-01-01

    Here, we demonstrate that a pervasive non-coding DNA feature in Caenorhabditis elegans, characterized by 10-base-pair periodic An/Tn-clusters (PATCs), can license transgenes for germline expression within repressive chromatin domains. Transgenes containing natural or synthetic PATCs are resistant...

  5. Functional intersection of ATM and DNA-dependent protein kinase catalytic subunit in coding end joining during V(D)J recombination

    DEFF Research Database (Denmark)

    Lee, Baeck-Seung; Gapud, Eric J; Zhang, Shichuan

    2013-01-01

    V(D)J recombination is initiated by the RAG endonuclease, which introduces DNA double-strand breaks (DSBs) at the border between two recombining gene segments, generating two hairpin-sealed coding ends and two blunt signal ends. ATM and DNA-dependent protein kinase catalytic subunit (DNA-PKcs) are serine-threonine kinases that orchestrate the cellular responses to DNA DSBs. During V(D)J recombination, ATM and DNA-PKcs have unique functions in the repair of coding DNA ends. ATM deficiency leads to instability of postcleavage complexes and the loss of coding ends from these complexes. DNA-PKcs... when ATM is present and its kinase activity is intact. The ability of ATM to compensate for DNA-PKcs kinase activity depends on the integrity of three threonines in DNA-PKcs that are phosphorylation targets of ATM, suggesting that ATM can modulate DNA-PKcs activity through direct phosphorylation of DNA-PKcs.

  6. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues

    International Nuclear Information System (INIS)

    Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.; Goldberg, O.; Soreq, H.

    1987-01-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species

  7. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    Science.gov (United States)

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  8. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor

    International Nuclear Information System (INIS)

    Antalis, T.M.; Clark, M.A.; Barnes, T.; Lehrbach, P.R.; Devine, P.L.; Schevzov, G.; Goss, N.H.; Stephens, R.W.; Tolstoshev, P.

    1988-01-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the λ PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators

  9. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    Science.gov (United States)

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-02-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.

  10. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques.

    Science.gov (United States)

    Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

    2016-03-01

    Conventional character-analysis-based techniques in genome analysis manifest three main shortcomings: inefficiency, inflexibility, and incompatibility. In our previous research, a general framework called DNA As X was proposed for character-analysis-free techniques to overcome these shortcomings, where X is an intermediate such as digit, code, signal, vector, tree, graph, network, and so on. In this paper, we further implement an ontology of DNA As Signal, by designing a tool named Signalign for comparative gene structure analysis, in which DNA sequences are converted into signal series, processed by a modified method of dynamic time warping and measured by signal-to-noise ratio (SNR). The ontology of DNA As Signal integrates the principles and concepts of other disciplines, including information coding theory and signal processing, into sequence analysis and processing. Compared with conventional character-analysis-based methods, Signalign not only achieves equivalent or superior performance, but also enriches the tools and the knowledge library of computational biology by extending the domain from characters/strings to diverse areas. The evaluation results validate the success of the character-analysis-free technique for improved performances in comparative gene structure prediction.
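
    The core of the DNA-as-signal idea (numeric encoding followed by elastic alignment) can be illustrated with classic dynamic time warping. Note that Signalign itself uses a modified DTW plus an SNR measure, and the base-to-number mapping below is purely illustrative:

```python
# Illustrative base-to-number mapping; any injective encoding would do here.
ENCODING = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}

def to_signal(seq: str) -> list:
    """Convert a DNA string into a numeric signal series."""
    return [ENCODING[b] for b in seq]

def dtw_distance(x, y) -> float:
    """Classic O(len(x)*len(y)) dynamic-time-warping distance between
    two signals, allowing elastic stretching along the time axis."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A sequence with a single-base insertion stays close to the original signal,
# while an unrelated sequence is far away:
s1, s2, s3 = to_signal("ACGTACGT"), to_signal("ACGTTACGT"), to_signal("TTTTTTTT")
assert dtw_distance(s1, s2) < dtw_distance(s1, s3)
```

    Because DTW tolerates insertions and deletions as local stretching, related gene structures remain comparable without an explicit character-level alignment.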

  11. Benzofurazane as a New Redox Label for Electrochemical Detection of DNA: Towards Multipotential Redox Coding of DNA Bases

    Czech Academy of Sciences Publication Activity Database

    Balintová, Jana; Plucnara, Medard; Vidláková, Pavlína; Pohl, Radek; Havran, Luděk; Fojta, Miroslav; Hocek, Michal

    2013-01-01

    Vol. 19, No. 38 (2013), pp. 12720-12731 ISSN 0947-6539 R&D Projects: GA ČR GBP206/12/G151; GA AV ČR(CZ) IAA400040901 Institutional support: RVO:61388963; RVO:68081707 Keywords: DNA polymerase * electrochemistry * nucleoside triphosphates * sequencing * voltammetry Subject RIV: CC - Organic Chemistry Impact factor: 5.696, year: 2013

  12. Long non-coding RNAs as novel expression signatures modulate DNA damage and repair in cadmium toxicology

    Science.gov (United States)

    Zhou, Zhiheng; Liu, Haibai; Wang, Caixia; Lu, Qian; Huang, Qinhai; Zheng, Chanjiao; Lei, Yixiong

    2015-10-01

    Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in a variety of physiological and pathophysiological processes. Our study aimed to investigate whether lncRNAs, as novel expression signatures, are able to modulate DNA damage and repair in cadmium (Cd) toxicity. Aberrant expression profiles of lncRNAs were observed in Cd-induced cells at the 35th passage compared to untreated 16HBE cells. siRNA-mediated knockdown of ENST00000414355 inhibited the growth of DNA-damaged cells and decreased the expression of DNA damage-related genes (ATM, ATR and ATRIP), while increasing the expression of DNA repair-related genes (DDB1, DDB2, OGG1, ERCC1, MSH2, RAD50, XRCC1 and BARD1). Cadmium increased ENST00000414355 expression in the lung of Cd-exposed rats in a dose-dependent manner. A significant positive correlation was observed between blood ENST00000414355 expression and urinary/blood Cd concentrations, and there were significant correlations of lncRNA-ENST00000414355 expression with the expression of its target genes in the lung of Cd-exposed rats and the blood of Cd-exposed workers. These results indicate that some lncRNAs are aberrantly expressed in Cd-treated 16HBE cells. lncRNA-ENST00000414355 may serve as a signature of DNA damage and repair related to the epigenetic mechanisms underlying cadmium toxicity, and may become a novel biomarker of cadmium toxicity.

  13. Stories in Genetic Code. The contribution of ancient DNA studies to anthropology and their ethical implications

    Directory of Open Access Journals (Sweden)

    Cristian M. Crespo

    2010-12-01

    Full Text Available For several decades, biological anthropology has employed different molecular markers in population research. Since 1990, techniques in molecular biology have been developed that allow the extraction of preserved DNA and its typification in different samples from museums and archaeological sites. Ancient DNA studies related to archaeological issues are now included in the field of Archaeogenetics. In this work we present some applications of ancient DNA in archaeology. We also discuss the advantages and limitations of this kind of research and its relationship with ethical and legal norms.

  14. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    International Nuclear Information System (INIS)

    Arneodo, Alain; Vaillant, Cedric; Audit, Benjamin; Argoul, Francoise; D'Aubenton-Carafa, Yves; Thermes, Claude

    2011-01-01

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control transcription and replication, there is a wealth of structural and dynamical information to be learned from the primary DNA sequence. In this review, we show that by using concepts, methodologies, and numerical and experimental techniques from statistical mechanics and nonlinear physics, combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of the chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.
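    The multi-scale view of sequence information can be illustrated with a toy sketch (not the authors' wavelet pipeline): encode a DNA sequence as a binary GC indicator and smooth it at several window sizes, a crude stand-in for wavelet-based multi-scale analysis. The example sequence is made up.

```python
# Toy multi-scale analysis: a GC-content "signal" viewed at several scales.

def gc_signal(seq):
    """1 for G/C, 0 for A/T: a simple signal hidden in the primary sequence."""
    return [1 if b in "GCgc" else 0 for b in seq]

def smooth(signal, window):
    """Centered moving average via prefix sums; one value per position."""
    n = len(signal)
    prefix = [0]
    for v in signal:
        prefix.append(prefix[-1] + v)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

seq = "ATATATGCGCGCGCATATAT"        # hypothetical sequence with a GC-rich core
sig = gc_signal(seq)
for w in (1, 5, 9):                  # inspect the same signal at three scales
    print(w, [round(v, 2) for v in smooth(sig, w)])
```

    Real analyses replace the moving average with a wavelet transform, which separates scales cleanly rather than merely blurring them, but the principle of reading the same sequence at multiple resolutions is the same.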

  15. Isolation of cDNA clones coding for human tissue factor: primary structure of the protein and cDNA

    International Nuclear Information System (INIS)

    Spicer, E.K.; Horton, R.; Bloem, L.

    1987-01-01

    Tissue factor is a membrane-bound procoagulant protein that activates the extrinsic pathway of blood coagulation in the presence of factor VII and calcium. λ Phage containing the tissue factor gene were isolated from a human placental cDNA library. The amino acid sequence deduced from the nucleotide sequence of the cDNAs indicates that tissue factor is synthesized as a higher molecular weight precursor with a leader sequence of 32 amino acids, while the mature protein is a single polypeptide chain composed of 263 residues. The derived primary structure of tissue factor has been confirmed by comparison to protein and peptide sequence data. The sequence of the mature protein suggests that there are three distinct domains: extracellular, residues 1-219; hydrophobic, residues 220-242; and cytoplasmic, residues 243-263. Three potential N-linked carbohydrate attachment sites occur in the extracellular domain. The amino acid sequence of tissue factor shows no significant homology with the vitamin K-dependent serine proteases, coagulation cofactors, or any other protein in the National Biomedical Research Foundation sequence data bank (Washington, DC)

  16. Coding of DNA samples and data in the pharmaceutical industry: current practices and future directions--perspective of the I-PWG.

    Science.gov (United States)

    Franc, M A; Cohen, N; Warner, A W; Shaw, P M; Groenen, P; Snapir, A

    2011-04-01

    DNA samples collected in clinical trials and stored for future research are valuable to pharmaceutical drug development. Given the perceived higher risk associated with genetic research, industry has implemented complex coding methods for DNA. Following years of experience with these methods and with addressing questions from institutional review boards (IRBs), ethics committees (ECs) and health authorities, the industry has started reexamining the extent of the added value offered by these methods. With the goal of harmonization, the Industry Pharmacogenomics Working Group (I-PWG) conducted a survey to gain an understanding of company practices for DNA coding and to solicit opinions on their effectiveness at protecting privacy. The results of the survey and the limitations of the coding methods are described. The I-PWG recommends dialogue with key stakeholders regarding coding practices such that equal standards are applied to DNA and non-DNA samples. The I-PWG believes that industry standards for privacy protection should provide adequate safeguards for DNA and non-DNA samples/data and suggests a need for more universal standards for samples stored for future research.

  17. Cloning and sequence analysis of cDNA coding for rat nucleolar protein C23

    International Nuclear Information System (INIS)

    Ghaffari, S.H.; Olson, M.O.J.

    1986-01-01

    Using synthetic oligonucleotides as primers and probes, the authors have isolated and sequenced cDNA clones encoding protein C23, a putative nucleolus organizer protein. Poly(A+) RNA was isolated from rat Novikoff hepatoma cells and enriched in C23 mRNA by sucrose density gradient ultracentrifugation. Two deoxyoligonucleotides, a 48-mer and a 27-mer, were synthesized on the basis of amino acid sequence from the C-terminal half of protein C23 and cDNA sequence data from the CHO cell protein. The 48-mer was used as a primer for synthesis of cDNA, which was then inserted into plasmid pUC9. Transformed bacterial colonies were screened by hybridization with the 32P-labeled 27-mer. Two clones among 5000 gave a strong positive signal. Plasmid DNAs from these clones were purified and characterized by blotting and nucleotide sequence analysis. The length of the C23 mRNA was estimated to be 3200 bases by northern blot analysis. The sequence of a 267-bp insert shows high homology with the CHO cDNA, with only 9 nucleotide differences and an identical amino acid sequence. These studies indicate that this region of the protein is highly conserved.

  18. Isolation and sequencing of a cDNA coding for the human DF3 breast carcinoma-associated antigen

    International Nuclear Information System (INIS)

    Siddiqui, J.; Abe, M.; Hayes, D.; Shani, E.; Yunis, E.; Kufe, D.

    1988-01-01

    The murine monoclonal antibody (mAb) DF3 reacts with a high molecular weight glycoprotein detectable in human breast carcinomas. DF3 antigen expression correlates with human breast tumor differentiation, and the detection of a cross-reactive species in human milk has suggested that this antigen might be useful as a marker of differentiated mammary epithelium. To further characterize DF3 antigen expression, the authors have isolated a cDNA clone from a λgt11 library by screening with mAb DF3. The results demonstrate that this 309-base-pair cDNA, designated pDF9.3, codes for the DF3 epitope. Southern blot analyses of EcoRI-digested DNAs from six human tumor cell lines with 32P-labeled pDF9.3 revealed a restriction fragment length polymorphism. Variations in the size of the alleles detected by pDF9.3 were also identified in Pst I, but not in HindIII, DNA digests. Furthermore, hybridization of 32P-labeled pDF9.3 with total cellular RNA from each of these cell lines demonstrated either one or two transcripts that varied from 4.1 to 7.1 kilobases in size. The presence of differently sized transcripts detected by pDF9.3 was also found to correspond with the polymorphic expression of DF3 glycoproteins. Nucleotide sequence analysis of pDF9.3 revealed a highly conserved (G+C)-rich 60-base-pair tandem repeat. These findings suggest that the variation in the size of alleles coding for the polymorphic DF3 glycoprotein may represent different numbers of repeats.

  19. HGSA DNA day essay contest winner 60 years on: still coding for cutting-edge science.

    Science.gov (United States)

    Yates, Patrick

    2013-08-01

    MESSAGE FROM THE EDUCATION COMMITTEE: In 2013, the Education Committee of the Human Genetics Society of Australasia (HGSA) established the DNA Day Essay Contest in Australia and New Zealand. The contest was first established by the American Society of Human Genetics in 2005 and the HGSA DNA Day Essay Contest is adapted from this contest via a collaborative partnership. The aim of the contest is to engage high school students with important concepts in genetics through literature research and reflection. As 2013 marks the 60th anniversary of the discovery of the double helix of DNA by James Watson and Francis Crick and the 10th anniversary of the first sequencing of the human genome, the essay topic was to choose either of these breakthroughs and explain its broader impact on biotechnology, human health and disease, or our understanding of basic genetics, such as genetic variation or gene expression. The contest attracted 87 entrants in 2013, with the winning essay authored by Patrick Yates, a Year 12 student from Melbourne High School. Further details about the contest including the names and schools of the other finalists can be found at http://www.hgsa-essay.net.au/. The Education Committee would like to thank all the 2013 applicants and encourage students to enter in 2014.

  20. Fine-tuning the ubiquitin code at DNA double-strand breaks: deubiquitinating enzymes at work

    Directory of Open Access Journals (Sweden)

    Elisabetta eCitterio

    2015-09-01

    Full Text Available Ubiquitination is a reversible protein modification broadly implicated in cellular functions. Signaling processes mediated by ubiquitin are crucial for the cellular response to DNA double-strand breaks (DSBs), one of the most dangerous types of DNA lesions. In particular, the DSB response critically relies on active ubiquitination by the RNF8 and RNF168 ubiquitin ligases at the chromatin, which is essential for proper DSB signaling and repair. How this pathway is fine-tuned, and what the functional consequences of its deregulation are for genome integrity and tissue homeostasis, are subjects of intense investigation. One important regulatory mechanism is the reversal of substrate ubiquitination through the activity of specific deubiquitinating enzymes (DUBs), as supported by the implication of a growing number of DUBs in DNA damage response (DDR) processes. Here, we discuss the current knowledge of how ubiquitin-mediated signaling at DSBs is controlled by deubiquitinating enzymes, with a main focus on DUBs targeting histone H2A and on their recent implication in stem cell biology and cancer.

  1. Evaluation of the efficacy of twelve mitochondrial protein-coding genes as barcodes for mollusk DNA barcoding.

    Science.gov (United States)

    Yu, Hong; Kong, Lingfeng; Li, Qi

    2016-01-01

    In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance-based, monophyly-based and character-based) were used in species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly- and character-based methods provided higher resolution than the distance-based method in species delimitation; especially in closely related taxa, the character-based method showed some advantages. The results suggested that, besides the standard COI barcode, the other 11 mitochondrial protein-coding genes could also potentially be used as molecular diagnostics for molluscan species discrimination. Our results also showed that combining mitochondrial genes did not enhance the efficacy of species identification, and that a single mitochondrial gene is sufficient.

  2. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    Directory of Open Access Journals (Sweden)

    Benjamin Bolduc

    2017-05-01

    Full Text Available Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. The recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means of developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytical approach was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled, and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.
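    The gene-sharing-network idea behind this kind of taxonomy can be sketched as follows (this is not vConTACT itself, which additionally weights edges by the statistical significance of sharing; the genome contents below are made up): connect genome pairs that share protein clusters, then read candidate viral groups off the connected components.

```python
# Illustrative gene-sharing network: genomes are sets of protein clusters,
# edges join pairs sharing enough clusters, components are candidate taxa.

from itertools import combinations

genomes = {                           # hypothetical protein-cluster contents
    "phage_A": {"PC1", "PC2", "PC3", "PC4"},
    "phage_B": {"PC1", "PC2", "PC3", "PC9"},
    "phage_C": {"PC7", "PC8"},
}

def edges(genomes, min_shared=2):
    """Yield genome pairs sharing at least min_shared protein clusters."""
    for a, b in combinations(genomes, 2):
        if len(genomes[a] & genomes[b]) >= min_shared:
            yield a, b

def components(genomes, min_shared=2):
    """Connected components of the sharing network (union-find)."""
    parent = {g: g for g in genomes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges(genomes, min_shared):
        parent[find(a)] = find(b)
    groups = {}
    for g in genomes:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())

print(components(genomes))   # phage_A and phage_B cluster; phage_C is alone
```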

  3. Ube2V2 Is a Rosetta Stone Bridging Redox and Ubiquitin Codes, Coordinating DNA Damage Responses.

    Science.gov (United States)

    Zhao, Yi; Long, Marcus J C; Wang, Yiran; Zhang, Sheng; Aye, Yimon

    2018-02-28

    Posttranslational modifications (PTMs) are the lingua franca of cellular communication. Most PTMs are enzyme-orchestrated. However, the reemergence of electrophilic drugs has ushered in the mining of unconventional, non-enzyme-catalyzed electrophile-signaling pathways. Despite the latest impetus toward harnessing kinetically and functionally privileged cysteines for electrophilic drug design, identifying these sensors remains challenging. Herein, we designed "G-REX", a technique that allows controlled release of reactive electrophiles in vivo. Mitigating the toxicity/off-target effects associated with uncontrolled bolus exposure, G-REX tagged first-responding innate cysteines that bind electrophiles under true kcat/KM conditions. G-REX identified two allosteric ubiquitin-conjugating proteins, Ube2V1/Ube2V2, sharing a novel privileged sensor cysteine. This non-enzyme-catalyzed PTM triggered responses specific to each protein. Thus, G-REX is an unbiased method to identify novel functional cysteines. In contrast to conventional active-site/off-active-site cysteine modifications that regulate target activity, modification of Ube2V2 allosterically hyperactivated its enzymatically active binding partner Ube2N, promoting K63-linked client ubiquitination and stimulating the H2AX-dependent DNA damage response. This work establishes Ube2V2 as a Rosetta stone bridging the redox and ubiquitin codes to guard genome integrity.

  4. New insights into the Lake Chad Basin population structure revealed by high-throughput genotyping of mitochondrial DNA coding SNPs.

    Directory of Open Access Journals (Sweden)

    María Cerezo

    Full Text Available BACKGROUND: Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. From both an archaeological and a genetic point of view, this region has been interpreted as the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Sahara desert. METHODOLOGY/PRINCIPAL FINDINGS: Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin, mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth, which was not always correlated with their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. CONCLUSIONS/SIGNIFICANCE: Compared to sedentary populations, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segments I (HVS-I), but not in their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic populations showed a higher Mediterranean influence, signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance has not identified these differences. The present study indicates that

  5. Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease.

    Science.gov (United States)

    Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia

    2018-01-04

    Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNA-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic value of each hit transcript, calculated from patients' clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to perform differentially methylated lncRNA identification and visualization on local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Changes in the Coding and Non-coding Transcriptome and DNA Methylome that Define the Schwann Cell Repair Phenotype after Nerve Injury.

    Science.gov (United States)

    Arthur-Farraj, Peter J; Morgan, Claire C; Adamowicz, Martyna; Gomez-Sanchez, Jose A; Fazal, Shaline V; Beucher, Anthony; Razzaghi, Bonnie; Mirsky, Rhona; Jessen, Kristjan R; Aitman, Timothy J

    2017-09-12

    Repair Schwann cells play a critical role in orchestrating nerve repair after injury, but the cellular and molecular processes that generate them are poorly understood. Here, we perform a combined whole-genome, coding and non-coding RNA and CpG methylation study following nerve injury. We show that genes involved in the epithelial-mesenchymal transition are enriched in repair cells, and we identify several long non-coding RNAs in Schwann cells. We demonstrate that the AP-1 transcription factor C-JUN regulates the expression of certain microRNAs in repair Schwann cells, in particular miR-21 and miR-34. Surprisingly, unlike during development, changes in CpG methylation after injury are limited and restricted to specific locations, such as enhancer regions of Schwann cell-specific genes (e.g., Nedd4l) and sites close to local enrichments of AP-1 motifs. These genetic and epigenomic changes broaden our mechanistic understanding of the formation of repair Schwann cells during peripheral nervous system tissue repair. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Mutagenesis by alkylating agents: coding properties for DNA polymerase of poly (dC) template containing 3-methylcytosine

    Energy Technology Data Exchange (ETDEWEB)

    Boiteux, S.; Laval, J. (Institut Gustave-Roussy, 94 - Villejuif (France))

    After treatment of poly(dC) with the simple alkylating agent [3H]dimethyl sulfate (DMS), 90 percent of the radioactivity cochromatographed with 3-methylcytosine and 10 percent with 5-methylcytosine, which is the normally occurring methylated base. In order to study the influence of 3-methylcytosine on DNA replication, untreated and DMS-treated poly(dC) were used as templates for E. coli DNA polymerase I. The alkylation of poly(dC) inhibits DNA chain elongation and does not induce any mispairing under high-fidelity conditions. The alteration of DNA polymerase I fidelity by manganese ions allows some replication past 3-methylcytosine, which mispairs with either dAMP or dTMP. Our results suggest that 3-methylcytosine could be responsible, at least partially, for the killing and mutagenesis observed after cell treatment with alkylating agents.

  8. Cloning and expression of a cDNA coding for the human platelet-derived growth factor receptor: Evidence for more than one receptor class

    International Nuclear Information System (INIS)

    Gronwald, R.G.K.; Grant, F.J.; Haldeman, B.A.; Hart, C.E.; O'Hara, P.J.; Hagen, F.S.; Ross, R.; Bowen-Pope, D.F.; Murray, M.J.

    1988-01-01

    The complete nucleotide sequence of a cDNA encoding the human platelet-derived growth factor (PDGF) receptor is presented. The cDNA contains an open reading frame that codes for a protein of 1106 amino acids. Comparison to the mouse PDGF receptor reveals an overall amino acid sequence identity of 86%. This sequence identity rises to 98% in the cytoplasmic split tyrosine kinase domain. RNA blot hybridization analysis of poly(A)+ RNA from human dermal fibroblasts detects a major and a minor transcript using the cDNA as a probe. Baby hamster kidney cells, transfected with an expression vector containing the receptor cDNA, express an ∼190-kDa cell surface protein that is recognized by an anti-human PDGF receptor antibody. The recombinant PDGF receptor is functional in the transfected baby hamster kidney cells as demonstrated by ligand-induced phosphorylation of the receptor. Binding properties of the recombinant PDGF receptor were also assessed with pure preparations of BB and AB isoforms of PDGF. Unlike human dermal fibroblasts, which bind both isoforms with high affinity, the transfected baby hamster kidney cells bind only the BB isoform of PDGF with high affinity. This observation is consistent with the existence of more than one PDGF receptor class

  9. Classifying Returns as Extreme

    DEFF Research Database (Denmark)

    Christiansen, Charlotte

    2014-01-01

    I consider extreme returns for the stock and bond markets of 14 EU countries using two classification schemes: One, the univariate classification scheme from the previous literature that classifies extreme returns for each market separately, and two, a novel multivariate classification scheme tha...

  10. Classifying web pages with visual features

    NARCIS (Netherlands)

    de Boer, V.; van Someren, M.; Lupascu, T.; Filipe, J.; Cordeiro, J.

    2010-01-01

    To automatically classify and process web pages, current systems use the textual content of those pages, including both the displayed content and the underlying (HTML) code. However, a very important feature of a web page is its visual appearance. In this paper, we show that using generic visual

  11. LCC: Light Curves Classifier

    Science.gov (United States)

    Vo, Martin

    2017-08-01

    Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This task can be accomplished using attributes of light curves or any time series, including shapes, histograms, or variograms, or using other available information about the inspected objects, such as color indices, temperatures, and abundances. After specifying the features that describe the objects to be searched, the software trains on a given training sample, and can then be used for unsupervised clustering to visualize the natural separation of the sample. The package can also be used for automatic tuning of method parameters (for example, the number of hidden neurons or the binning ratio). Trained classifiers can be used to filter outputs from astronomical databases or data stored locally. Light Curves Classifier can also be used for simple downloading of light curves and all available information about queried stars. It can natively connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct usage of the package and the command line UI, the program can be used through a web interface. Users can create jobs for "training" methods on given objects, querying databases and filtering outputs with trained filters. Preimplemented descriptors, classifiers and connectors can be picked with simple clicks, and their parameters can be tuned by giving ranges of values. All combinations are then calculated and the best one is used to create the filter. Natural separation of the data can be visualized by unsupervised clustering.
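    The describe-then-classify workflow can be sketched with a minimal example (this is not the LCC package API; the features, classes, and synthetic curves below are assumptions for illustration): reduce each light curve to a small feature vector, then assign new curves to the nearest class centroid.

```python
# Toy light-curve classification: simple descriptors + nearest-centroid rule.
import math

def features(curve):
    """Two toy descriptors of a light curve: amplitude and standard deviation."""
    mean = sum(curve) / len(curve)
    std = math.sqrt(sum((v - mean) ** 2 for v in curve) / len(curve))
    return (max(curve) - min(curve), std)

def centroids(training):
    """training: dict label -> list of curves. Returns label -> mean features."""
    out = {}
    for label, curves in training.items():
        feats = [features(c) for c in curves]
        out[label] = tuple(sum(f[i] for f in feats) / len(feats) for i in range(2))
    return out

def classify(curve, cents):
    """Assign the label whose centroid is closest in feature space."""
    f = features(curve)
    return min(cents, key=lambda l: sum((a - b) ** 2 for a, b in zip(f, cents[l])))

training = {                       # synthetic fluxes: a dip vs. a flat curve
    "eclipsing": [[1.0, 1.0, 0.2, 1.0], [1.0, 0.9, 0.1, 1.0]],
    "constant":  [[1.0, 1.01, 0.99, 1.0], [1.0, 1.0, 1.0, 0.99]],
}
cents = centroids(training)
print(classify([1.0, 1.0, 0.15, 1.0], cents))   # → "eclipsing"
```

    LCC supports richer descriptors (histograms, variograms, color indices) and tunable learners; the nearest-centroid rule here simply stands in for whichever trained classifier the user selects.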

  12. Defining and Classifying Interest Groups

    DEFF Research Database (Denmark)

    Baroni, Laura; Carroll, Brendan; Chalmers, Adam

    2014-01-01

    The interest group concept is defined in many different ways in the existing literature, and a range of different classification schemes are employed. This complicates comparisons between different studies and their findings. One of the important tasks faced by interest group scholars engaged in large-N studies is therefore to define the concept of an interest group and to determine which classification scheme to use for different group types. After reviewing the existing literature, this article sets out to compare different approaches to defining and classifying interest groups with a sample … in the organizational attributes of specific interest group types. As expected, our comparison of coding schemes reveals a closer link between group attributes and group type in narrower classification schemes based on group organizational characteristics than in those based on a behavioral definition of lobbying.

  13. Species tree phylogeny and character evolution in the genus Centipeda (Asteraceae): evidence from DNA sequences from coding and non-coding loci from the plastid and nuclear genomes.

    Science.gov (United States)

    Nylinder, Stephan; Cronholm, Bodil; de Lange, Peter J; Walsh, Neville; Anderberg, Arne A

    2013-08-01

    A species tree phylogeny of the Australian/New Zealand genus Centipeda (Asteraceae) is estimated based on nucleotide sequence data. We analysed sequences of nuclear ribosomal DNA (ETS, ITS) and three plastid loci (ndhF, psbA-trnH, and trnL-F) using the multi-species coalescent module in BEAST. A total of 129 individuals from all 10 recognised species of Centipeda, including two subspecies, were sampled throughout the species' distribution ranges. We conclude that the inferred species tree topology largely conforms to previous assumptions about species relationships. Centipeda racemosa (Snuffweed), the only consistently perennial representative of the genus, is the sister to the remaining species. Centipeda pleiocephala (Tall Sneezeweed) and C. nidiformis (Cotton Sneezeweed) constitute a species pair, as do C. borealis and C. minima (Spreading Sneezeweed), all sharing with C. racemosa the symplesiomorphic characters of a spherical capitulum and a convex receptacle. Another species group, comprising C. thespidioides (Desert Sneezeweed), C. cunninghamii (Old Man Weed, or Common Sneezeweed) and C. crateriformis, is well supported but also includes the morphologically aberrant C. aotearoana, all sharing the character of capitula that mature more slowly relative to the subtending shoot. Centipeda elatinoides takes a weakly supported intermediate position between the two aforementioned groups and is difficult to relate to either of them on the basis of morphological characters. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Intelligent Garbage Classifier

    Directory of Open Access Journals (Sweden)

    Ignacio Rodríguez Novelle

    2008-12-01

    Full Text Available IGC (Intelligent Garbage Classifier) is a system for the visual classification and separation of solid waste products. Currently, an important part of the separation effort is based on manual work, from household separation to industrial waste management. Taking advantage of currently available technologies, a system has been built that can analyze images from a camera and control a robot arm and conveyor belt to automatically separate different kinds of waste.

  15. Classifying Linear Canonical Relations

    OpenAIRE

    Lorand, Jonathan

    2015-01-01

    In this Master's thesis, we consider the problem of classifying, up to conjugation by linear symplectomorphisms, linear canonical relations (lagrangian correspondences) from a finite-dimensional symplectic vector space to itself. We give an elementary introduction to the theory of linear canonical relations and present partial results toward the classification problem. This exposition should be accessible to undergraduate students with a basic familiarity with linear algebra.

  16. Characterization of non-coding DNA satellites associated with sweepoviruses (genus Begomovirus, Geminiviridae) - definition of a distinct class of begomovirus-associated satellites

    Directory of Open Access Journals (Sweden)

    Gloria eLozano

    2016-02-01

    Full Text Available Begomoviruses (family Geminiviridae) are whitefly-transmitted, plant-infecting single-stranded DNA viruses that cause crop losses throughout the warmer parts of the world. Sweepoviruses are a phylogenetically distinct group of begomoviruses that infect plants of the family Convolvulaceae, including sweet potato (Ipomoea batatas). Two classes of subviral molecules are often associated with begomoviruses, particularly in the Old World: the betasatellites and the alphasatellites. An analysis of sweet potato and Ipomoea indica samples from Spain and Merremia dissecta samples from Venezuela identified small non-coding subviral molecules in association with several distinct sweepoviruses. The sequences of 18 clones were obtained and found to be structurally similar to tomato leaf curl virus satellite (ToLCV-sat), the first DNA satellite identified in association with a begomovirus: each has a region with significant sequence identity to the conserved region of betasatellites, an A-rich sequence, a predicted stem-loop structure containing the nonanucleotide TAATATTAC, and a second predicted stem-loop. These sweepovirus-associated satellites join an increasing number of ToLCV-sat-like non-coding satellites identified recently. Although they share some features with betasatellites, evidence is provided to suggest that the ToLCV-sat-like satellites are distinct from betasatellites and should be considered a separate class of satellites, for which the collective name deltasatellites is proposed.

  17. Arabidopsis RNASE THREE LIKE2 Modulates the Expression of Protein-Coding Genes via 24-Nucleotide Small Interfering RNA-Directed DNA Methylation.

    Science.gov (United States)

    Elvira-Matelot, Emilie; Hachet, Mélanie; Shamandi, Nahid; Comella, Pascale; Sáez-Vásquez, Julio; Zytnicki, Matthias; Vaucheret, Hervé

    2016-02-01

    RNaseIII enzymes catalyze the cleavage of double-stranded RNA (dsRNA) and have diverse functions in RNA maturation. Arabidopsis thaliana RNASE THREE LIKE2 (RTL2), which carries one RNaseIII and two dsRNA binding (DRB) domains, is a unique Arabidopsis RNaseIII enzyme resembling the budding yeast small interfering RNA (siRNA)-producing Dcr1 enzyme. Here, we show that RTL2 modulates the production of a subset of small RNAs and that this activity depends on both its RNaseIII and DRB domains. However, the mode of action of RTL2 differs from that of Dcr1. Whereas Dcr1 directly cleaves dsRNAs into 23-nucleotide siRNAs, RTL2 likely cleaves dsRNAs into longer molecules, which are subsequently processed into small RNAs by the DICER-LIKE enzymes. Depending on the dsRNA considered, RTL2-mediated maturation either improves (RTL2-dependent loci) or reduces (RTL2-sensitive loci) the production of small RNAs. Because the vast majority of RTL2-regulated loci correspond to transposons and intergenic regions producing 24-nucleotide siRNAs that guide DNA methylation, RTL2 depletion modifies DNA methylation in these regions. Nevertheless, 13% of RTL2-regulated loci correspond to protein-coding genes. We show that changes in 24-nucleotide siRNA levels also affect DNA methylation levels at such loci and inversely correlate with mRNA steady state levels, thus implicating RTL2 in the regulation of protein-coding gene expression. © 2016 American Society of Plant Biologists. All rights reserved.

  18. Homological stabilizer codes

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Jonas T., E-mail: jonastyleranderson@gmail.com

    2013-03-15

    In this paper we define homological stabilizer codes on qubits which encompass codes such as Kitaev's toric code and the topological color codes. These codes are defined solely by the graphs they reside on. This feature allows us to use properties of topological graph theory to determine the graphs which are suitable as homological stabilizer codes. We then show that all toric codes are equivalent to homological stabilizer codes on 4-valent graphs. We show that the topological color codes and toric codes correspond to two distinct classes of graphs. We define the notion of label set equivalencies and show that under a small set of constraints the only homological stabilizer codes without local logical operators are equivalent to Kitaev's toric code or to the topological color codes. - Highlights: We show that Kitaev's toric codes are equivalent to homological stabilizer codes on 4-valent graphs. We show that toric codes and color codes correspond to homological stabilizer codes on distinct graphs. We find and classify all 2D homological stabilizer codes. We find optimal codes among the homological stabilizer codes.

  19. Molecular phylogeny of Edraianthus (Grassy Bells; Campanulaceae) based on non-coding plastid DNA sequences

    DEFF Research Database (Denmark)

    Stefanovic, Sasa; Lakusic, Dmitar; Kuzmina, Maria

    2008-01-01

    The Balkan Peninsula is known as an ice-age refugium and an area with high rates of speciation and diversification. Only a few genera have their centers of distribution in the Balkans and the endemic genus Edraianthus is one of its most prominent groups. As such, Edraianthus is an excellent model...... divided into three sections: E. sect. Edraianthus, E. sect. Uniflori, and E. sect. Spathulati. We present here the first phylogenetic study of Edraianthus based on multiple plastid DNA sequences (trnL-F region and rbcL-atpB spacer) derived from a wide taxonomic sampling and geographic range. While...

  20. Stack filter classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B [Los Alamos National Laboratory; Hush, Don [Los Alamos National Laboratory

    2009-01-01

    Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that for pattern classification their treatment has been significantly one-sided. Generalized additive models are now a major tool in pattern classification, and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification are yet to be seen. This paper is a step toward Stack Filter Classifiers, and it shows that the approach is interesting from both a theoretical and a practical perspective.
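
    The analogy above can be made concrete in a few lines. The sketch below (illustrative data, not from the paper) contrasts the weighted mean, the building block of linear models, with the weighted median that weighted order statistic models and Stack Filters generalize; the median-type statistic is robust to the outlier.

```python
def weighted_median(x, w):
    """Return a value m minimizing sum(w_i * |x_i - m|) (lower weighted median)."""
    pairs = sorted(zip(x, w))
    half = sum(w) / 2.0
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value

def weighted_mean(x, w):
    return sum(v * wt for v, wt in zip(x, w)) / sum(w)

x = [1.0, 2.0, 3.0, 100.0]   # one gross outlier
w = [1.0, 1.0, 1.0, 1.0]
print(weighted_mean(x, w))     # 26.5: the mean is dragged by the outlier
print(weighted_median(x, w))   # 2.0: the median ignores it
```

    This robustness to impulsive values is exactly why median-type filters are favoured in signal and image processing.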

  1. Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva

    Directory of Open Access Journals (Sweden)

    Guo Xiang

    2008-12-01

    Full Text Available Abstract Background Parasites in the genus Theileria cause lymphoproliferative diseases in cattle, resulting in enormous socio-economic losses. The availability of the genome sequences and annotation for T. parva and T. annulata has facilitated the study of parasite biology and their relationship with host cell transformation and tropism. However, the mechanism of transcriptional regulation in this genus, which may be key to understanding fundamental aspects of its parasitology, remains poorly understood. In this study, we analyze the evolution of non-coding sequences in the Theileria genome and identify conserved sequence elements that may be involved in gene regulation of these parasitic species. Results Intergenic regions and introns in Theileria are short, and their length distributions are considerably right-skewed. Intergenic regions flanked by genes in 5'-5' orientation tend to be longer and slightly more AT-rich than those flanked by two stop codons; intergenic regions flanked by genes in 3'-5' orientation have intermediate values of length and AT composition. Intron position is negatively correlated with intron length, and positively correlated with GC content. Using stringent criteria, we identified a set of high-quality orthologous non-coding sequences between T. parva and T. annulata, and determined the distribution of selective constraints across regions, which are shown to be higher close to translation start sites. A positive correlation between constraint and length in both intergenic regions and introns suggests a tight control over length expansion of non-coding regions. Genome-wide searches for functional elements revealed several conserved motifs in intergenic regions of Theileria genomes. Two such motifs are preferentially located within the first 60 base pairs upstream of transcription start sites in T. parva, are preferentially associated with specific protein functional categories, and have significant similarity to known
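
    The summary statistics described in the Results can be sketched with a toy computation; the length values and sequence below are hypothetical, not T. parva data.

```python
import statistics

def at_content(seq):
    """Fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

# Hypothetical intergenic-region lengths in bp; a long right tail makes the
# distribution right-skewed, which shows up as mean > median.
lengths = [120, 150, 180, 200, 240, 300, 420, 900, 1500]
print(statistics.mean(lengths) > statistics.median(lengths))  # True
print(at_content("ATATTAGCAT"))  # 0.8
```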

  2. Stalled RNAP-II molecules bound to non-coding rDNA spacers are required for normal nucleolus architecture.

    Science.gov (United States)

    Freire-Picos, M A; Landeira-Ameijeiras, V; Mayán, María D

    2013-07-01

    The correct distribution of nuclear domains is critical for the maintenance of normal cellular processes such as transcription and replication, which are regulated depending on their location and surroundings. The most well-characterized nuclear domain, the nucleolus, is essential for cell survival and metabolism. Alterations in nucleolar structure affect nuclear dynamics; however, how the nucleolus and the rest of the nuclear domains are interconnected is largely unknown. In this report, we demonstrate that RNAP-II is vital for the maintenance of the typical crescent-shaped structure of the nucleolar rDNA repeats and rRNA transcription. When stalled RNAP-II molecules are not bound to the chromatin, the nucleolus loses its typical crescent-shaped structure. However, the RNAP-II interaction with Seh1p, or cryptic transcription by RNAP-II, is not critical for morphological changes. Copyright © 2013 John Wiley & Sons, Ltd.

  3. Alterations in sperm DNA methylation, non-coding RNA expression, and histone retention mediate vinclozolin-induced epigenetic transgenerational inheritance of disease.

    Science.gov (United States)

    Ben Maamar, Millissia; Sadler-Riggleman, Ingrid; Beck, Daniel; McBirney, Margaux; Nilsson, Eric; Klukovich, Rachel; Xie, Yeming; Tang, Chong; Yan, Wei; Skinner, Michael K

    2018-04-01

    Epigenetic transgenerational inheritance of disease and phenotypic variation can be induced by several toxicants, such as vinclozolin. This phenomenon can involve DNA methylation, non-coding RNA (ncRNA) and histone retention, and/or modification in the germline (e.g. sperm). These different epigenetic marks are called epimutations and can transmit the transgenerational phenotypes in part. This study was designed to investigate the vinclozolin-induced concurrent alterations of a number of different epigenetic factors, including DNA methylation, ncRNA, and histone retention in rat sperm. Gestating females (F0 generation) were exposed transiently to vinclozolin during fetal gonadal development. The directly exposed F1 generation fetus, the directly exposed germline within the fetus that will generate the F2 generation, and the transgenerational F3 generation sperm were studied. DNA methylation and ncRNA were altered in rat sperm of each generation, with the epimutations of the directly exposed F1 and F2 generations being distinct from those of the F3 generation. Interestingly, an increased number of differential histone retention sites were found in the F3 generation vinclozolin sperm, but not in the F1 or F2 generations. All three different epimutation types were affected in the vinclozolin lineage transgenerational sperm (F3 generation). The direct exposure generations (F1 and F2) epigenetic alterations were distinct from the transgenerational sperm epimutations. The genomic features and gene pathways associated with the epimutations were investigated to help elucidate the integration of these different epigenetic processes. Our results show that the three different types of epimutations are involved and integrated in the mediation of the epigenetic transgenerational inheritance phenomenon.

  4. Recognize and classify pneumoconiosis

    International Nuclear Information System (INIS)

    Hering, K.G.; Hofmann-Preiss, K.

    2014-01-01

    In 2012, 6 of the 10 most frequently recognized occupational diseases were forms of pneumoconiosis. With respect to healthcare and economic aspects, silicosis and asbestos-associated diseases are of foremost importance. The latter are found everywhere and are not restricted to large industrial areas. Radiology has a central role in the diagnosis and evaluation of occupational lung disorders. In cases of known exposure, mainly to asbestos and quartz, the diagnosis of pneumoconiosis will, with few exceptions, be established primarily by the radiological findings. As these disorders are asymptomatic for a long time, they are quite often detected as incidental findings in examinations performed for other reasons. Radiologists therefore have to be familiar with the patterns of findings of the most frequent forms of pneumoconiosis and their differential diagnoses. To ensure equal treatment of the insured, quality-based, standardized performance, documentation and evaluation of radiological examinations are required in preventive procedures and evaluations. Above all, a standardized low-dose protocol, individualized with respect to dose, has to be used in computed tomography (CT) examinations in order to keep radiation exposure as low as possible for the patient. The International Labour Office (ILO) classification for coding chest X-rays and the international classification of occupational and environmental respiratory diseases (ICOERD), used since 2004 for CT examinations, meet the requirements of the insured and the occupational insurance associations for reproducible and comparable data for decision-making. (orig.) [de

  5. Defending Malicious Script Attacks Using Machine Learning Classifiers

    Directory of Open Access Journals (Sweden)

    Nayeem Khan

    2017-01-01

    Full Text Available Web applications have become a primary target for cyber criminals, who inject malware, especially JavaScript, to perform malicious activities such as impersonation. Thus, it is imperative to detect such malicious code in real time, before any malicious activity is performed. This study proposes an efficient method for detecting previously unknown malicious JavaScript using an interceptor at the client side, by classifying the key features of the malicious code. A feature subset was obtained using a wrapper method for dimensionality reduction. Supervised machine learning classifiers were used on the dataset to achieve high accuracy. Experimental results show that our method can efficiently classify malicious code from benign code with promising results.
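
    As a sketch of the kind of static key features such a client-side classifier might compute (the feature set below is an illustrative assumption, not the study's actual features): obfuscated or packed scripts tend to call eval/unescape and to have high character-level entropy.

```python
import math
import re

def js_features(src):
    """Toy static features for flagging suspicious JavaScript source text."""
    n = len(src)
    counts = {}
    for ch in src:
        counts[ch] = counts.get(ch, 0) + 1
    # Shannon entropy of the raw characters: packed code scores high.
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {
        "uses_eval": "eval(" in src,
        "uses_unescape": "unescape(" in src,
        "num_hex_escapes": len(re.findall(r"%[0-9a-fA-F]{2}", src)),
        "entropy": entropy,
    }

benign = "function add(a, b) { return a + b; }"
packed = 'eval(unescape("%75%6e%65%73%63%61%70%65%64"))'
print(js_features(benign)["uses_eval"], js_features(packed)["uses_eval"])
```

    Feature vectors like these would then feed a supervised classifier after wrapper-based feature selection, as the abstract describes.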

  6. A Supervised Multiclass Classifier for an Autocoding System

    Directory of Open Access Journals (Sweden)

    Yukako Toko

    2017-11-01

    Full Text Available Classification is often required in various contexts, including in the field of official statistics. In a previous study, we developed a multiclass classifier that can classify short text descriptions with high accuracy. The algorithm borrows the concept of the naïve Bayes classifier and is so simple that its structure is easily understandable. The proposed classifier has two advantages. First, the processing times for both learning and classifying are extremely practical. Second, the proposed classifier yields high-accuracy results for a large portion of a dataset. We previously developed an autocoding system for the Family Income and Expenditure Survey in Japan that incorporates a better-performing classifier. While the original system was developed in Perl to improve the efficiency of the coding process for short Japanese texts, the proposed system is implemented in the R programming language to explore versatility, and it is modified to make the system easily applicable to English text descriptions, in consideration of the increasing number of R users in the field of official statistics. We are planning to publish the proposed classifier as an R package. The proposed classifier would be generally applicable to other classification tasks, including coding activities in the field of official statistics, and would contribute greatly to improving their efficiency.
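
    A minimal multinomial naïve Bayes text coder of the general kind the paper builds on can be sketched as follows; the class, training data and Laplace smoothing choice are illustrative, not the published system (which is an R package for survey coding).

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Minimal multinomial naive Bayes for coding short text descriptions."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best, best_lp = None, float("-inf")
        for label, n in self.label_counts.items():
            lp = math.log(n / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in text.lower().split():
                # Laplace-smoothed word likelihood
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

clf = TinyNaiveBayes().fit(
    ["fresh apples", "frozen peas", "bus ticket", "train fare"],
    ["food", "food", "transport", "transport"])
print(clf.predict("apples and peas"))  # food
```

    Both learning and prediction are a single pass over the words, which is why processing times for this family of classifiers are so practical.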

  7. Fingerprint prediction using classifier ensembles

    CSIR Research Space (South Africa)

    Molale, P

    2011-11-01

    Full Text Available ); logistic discrimination (LgD), k-nearest neighbour (k-NN), artificial neural network (ANN), association rules (AR) decision tree (DT), naive Bayes classifier (NBC) and the support vector machine (SVM). The performance of several multiple classifier systems...

  8. Classified

    CERN Multimedia

    Computer Security Team

    2011-01-01

    In the last issue of the Bulletin, we discussed recent implications for privacy on the Internet. But privacy of personal data is just one facet of data protection; confidentiality is another. However, confidentiality and data protection are often perceived as irrelevant in the academic environment of CERN.   But think twice! At CERN, your personal data, e-mails, medical records, financial and contractual documents, MARS forms, group meeting minutes (and of course your password!) are all considered to be sensitive, restricted or even confidential. And this is not all. Physics results, in particular when preliminary and pending scrutiny, are sensitive, too. Just recently, an ATLAS collaborator copy/pasted the abstract of an ATLAS note onto an external public blog, despite the fact that this document was clearly marked as an "Internal Note". Such an act was not only embarrassing to the ATLAS collaboration, but also had a negative impact on CERN’s reputation --- i...

  9. Isolation and expression of a novel chick G-protein cDNA coding for a G alpha i3 protein with a G alpha 0 N-terminus.

    OpenAIRE

    Kilbourne, E J; Galper, J B

    1994-01-01

    We have cloned cDNAs coding for G-protein alpha subunits from a chick brain cDNA library. Based on sequence similarity to G-protein alpha subunits from other eukaryotes, one clone was designated G alpha i3. A second clone, G alpha i3-o, was identical to the G alpha i3 clone over 932 bases on the 3' end. The 5' end of G alpha i3-o, however, contained an alternative sequence in which the first 45 amino acids coded for are 100% identical to the conserved N-terminus of G alpha o from species such...

  10. Classifying Sluice Occurrences in Dialogue

    DEFF Research Database (Denmark)

    Baird, Austin; Hamza, Anissa; Hardt, Daniel

    2018-01-01

    perform manual annotation with acceptable inter-coder agreement. We build classifier models with Decision Trees and Naive Bayes, with an accuracy of 67%. We deploy a classifier to automatically classify sluice occurrences in OpenSubtitles, resulting in a corpus with 1.7 million occurrences. This will support....... Despite this, the corpus can be of great use in research on sluicing and development of systems, and we are making the corpus freely available on request. Furthermore, we are in the process of improving the accuracy of sluice identification and annotation for the purpose of creating a subsequent version

  11. Quantum ensembles of quantum classifiers.

    Science.gov (United States)

    Schuld, Maria; Petruccione, Francesco

    2018-02-09

    Quantum machine learning witnesses an increasing number of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighted according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
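
    The accuracy-weighted ensemble analysed as the paper's example can be simulated classically. In the sketch below (toy one-dimensional data; the threshold stumps and all parameters are illustrative assumptions), each untrained random classifier votes with a weight equal to its accuracy on the training set, so the good members dominate without any member being trained.

```python
import random
random.seed(0)

# Toy training set on [0, 1]: label 1 iff x >= 0.5.
train = [(x / 10, 1 if x >= 5 else 0) for x in range(10)]

def stump(theta):
    """An untrained threshold classifier, analogous to one ensemble member."""
    return lambda x: 1 if x >= theta else 0

# Draw many random (untrained) classifiers; weight each by training accuracy.
stumps = [stump(random.uniform(0, 1)) for _ in range(200)]
weights = [sum(s(x) == y for x, y in train) / len(train) for s in stumps]

def ensemble_predict(x):
    vote = sum(w if s(x) == 1 else -w for s, w in zip(stumps, weights))
    return 1 if vote > 0 else 0

print(ensemble_predict(0.9), ensemble_predict(0.1))  # 1 0
```

    In the quantum version, the weighting and the parallel evaluation of exponentially many such members are implemented by state preparation and a single-qubit measurement rather than an explicit loop.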

  12. IAEA safeguards and classified materials

    International Nuclear Information System (INIS)

    Pilat, J.F.; Eccleston, G.W.; Fearey, B.L.; Nicholas, N.J.; Tape, J.W.; Kratzer, M.

    1997-01-01

    The international community in the post-Cold War period has suggested that the International Atomic Energy Agency (IAEA) utilize its expertise in support of the arms control and disarmament process in unprecedented ways. The pledges of the US and Russian presidents to place excess defense materials, some of which are classified, under some type of international inspections raises the prospect of using IAEA safeguards approaches for monitoring classified materials. A traditional safeguards approach, based on nuclear material accountancy, would seem unavoidably to reveal classified information. However, further analysis of the IAEA's safeguards approaches is warranted in order to understand fully the scope and nature of any problems. The issues are complex and difficult, and it is expected that common technical understandings will be essential for their resolution. Accordingly, this paper examines and compares traditional safeguards item accounting of fuel at a nuclear power station (especially spent fuel) with the challenges presented by inspections of classified materials. This analysis is intended to delineate more clearly the problems as well as reveal possible approaches, techniques, and technologies that could allow the adaptation of safeguards to the unprecedented task of inspecting classified materials. It is also hoped that a discussion of these issues can advance ongoing political-technical debates on international inspections of excess classified materials

  13. Hybrid classifiers methods of data, knowledge, and classifier combination

    CERN Document Server

    Wozniak, Michal

    2014-01-01

    This book delivers definite and compact knowledge on how hybridization can help improve the quality of computer classification systems. To help readers clearly grasp hybridization, the book primarily focuses on introducing the different levels of hybridization and illuminating the problems that arise when dealing with such projects. The data and knowledge incorporated in hybridization are treated first, followed by the still-growing area of classifier systems known as combined classifiers. The book covers these state-of-the-art topics and the latest research results of the author and his team from the Department of Systems and Computer Networks, Wroclaw University of Technology, including a classifier based on feature-space splitting, one-class classification, imbalanced data, and data-stream classification.

  14. Code Cactus; Code Cactus

    Energy Technology Data Exchange (ETDEWEB)

    Fajeau, M; Nguyen, L T; Saunier, J [Commissariat a l' Energie Atomique, Centre d' Etudes Nucleaires de Saclay, 91 - Gif-sur-Yvette (France)

    1966-09-01

    This code handles the following problems: (1) analysis of thermal experiments on a water loop at high or low pressure, in steady-state or transient conditions; (2) analysis of the thermal and hydrodynamic behavior of water-cooled and water-moderated reactors, at either high or low pressure, with boiling permitted; fuel elements are assumed to be flat plates. The flowrate in parallel channels, coupled or not by conduction across the plates, is computed for imposed pressure drops or flowrates, which may or may not vary with time; the power can be coupled to a reactor kinetics calculation or supplied by the code user. The code, which contains a schematic representation of safety rod behavior, is a one-dimensional, multi-channel code; its complement, FLID, is a one-channel, two-dimensional code. (authors)

  15. 3D Bayesian contextual classifiers

    DEFF Research Database (Denmark)

    Larsen, Rasmus

    2000-01-01

    We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.
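
    A one-voxel sketch of such a contextual Bayes rule is shown below. The Potts-style prior and all parameter values are illustrative assumptions, not the paper's simultaneous Gaussian model; the point is only how a class-conditional likelihood combines with a 6-neighbour context term.

```python
import math

means = {0: 0.0, 1: 1.0}   # illustrative class means for the voxel feature
sigma, beta = 0.5, 0.8     # illustrative noise level and context strength

def posterior(x, neighbour_labels):
    """Class posterior for one voxel: Gaussian likelihood of the feature x
    times a Potts-style prior rewarding agreement with the 6 face neighbours."""
    scores = {}
    for c, mu in means.items():
        log_lik = -((x - mu) ** 2) / (2 * sigma ** 2)
        log_prior = beta * sum(1 for lbl in neighbour_labels if lbl == c)
        scores[c] = log_lik + log_prior
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

# Feature value halfway between the class means: the neighbourhood decides.
post = posterior(0.5, [1, 1, 1, 1, 1, 0])
print(post[1] > post[0])  # True
```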

  16. Knowledge Uncertainty and Composed Classifier

    Czech Academy of Sciences Publication Activity Database

    Klimešová, Dana; Ocelíková, E.

    2007-01-01

    Roč. 1, č. 2 (2007), s. 101-105 ISSN 1998-0140 Institutional research plan: CEZ:AV0Z10750506 Keywords : Boosting architecture * contextual modelling * composed classifier * knowledge management * knowledge * uncertainty Subject RIV: IN - Informatics, Computer Science

  17. Correlation Dimension-Based Classifier

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2014-01-01

    Roč. 44, č. 12 (2014), s. 2253-2263 ISSN 2168-2267 R&D Projects: GA MŠk(CZ) LG12020 Institutional support: RVO:67985807 Keywords : classifier * multidimensional data * correlation dimension * scaling exponent * polynomial expansion Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 3.469, year: 2014

  18. DNA probes

    International Nuclear Information System (INIS)

    Castelino, J.

    1992-01-01

    The creation of DNA probes for detection of specific nucleotide segments differs from ligand detection in that it is a chemical rather than an immunological reaction. Complementary DNA or RNA is used in place of the antibody and is labelled with 32P. So far, DNA probes have been successfully employed in the diagnosis of inherited disorders, infectious diseases, and for identification of human oncogenes. The latest approach to the diagnosis of communicable and parasitic infections is based on the use of deoxyribonucleic acid (DNA) probes. The genetic information of all cells is encoded by DNA and the DNA probe approach to identification of pathogens is unique because the focus of the method is the nucleic acid content of the organism rather than the products that the nucleic acid encodes. Since every properly classified species has some unique nucleotide sequences that distinguish it from every other species, each organism's genetic composition is in essence a fingerprint that can be used for its identification. In addition to this specificity, DNA probes offer other advantages in that pathogens may be identified directly in clinical specimens
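
    The complementarity that DNA probes exploit can be sketched in a few lines; the probe and target sequences below are made up, and real hybridization tolerates mismatches depending on stringency.

```python
COMP = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]

def probe_matches(probe, target):
    """A labelled probe hybridizes where the target strand contains the
    probe's reverse complement (toy exact-match model)."""
    return reverse_complement(probe) in target

target = "GGATCCTAGGCAATTC"           # hypothetical pathogen sequence
probe = reverse_complement("TAGGCA")  # probe designed against one segment
print(probe_matches(probe, target))   # True
```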

  19. DNA probes

    Energy Technology Data Exchange (ETDEWEB)

    Castelino, J

    1993-12-31

    The creation of DNA probes for detection of specific nucleotide segments differs from ligand detection in that it is a chemical rather than an immunological reaction. Complementary DNA or RNA is used in place of the antibody and is labelled with 32P. So far, DNA probes have been successfully employed in the diagnosis of inherited disorders, infectious diseases, and for identification of human oncogenes. The latest approach to the diagnosis of communicable and parasitic infections is based on the use of deoxyribonucleic acid (DNA) probes. The genetic information of all cells is encoded by DNA and the DNA probe approach to identification of pathogens is unique because the focus of the method is the nucleic acid content of the organism rather than the products that the nucleic acid encodes. Since every properly classified species has some unique nucleotide sequences that distinguish it from every other species, each organism's genetic composition is in essence a fingerprint that can be used for its identification. In addition to this specificity, DNA probes offer other advantages in that pathogens may be identified directly in clinical specimens. 10 figs, 2 tabs

  20. Classified facilities for environmental protection

    International Nuclear Information System (INIS)

    Anon.

    1993-02-01

    The legislation on classified facilities governs most dangerous or polluting industries or fixed activities. It rests on the law of 9 July 1976 concerning facilities classified for environmental protection and its application decree of 21 September 1977. This legislation, the general texts of which appear in this volume 1, aims to prevent all the risks and harmful effects coming from an installation (air, water or soil pollution, wastes, even aesthetic damage). The polluting or dangerous activities are defined in a list called the nomenclature, which subjects facilities to either a declaration or an authorization procedure. The authorization is delivered by the prefect at the end of an open and contradictory procedure after a public survey. In addition, the facilities can be subjected to technical regulations fixed by the Environment Minister (volume 2) or by the prefect for facilities subject to declaration (volume 3). (A.B.)

  1. Energy-Efficient Neuromorphic Classifiers.

    Science.gov (United States)

    Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano

    2016-10-01

    Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumption promised by neuromorphic engineering is extremely low, comparable to that of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby hindering a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

  2. 76 FR 34761 - Classified National Security Information

    Science.gov (United States)

    2011-06-14

    ... MARINE MAMMAL COMMISSION Classified National Security Information [Directive 11-01] AGENCY: Marine... Commission's (MMC) policy on classified information, as directed by Information Security Oversight Office... of Executive Order 13526, ``Classified National Security Information,'' and 32 CFR part 2001...

  3. Efficient DNA barcode regions for classifying Piper species (Piperaceae)

    Directory of Open Access Journals (Sweden)

    Arunrat Chaveerach

    2016-09-01

    Full Text Available Piper species are used for spices, in traditional and processed forms of medicines, in cosmetic compounds, in cultural activities and insecticides. Here barcode analysis was performed for identification of plant parts, young plants and modified forms of plants. Thirty-six Piper species were collected and the three barcode regions, matK, rbcL and psbA-trnH spacer, were amplified, sequenced and aligned to determine their genetic distances. For intraspecific genetic distances, the most effective values for species identification ranged from no difference to very low distance values. However, P. betle had the highest value at 0.386 for the matK region. This finding may be due to P. betle being an economic and cultivated species, and thus is supported with growth factors, which may have affected its genetic distance. The interspecific genetic distances that were most effective for identification of different species were from the matK region and ranged from a low of 0.002 in 27 paired species to a high of 0.486. Eight species pairs, P. kraense and P. dominantinervium, P. magnibaccum and P. kraense, P. phuwuaense and P. dominantinervium, P. phuwuaense and P. kraense, P. pilobracteatum and P. dominantinervium, P. pilobracteatum and P. kraense, P. pilobracteatum and P. phuwuaense, and P. sylvestre and P. polysyphonum, presented a genetic distance of 0.000 and were identified by independently using each of the other two regions. Concisely, these three barcode regions are powerful for efficient identification of the 36 Piper species.
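
The pairwise-distance step underlying such barcode comparisons can be sketched in a few lines. This is a generic uncorrected p-distance on hypothetical aligned fragments; the study's actual distances would come from standard barcode software, typically with model-corrected metrics.

```python
def p_distance(seq1, seq2):
    """Uncorrected pairwise distance: fraction of differing sites,
    ignoring alignment gaps ('-')."""
    assert len(seq1) == len(seq2), "sequences must be aligned"
    compared = mismatches = 0
    for a, b in zip(seq1, seq2):
        if a == '-' or b == '-':
            continue  # skip gapped positions
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared

# Hypothetical aligned fragments (not real matK data)
s1 = "ATGGCTAAGT"
s2 = "ATGACTAAGT"
print(p_distance(s1, s2))  # → 0.1
```

A distance of 0.000 for a species pair, as reported above for matK, simply means the two aligned sequences are identical at every compared site.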

  4. Efficient DNA barcode regions for classifying Piper species (Piperaceae).

    Science.gov (United States)

    Chaveerach, Arunrat; Tanee, Tawatchai; Sanubol, Arisa; Monkheang, Pansa; Sudmoon, Runglawan

    2016-01-01

    Piper species are used for spices, in traditional and processed forms of medicines, in cosmetic compounds, in cultural activities and insecticides. Here barcode analysis was performed for identification of plant parts, young plants and modified forms of plants. Thirty-six Piper species were collected and the three barcode regions, matK, rbcL and psbA-trnH spacer, were amplified, sequenced and aligned to determine their genetic distances. For intraspecific genetic distances, the most effective values for species identification ranged from no difference to very low distance values. However, Piper betle had the highest value at 0.386 for the matK region. This finding may be due to Piper betle being an economic and cultivated species, and thus is supported with growth factors, which may have affected its genetic distance. The interspecific genetic distances that were most effective for identification of different species were from the matK region and ranged from a low of 0.002 in 27 paired species to a high of 0.486. Eight species pairs, Piper kraense and Piper dominantinervium, Piper magnibaccum and Piper kraense, Piper phuwuaense and Piper dominantinervium, Piper phuwuaense and Piper kraense, Piper pilobracteatum and Piper dominantinervium, Piper pilobracteatum and Piper kraense, Piper pilobracteatum and Piper phuwuaense, and Piper sylvestre and Piper polysyphonum, presented a genetic distance of 0.000 and were identified by independently using each of the other two regions. Concisely, these three barcode regions are powerful for efficient identification of the 36 Piper species.

  5. Minisatellites as DNA markers to classify bermudagrasses (Cynodon ...

    Indian Academy of Sciences (India)

    RESEARCH NOTE ... an inexpensive, PCR-based method to amplify minisatellite ... isatellite core primer sequences derived from other species, including ... important quantitative traits (Karaca et al. ... These problems in RAPD-PCR are mainly inherited from the .... isatellite sequence organization of 5.2 times repeated-core.

  6. Efficient DNA barcode regions for classifying Piper species (Piperaceae)

    Science.gov (United States)

    Chaveerach, Arunrat; Tanee, Tawatchai; Sanubol, Arisa; Monkheang, Pansa; Sudmoon, Runglawan

    2016-01-01

    Piper species are used for spices, in traditional and processed forms of medicines, in cosmetic compounds, in cultural activities and insecticides. Here barcode analysis was performed for identification of plant parts, young plants and modified forms of plants. Thirty-six Piper species were collected and the three barcode regions, matK, rbcL and psbA-trnH spacer, were amplified, sequenced and aligned to determine their genetic distances. For intraspecific genetic distances, the most effective values for species identification ranged from no difference to very low distance values. However, Piper betle had the highest value at 0.386 for the matK region. This finding may be due to Piper betle being an economic and cultivated species, and thus is supported with growth factors, which may have affected its genetic distance. The interspecific genetic distances that were most effective for identification of different species were from the matK region and ranged from a low of 0.002 in 27 paired species to a high of 0.486. Eight species pairs, Piper kraense and Piper dominantinervium, Piper magnibaccum and Piper kraense, Piper phuwuaense and Piper dominantinervium, Piper phuwuaense and Piper kraense, Piper pilobracteatum and Piper dominantinervium, Piper pilobracteatum and Piper kraense, Piper pilobracteatum and Piper phuwuaense, and Piper sylvestre and Piper polysyphonum, presented a genetic distance of 0.000 and were identified by independently using each of the other two regions. Concisely, these three barcode regions are powerful for efficient identification of the 36 Piper species. PMID:27829794

  7. Nucleotide sequence of a cDNA coding for the amino-terminal region of human preproα1(III) collagen

    Energy Technology Data Exchange (ETDEWEB)

    Toman, P D; Ricca, G A [Rorer Biotechnology, Inc., Springfield, VA (USA); de Crombrugghe, B [National Institutes of Health, Bethesda, MD (USA)

    1988-07-25

    Type III collagen is synthesized in a variety of tissues as a precursor macromolecule containing a leader sequence, a N-propeptide, a N-telopeptide, the triple helical region, a C-telopeptide, and a C-propeptide. To further characterize the human type III collagen precursor, a human placental cDNA library was constructed in λgt11 using an oligonucleotide derived from a partial cDNA sequence corresponding to the carboxy-terminal part of the α1(III) collagen chain. A cDNA was identified which contains the leader sequence, the N-propeptide and N-telopeptide regions. The DNA sequence of these regions is presented here. The triple helical, C-telopeptide and C-propeptide amino acid sequence for human type III collagen has been determined previously. A comparison of the human amino acid sequence with the mouse, chicken, and calf sequences shows 81%, 81%, and 92% similarity, respectively. At the DNA level, the sequence similarity between the human and mouse or chicken type III collagen sequences in this region is 82% and 77%, respectively.

  8. Function and Application Areas in Medicine of Non-Coding RNA

    Directory of Open Access Journals (Sweden)

    Figen Guzelgul

    2009-06-01

    Full Text Available RNA is the genetic material that converts the genetic code it receives from DNA into protein. While less than 2% of RNA is translated into protein, more than 98% of it is not and is termed non-coding RNA. 70% of non-coding RNAs derive from introns, while the rest derive from exons. Non-coding RNAs are examined in two classes according to their size and their functions: by size they are classified as long non-coding and small non-coding RNAs, while by function they are grouped as housekeeping non-coding RNAs and regulatory non-coding RNAs. For many years these non-coding RNAs were considered non-functional. Today, however, it has been shown that non-coding RNAs play a role in regulating genes and in the structural, functional and catalytic roles of the RNAs that are translated into protein. Because they take part in the gene-silencing mechanism, non-coding RNAs have led to significant developments, particularly in the medical world. RNAi technology, which is used in designing drugs for the treatment of various diseases, is a ray of hope for medicine. [Archives Medical Review Journal 2009; 18(3): 141-155]

  9. Preparation of Proper Immunogen by Cloning and Stable Expression of cDNA coding for Human Hematopoietic Stem Cell Marker CD34 in NIH-3T3 Mouse Fibroblast Cell Line

    Science.gov (United States)

    Shafaghat, Farzaneh; Abbasi-Kenarsari, Hajar; Majidi, Jafar; Movassaghpour, Ali Akbar; Shanehbandi, Dariush; Kazemi, Tohid

    2015-01-01

    Purpose: Transmembrane CD34 glycoprotein is the most important marker for identification, isolation and enumeration of hematopoietic stem cells (HSCs). We aimed in this study to clone the cDNA coding for human CD34 from the KG1a cell line and stably express it in the mouse fibroblast cell line NIH-3T3. Such an artificial cell line could be useful as a proper immunogen for the production of mouse monoclonal antibodies. Methods: CD34 cDNA was cloned from the KG1a cell line after total RNA extraction and cDNA synthesis. The Pfu DNA polymerase-amplified specific band was ligated into the pGEMT-easy TA-cloning vector and sub-cloned into the pCMV6-Neo expression vector. After transfection of NIH-3T3 cells using 3 μg of the recombinant construct and 6 μl of JetPEI transfection reagent, stable expression was obtained by selection of cells with the G418 antibiotic and confirmed by surface flow cytometry. Results: A 1158 bp specific band aligned completely to the reference sequence in the NCBI database corresponding to the long isoform of human CD34. Transient and stable expression of human CD34 on transfected NIH-3T3 mouse fibroblast cells was achieved (25% and 95%, respectively) as shown by flow cytometry. Conclusion: Cloning and stable expression of human CD34 cDNA was successfully performed and validated by standard flow cytometric analysis. Due to the murine origin of the NIH-3T3 cell line, CD34-expressing NIH-3T3 cells could be useful as an immunogen in the production of diagnostic monoclonal antibodies against human CD34. This approach could bypass the need for purification of recombinant proteins produced in eukaryotic expression systems. PMID:25789221

  10. Waste classifying and separation device

    International Nuclear Information System (INIS)

    Kakiuchi, Hiroki.

    1997-01-01

    Flexible plastic bags containing solid wastes of indefinite shape are broken and the wastes are classified. The bag-cutting portion of the device has an ultrasonic-type or a heater-type cutting means, and the cutting means moves in parallel with the transfer direction of the plastic bags. A classification portion separates the plastic bag from its contents and conducts classification while rotating a classification table. Accordingly, a plastic bag containing solids of indefinite shape can be broken and classification can be conducted efficiently and reliably. The device of the present invention has a simple structure which requires little installation space and enables easy maintenance. (T.M.)

  11. A system for classifying wood-using industries and recording statistics for automatic data processing.

    Science.gov (United States)

    E.W. Fobes; R.W. Rowe

    1968-01-01

    A system for classifying wood-using industries and recording pertinent statistics for automatic data processing is described. Forms and coding instructions for recording data of primary processing plants are included.

  12. The genes coding for the hsp70(dnaK) molecular chaperone machine occur in the moderate thermophilic archaeon Methanosarcina thermophila TM-1

    DEFF Research Database (Denmark)

    Hofman-Bang, H Jacob Peider; Lange, Marianne; Ahring, Birgitte Kiær

    1999-01-01

    The hsp70 (dnaK) locus of the moderate thermophilic archaeon Methanosarcina thermophila TM-1 was cloned, sequenced, and tested in vitro to measure gene induction by heat and ammonia, i.e., stressors pertinent to the biotechnological ecosystem of this methanogen that plays a key role in anaerobic...... thermoautotrophicum Delta H, from another genus, in which trkA is not part of the locus. The proteins encoded in the TM-1 genes are very similar to the S-6 homologs, but considerably less similar to the Delta H proteins. The TM-1 Hsp70(DnaK) protein has the 23-amino acid deletion, by comparison with homologs from Gram

  13. Nucleotide sequence of a cDNA coding for the barley seed protein CMa: an inhibitor of insect α-amylase

    DEFF Research Database (Denmark)

    Rasmussen, Søren Kjærsgård; Johansson, A.

    1992-01-01

    The primary structure of the insect alpha-amylase inhibitor CMa of barley seeds was deduced from a full-length cDNA clone pc43F6. Analysis of RNA from barley endosperm shows high transcript levels 15 and 20 days after flowering. The cDNA predicts an amino acid sequence of 119 residues preceded by a signal peptide of 25 amino acids. Ala and Leu account for 55% of the signal peptide. CMa is 60-85% identical with alpha-amylase inhibitors of wheat, but shows less than 50% identity to trypsin inhibitors of barley and wheat. The 10 Cys residues are located in identical positions compared to the cereal inhibitor

  14. Recombinant Invasive Lactococcus lactis Carrying a DNA Vaccine Coding the Ag85A Antigen Increases IFN-γ, IL-6, and TNF-α Cytokines after Intranasal Immunization

    Directory of Open Access Journals (Sweden)

    Pamela Mancha-Agresti

    2017-07-01

    Full Text Available Tuberculosis (TB) remains a major threat throughout the world and in 2015 it caused the death of 1.4 million people. The Bacillus Calmette-Guérin (BCG) is the only existing vaccine against this ancient disease; however, it does not provide complete protection in adults. New vaccines against TB are eminently a global priority. The use of bacteria as vehicles for delivery of vaccine plasmids is a promising vaccination strategy. In this study, we evaluated the use of an engineered invasive Lactococcus lactis (expressing Fibronectin-Binding Protein A from Staphylococcus aureus) for the delivery of a DNA plasmid to host cells, especially at the mucosal site, as a new DNA vaccine against tuberculosis. One of the major antigens documented to offer protective responses against Mycobacterium tuberculosis is Ag85A. L. lactis FnBPA+ (pValac:Ag85A) was obtained and used for intranasal immunization of C57BL/6 mice, and the immune response profile was evaluated. In this study we observed that this strain was able to produce significant increases in the amount of pro-inflammatory cytokines (IFN-γ, TNF-α, and IL-6) in the stimulated spleen cell supernatants, showing a systemic T helper 1 (Th1) cell response. Antibody production (IgG and sIgA anti-Ag85A) was also significantly increased in bronchoalveolar lavage, as well as in the serum of mice. In summary, these findings open new perspectives in the area of mucosal DNA vaccines against specific pathogens using a lactic acid bacterium such as L. lactis.

  15. Composite Classifiers for Automatic Target Recognition

    National Research Council Canada - National Science Library

    Wang, Lin-Cheng

    1998-01-01

    ...) using forward-looking infrared (FLIR) imagery. Two existing classifiers, one based on learning vector quantization and the other on modular neural networks, are used as the building blocks for our composite classifiers...

  16. Coding Partitions

    Directory of Open Access Journals (Sweden)

    Fabio Burderi

    2007-05-01

    Full Text Available Motivated by the study of decipherability conditions for codes weaker than Unique Decipherability (UD), we introduce the notion of a coding partition. Such a notion generalizes that of a UD code and, for codes that are not UD, allows one to recover the ``unique decipherability" at the level of the classes of the partition. By taking into account the natural order between partitions, we define the characteristic partition of a code X as the finest coding partition of X. This leads to the canonical decomposition of a code into at most one unambiguous component and other (if any) totally ambiguous components. In the case where the code is finite, we give an algorithm for computing its canonical partition. This, in particular, allows one to decide whether a given partition of a finite code X is a coding partition. This last problem is then approached in the case where the code is a rational set. We prove its decidability under the hypothesis that the partition contains a finite number of classes and each class is a rational set. Moreover, we conjecture that the canonical partition satisfies this hypothesis. Finally, we also consider some relationships between coding partitions and varieties of codes.
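
Unique decipherability, the baseline notion that coding partitions relax, can be tested for a finite code with the classical Sardinas-Patterson procedure. The sketch below is a textbook implementation, not an algorithm taken from the paper.

```python
def is_uniquely_decipherable(code):
    """Sardinas-Patterson test: a finite code is UD iff no dangling
    suffix generated from it is itself a codeword."""
    code = set(code)

    def suffixes(s_set):
        # next dangling-suffix set: remainders of codeword/suffix overlaps
        out = set()
        for c in code:
            for s in s_set:
                if c != s and c.startswith(s):
                    out.add(c[len(s):])   # c = s + w
                if s.startswith(c):
                    out.add(s[len(c):])   # s = c + w
        return out

    # initial dangling suffixes: w such that c1 = c2 + w for codewords c1 != c2
    current = set()
    for c1 in code:
        for c2 in code:
            if c1 != c2 and c1.startswith(c2):
                current.add(c1[len(c2):])
    seen = set()
    while current:
        if current & code:   # a dangling suffix is a codeword: ambiguous
            return False
        seen |= current
        current = suffixes(current) - seen
    return True

print(is_uniquely_decipherable({"0", "01", "11"}))  # → True
print(is_uniquely_decipherable({"0", "01", "10"}))  # → False ("010" has two parsings)
```

In the paper's terms, a UD code is one whose characteristic partition is trivial; the second example above is ambiguous as a whole and would instead be analyzed through its coding partition.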

  17. Nucleotide sequence of the Escherichia coli pyrE gene and of the DNA in front of the protein-coding region

    DEFF Research Database (Denmark)

    Poulsen, Peter; Jensen, Kaj Frank; Valentin-Hansen, Poul

    1983-01-01

    Orotate phosphoribosyltransferase (EC 2.4.2.10) was purified to electrophoretic homogeneity from a strain of Escherichia coli containing the pyrE gene cloned on a multicopy plasmid. The relative molecular masses (Mr) of the native enzyme and its subunit were estimated by means of gel filtration... ...leader segment in front of the protein-coding region. This leader contains a structure with features characteristic for a (translated?) rho-independent transcriptional terminator, which is preceded by a cluster of uridylate residues. This indicates that the frequency of pyrE transcription is regulated...

  18. Aggregation Operator Based Fuzzy Pattern Classifier Design

    DEFF Research Database (Denmark)

    Mönks, Uwe; Larsen, Henrik Legind; Lohweg, Volker

    2009-01-01

    This paper presents a novel modular fuzzy pattern classifier design framework for intelligent automation systems, developed on the basis of the established Modified Fuzzy Pattern Classifier (MFPC), which allows designing novel classifier models that are hardware-efficiently implementable. The performances of novel classifiers using substitutes of the MFPC's geometric mean aggregator are benchmarked in the scope of an image processing application against the MFPC, to reveal classification improvement potentials for obtaining higher classification rates.
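
The geometric-mean aggregation at the heart of the MFPC family can be illustrated schematically. The membership function, prototypes, and data below are made up for this sketch and do not reproduce the authors' hardware-oriented design.

```python
import math

def fuzzy_membership(x, mean, spread):
    """Simple Gaussian-style fuzzy membership in [0, 1] (illustrative choice)."""
    return math.exp(-((x - mean) / spread) ** 2)

def classify(sample, prototypes, spread=1.0):
    """Score each class by geometric-mean aggregation of the per-feature
    memberships, then pick the best-scoring class."""
    scores = {}
    for label, proto in prototypes.items():
        ms = [fuzzy_membership(x, m, spread) for x, m in zip(sample, proto)]
        scores[label] = math.prod(ms) ** (1 / len(ms))  # geometric mean
    return max(scores, key=scores.get)

protos = {"A": [0.0, 0.0], "B": [3.0, 3.0]}  # made-up class prototypes
print(classify([0.2, -0.1], protos))  # → A
print(classify([2.8, 3.3], protos))   # → B
```

Substituting another aggregator for the geometric mean in `classify` is the kind of design variation the framework benchmarks.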

  19. Lactococcus lactis carrying a DNA vaccine coding for the ESAT-6 antigen increases IL-17 cytokine secretion and boosts the BCG vaccine immune response.

    Science.gov (United States)

    Pereira, V B; da Cunha, V P; Preisser, T M; Souza, B M; Turk, M Z; De Castro, C P; Azevedo, M S P; Miyoshi, A

    2017-06-01

    A regimen utilizing Bacille Calmette-Guerin (BCG) and another vaccine system as a booster may represent a promising strategy for the development of an efficient tuberculosis vaccine for adults. In a previous work, we confirmed the ability of Lactococcus lactis fibronectin-binding protein A (FnBPA+) (pValac:ESAT-6), a live mucosal DNA vaccine, to produce a specific immune response in mice after oral immunization. In this study, we examined the immunogenicity of this strain as a booster for the BCG vaccine in mice. After immunization, cytokine and immunoglobulin profiles were measured. The BCG prime L. lactis FnBPA+ (pValac:ESAT-6) boost group was the most responsive group, with a significant increase in splenic pro-inflammatory cytokines IL-17, IFN-γ, IL-6 and TNF-α compared with the negative control. Based on the results obtained here, we demonstrated that L. lactis FnBPA+ (pValac:ESAT-6) was able to increase the BCG vaccine general immune response. This work is of great scientific and social importance because it represents the first step towards the development of a booster to the BCG vaccine using L. lactis as a DNA delivery system. © 2017 The Society for Applied Microbiology.

  20. 15 CFR 4.8 - Classified Information.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classified Information. 4.8 Section 4... INFORMATION Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily the...

  1. Semi-supervised sparse coding

    KAUST Repository

    Wang, Jim Jing-Yan; Gao, Xin

    2014-01-01

    Sparse coding approximates the data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new representations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a semi-supervised manner, where only a few training samples are labeled. By using the manifold structure spanned by the data set of both labeled and unlabeled samples and the constraints provided by the labels of the labeled samples, we learn the variable class labels for all the samples. Furthermore, to improve the discriminative ability of the learned sparse codes, we assume that the class labels could be predicted from the sparse codes directly using a linear classifier. By solving the codebook, sparse codes, class labels and classifier parameters simultaneously in a unified objective function, we develop a semi-supervised sparse coding algorithm. Experiments on two real-world pattern recognition problems demonstrate the advantage of the proposed methods over supervised sparse coding methods on partially labeled data sets.
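
The sparse-coding step itself (leaving aside the paper's semi-supervised objective) can be sketched with a basic iterative soft-thresholding (ISTA) solver on synthetic data. The dictionary, regularization weight, and iteration count below are illustrative choices, not the paper's settings.

```python
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iter=200):
    """Solve min_z 0.5*||x - D z||^2 + lam*||z||_1 by ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        u = z - grad / L
        z = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft threshold
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)             # unit-norm codewords
z_true = np.zeros(50)
z_true[[3, 17]] = [1.0, -0.5]              # two active codewords
x = D @ z_true                             # synthetic sample
z = ista_sparse_code(D, x, lam=0.05)
print(np.count_nonzero(np.abs(z) > 1e-3))  # only a few codewords stay active
```

In the semi-supervised setting of the paper, such codes z would additionally be coupled to class labels through a linear classifier inside one joint objective.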

  2. Semi-supervised sparse coding

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-07-06

    Sparse coding approximates the data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new representations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a semi-supervised manner, where only a few training samples are labeled. By using the manifold structure spanned by the data set of both labeled and unlabeled samples and the constraints provided by the labels of the labeled samples, we learn the variable class labels for all the samples. Furthermore, to improve the discriminative ability of the learned sparse codes, we assume that the class labels could be predicted from the sparse codes directly using a linear classifier. By solving the codebook, sparse codes, class labels and classifier parameters simultaneously in a unified objective function, we develop a semi-supervised sparse coding algorithm. Experiments on two real-world pattern recognition problems demonstrate the advantage of the proposed methods over supervised sparse coding methods on partially labeled data sets.

  3. Phylogenetic footprinting of non-coding RNA: hammerhead ribozyme sequences in a satellite DNA family of Dolichopoda cave crickets (Orthoptera, Rhaphidophoridae)

    Directory of Open Access Journals (Sweden)

    Venanzetti Federica

    2010-01-01

    Full Text Available Background: The great variety in sequence, length, complexity, and abundance of satellite DNA has made it difficult to ascribe any function to this genome component. Recent studies have shown that satellite DNA can be transcribed and be involved in regulation of chromatin structure and gene expression. Some satellite DNAs, such as the pDo500 sequence family in Dolichopoda cave crickets, have a catalytic hammerhead (HH) ribozyme structure and activity embedded within each repeat. Results: We assessed the phylogenetic footprints of the HH ribozyme within the pDo500 sequences from 38 different populations representing 12 species of Dolichopoda. The HH region was significantly more conserved than the non-hammerhead (NHH) region of the pDo500 repeat. In addition, stems were more conserved than loops. In stems, several compensatory mutations were detected that maintain base pairing. The core region of the HH ribozyme was affected by very few nucleotide substitutions and the cleavage position was altered only once among 198 sequences. RNA folding of the HH sequences revealed that a potentially active HH ribozyme can be found in most of the Dolichopoda populations and species. Conclusions: The phylogenetic footprints suggest that the HH region of the pDo500 sequence family is selected for function in Dolichopoda cave crickets. However, the functional role of HH ribozymes in eukaryotic organisms is unclear. The possible functions have been related to trans cleavage of an RNA target by a ribonucleoprotein and regulation of gene expression. Whether the HH ribozyme in Dolichopoda is involved in similar functions remains to be investigated. Future studies need to demonstrate how the observed nucleotide changes and evolutionary constraint have affected the catalytic efficiency of the hammerhead.
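
The kind of conservation comparison described (HH region vs. NHH region) can be illustrated with per-column Shannon entropy on a toy alignment. The sequences below are invented and much shorter than real pDo500 repeats.

```python
import math
from collections import Counter

def column_entropies(alignment):
    """Shannon entropy (bits) of each column in a list of equal-length
    sequences; lower entropy means stronger conservation."""
    ncol = len(alignment[0])
    ents = []
    for i in range(ncol):
        counts = Counter(seq[i] for seq in alignment)
        n = sum(counts.values())
        ents.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return ents

# Toy alignment: first 4 columns fully conserved (stand-in for the HH
# region), last 4 variable (stand-in for the NHH region)
aln = ["ACGTACGT",
       "ACGTTTAA",
       "ACGTGCTC",
       "ACGTAGGA"]
ents = column_entropies(aln)
hh, nhh = ents[:4], ents[4:]
print(sum(hh) / 4, "<", sum(nhh) / 4)  # conserved block has lower mean entropy
```

A real analysis would compute such column statistics over the aligned pDo500 repeats and compare the HH and NHH partitions, typically with a significance test.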

  4. Isolation and characterization of an atypical LEA protein coding cDNA and its promoter from drought-tolerant plant Prosopis juliflora.

    Science.gov (United States)

    George, Suja; Usha, B; Parida, Ajay

    2009-05-01

    Plant growth and productivity are adversely affected by various abiotic and biotic stress factors. Despite the wealth of information on abiotic stress and stress tolerance in plants, many aspects still remain unclear. Prosopis juliflora is a hardy plant reported to be tolerant to drought, salinity, extremes of soil pH, and heavy metal stress. In this paper, we report the isolation and characterization of the complementary DNA clone for an atypical late embryogenesis abundant (LEA) protein (Pj LEA3) and its putative promoter sequence from P. juliflora. Unlike typical LEA proteins, which are rich in glycine, Pj LEA3 has alanine as the most abundant amino acid followed by serine and shows an average negative hydropathy. Pj LEA3 is significantly different from other LEA proteins in the NCBI database and shows high similarity to the indole-3-acetic-acid-induced protein ARG2 from Vigna radiata. Northern analysis for Pj LEA3 in P. juliflora leaves under 90 mM H2O2 stress revealed up-regulation of the transcript at 24 and 48 h. A 1.5-kb fragment upstream of the 5' UTR of this gene (putative promoter) was isolated and analyzed in silico. The possible reasons for changes in gene expression during stress in relation to the host plant's stress tolerance mechanisms are discussed.

  5. Ancient DNA

    DEFF Research Database (Denmark)

    Willerslev, Eske; Cooper, Alan

    2004-01-01

    ancient DNA, palaeontology, palaeoecology, archaeology, population genetics, DNA damage and repair

  6. Error minimizing algorithms for nearest neighbor classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B [Los Alamos National Laboratory]; Hush, Don [Los Alamos National Laboratory]; Zimmer, G. Beate [Texas A&M]

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.
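
For background, a plain nearest-neighbor classifier with an asymmetric penalty on declaring an alarm gives the flavor of cost-sensitive, low-false-alarm training. This toy sketch is not the OHM framework of the paper; the data and the `alarm_cost` rule are invented for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, alarm_cost=1.0):
    """k-NN vote; declare class 1 (the 'alarm') only if its vote exceeds
    alarm_cost times the class-0 vote, so false alarms get rarer as the
    cost grows."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    votes1 = np.sum(y_train[idx] == 1)
    votes0 = k - votes1
    return 1 if votes1 > alarm_cost * votes0 else 0

X = np.array([[0.0], [0.3], [1.0], [1.2], [1.5]])
y = np.array([0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.8]), k=3))               # → 1
print(knn_predict(X, y, np.array([0.8]), k=3, alarm_cost=5)) # → 0 (higher bar for alarms)
```

Raising `alarm_cost` trades missed detections for a lower false alarm rate, the operating regime the paper's cost-sensitive loss targets.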

  7. Speaking Code

    DEFF Research Database (Denmark)

    Cox, Geoff

    Speaking Code begins by invoking the “Hello World” convention used by programmers when learning a new language, helping to establish the interplay of text and code that runs through the book. Interweaving the voice of critical writing from the humanities with the tradition of computing and software...

  8. Hierarchical mixtures of naive Bayes classifiers

    NARCIS (Netherlands)

    Wiering, M.A.

    2002-01-01

    Naive Bayes classifiers tend to perform very well on a large number of problem domains, although their representation power is quite limited compared to more sophisticated machine learning algorithms. In this paper we study combining multiple naive Bayes classifiers by using the hierarchical
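
Combining several naive Bayes classifiers can be illustrated in its simplest form by averaging the predicted class probabilities of models trained on bootstrap resamples. This flat ensemble on made-up Gaussian data is only a stand-in for the hierarchical mixture studied in the paper.

```python
import numpy as np

class GaussianNB:
    """Tiny Gaussian naive Bayes: per-class feature means and variances."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict_proba(self, X):
        # log N(x | mu, var) summed over (assumed independent) features
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(axis=2)
        ll += np.log(self.prior)
        p = np.exp(ll - ll.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# combine three classifiers trained on bootstrap resamples
probs = []
for _ in range(3):
    i = rng.integers(0, 100, 100)
    probs.append(GaussianNB().fit(X[i], y[i]).predict_proba(X))
pred = np.mean(probs, axis=0).argmax(axis=1)
print((pred == y).mean())  # high accuracy on this easy toy problem
```

A hierarchical mixture would instead learn gating weights over the component classifiers rather than averaging them uniformly.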

  9. Comparing classifiers for pronunciation error detection

    NARCIS (Netherlands)

    Strik, H.; Truong, K.; Wet, F. de; Cucchiarini, C.

    2007-01-01

    Providing feedback on pronunciation errors in computer assisted language learning systems requires that pronunciation errors be detected automatically. In the present study we compare four types of classifiers that can be used for this purpose: two acoustic-phonetic classifiers (one of which employs

  10. Feature extraction for dynamic integration of classifiers

    NARCIS (Netherlands)

    Pechenizkiy, M.; Tsymbal, A.; Puuronen, S.; Patterson, D.W.

    2007-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique

  11. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.
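
The deconvolve-then-classify idea can be illustrated on synthetic one-dimensional signals: blur two class templates with a known kernel, add noise, invert with a regularized Fourier-domain filter, and classify by nearest template. All signals and parameters below are invented, and the simple damped inverse stands in for the authors' cross-validated, data-driven procedure.

```python
import numpy as np

def deconvolve(y, kernel, eps=1e-2):
    """Regularized Fourier-domain deconvolution (Wiener-style damping)."""
    K = np.fft.fft(kernel, n=len(y))
    Y = np.fft.fft(y)
    return np.real(np.fft.ifft(Y * np.conj(K) / (np.abs(K) ** 2 + eps)))

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 64, endpoint=False)
templates = [np.sin(2 * np.pi * t), np.sign(np.sin(2 * np.pi * t))]  # two classes
kernel = np.exp(-np.arange(64) / 3.0)
kernel /= kernel.sum()                                               # blur kernel

def classify(x):
    # nearest-template classifier applied to the deconvolved signal
    return int(np.argmin([np.linalg.norm(x - s) for s in templates]))

correct = 0
for label, s in enumerate(templates):
    for _ in range(20):
        y = np.real(np.fft.ifft(np.fft.fft(s) * np.fft.fft(kernel)))  # circular blur
        y += rng.normal(0, 0.05, 64)                                  # additive noise
        correct += classify(deconvolve(y, kernel)) == label
print(correct / 40)  # deconvolving first recovers most labels
```

The `eps` damping embodies the paper's point that the inversion should not try to recover the original signal exactly; an unregularized inverse would amplify the noise where the kernel's spectrum is small.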

  12. Deconvolution When Classifying Noisy Data Involving Transformations.

    Science.gov (United States)

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  13. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-01-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.
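
The warning against plain inversion can be made concrete. A blur kernel attenuates high frequencies, so dividing by its transfer function blows up whatever noise lives there; a ridge-regularized inverse caps that gain, and the regularization weight is exactly the kind of tuning constant one would choose by cross-validating classifier accuracy, as the authors do. A numerical sketch (NumPy; the Gaussian kernel and λ = 1e-3 are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Circular Gaussian blur kernel: its transfer function H decays at high
# frequencies, so the naive inverse 1/H amplifies high-frequency noise.
n = 64
kernel = np.exp(-0.5 * (np.arange(n) - n // 2) ** 2 / 2.0)
kernel /= kernel.sum()
H = np.fft.rfft(np.roll(kernel, -n // 2))

def inverse_gain(H, lam):
    """Per-frequency gain that ridge-regularized inversion applies to noise.
    lam = 0 gives the naive inverse; lam > 0 caps the gain at 1/(2*sqrt(lam))."""
    return np.abs(np.conj(H) / (np.abs(H) ** 2 + lam))

naive = inverse_gain(H, 0.0)     # huge where |H| is tiny
ridge = inverse_gain(H, 1e-3)    # bounded everywhere
print(naive.max(), ridge.max())
```

Choosing λ to maximize downstream classification accuracy, rather than reconstruction fidelity, is the paper's central point.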

  14. Replicating animal mitochondrial DNA

    Directory of Open Access Journals (Sweden)

    Emily A. McKinney

    2013-01-01

Full Text Available The field of mitochondrial DNA (mtDNA) replication has been experiencing incredible progress in recent years, and yet little is certain about the mechanism(s) used by animal cells to replicate this plasmid-like genome. The long-standing strand-displacement model of mammalian mtDNA replication (for which single-stranded DNA intermediates are a hallmark) has been intensively challenged by a new set of data, which suggests that replication proceeds via coupled leading- and lagging-strand synthesis (resembling bacterial genome replication) and/or via long stretches of RNA intermediates laid on the mtDNA lagging strand (the so-called RITOLS). The set of proteins required for mtDNA replication is small and includes the catalytic and accessory subunits of DNA polymerase γ, the mtDNA helicase Twinkle, the mitochondrial single-stranded DNA-binding protein, and the mitochondrial RNA polymerase (which most likely functions as the mtDNA primase). Mutations in the genes coding for the first three proteins are associated with human diseases and premature aging, justifying the research interest in the genetic, biochemical and structural properties of the mtDNA replication machinery. Here we summarize these properties and discuss the current models of mtDNA replication in animal cells.

  15. DNA fingerprinting of Chinese melon provides evidentiary support of seed quality appraisal.

    Science.gov (United States)

    Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin

    2012-01-01

Melon (Cucumis melo L.) is an important vegetable crop worldwide. At present, homonyms and synonyms are common in the melon seed markets of China, which can cause variety authenticity issues affecting melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could enable the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. Fifty-one materials were classified into 17 groups based on sharing the same fingerprint code, and field trait surveys showed that the plants within each group were synonyms with the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of the 471 melon varieties (lines) were constructed. Thanks to their fast readability and large storage capacity, QR-coded melon DNA fingerprints are convenient to read and well suited to commercial applications.
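
The fingerprinting logic itself is straightforward: each variety's alleles at the core SSR loci are concatenated into a code string, and varieties sharing a code are flagged as candidate synonyms. A toy sketch with invented allele sizes at three loci (the study used 18):

```python
# Hypothetical allele sizes (bp) at three SSR loci, for illustration only.
profiles = {
    "variety_A": (152, 198, 240),
    "variety_B": (152, 198, 240),   # same profile -> candidate synonym of A
    "variety_C": (150, 198, 244),
}

def fingerprint_code(alleles):
    """Concatenate allele sizes into a compact fingerprint string,
    which could then be embedded in a QR code."""
    return "-".join(str(a) for a in alleles)

# Group varieties that share an identical fingerprint code.
groups = {}
for name, alleles in profiles.items():
    groups.setdefault(fingerprint_code(alleles), []).append(name)

synonym_groups = [g for g in groups.values() if len(g) > 1]
print(synonym_groups)   # [['variety_A', 'variety_B']]
```

Field-trait surveys would then confirm whether the members of each group really are synonyms, as done in the study.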

  16. DNA Fingerprinting of Chinese Melon Provides Evidentiary Support of Seed Quality Appraisal

    Science.gov (United States)

    Gao, Peng; Ma, Hongyan; Luan, Feishi; Song, Haibin

    2012-01-01

Melon (Cucumis melo L.) is an important vegetable crop worldwide. At present, homonyms and synonyms are common in the melon seed markets of China, which can cause variety authenticity issues affecting melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could enable the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. Fifty-one materials were classified into 17 groups based on sharing the same fingerprint code, and field trait surveys showed that the plants within each group were synonyms with the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of the 471 melon varieties (lines) were constructed. Thanks to their fast readability and large storage capacity, QR-coded melon DNA fingerprints are convenient to read and well suited to commercial applications. PMID:23285039

  17. DNA fingerprinting of Chinese melon provides evidentiary support of seed quality appraisal.

    Directory of Open Access Journals (Sweden)

    Peng Gao

Full Text Available Melon (Cucumis melo L.) is an important vegetable crop worldwide. At present, homonyms and synonyms are common in the melon seed markets of China, which can cause variety authenticity issues affecting melon breeding, production, marketing and other aspects. Molecular markers, especially microsatellites or simple sequence repeats (SSRs), are playing increasingly important roles in cultivar identification. The aim of this study was to construct a DNA fingerprinting database of major melon cultivars, which could enable the establishment of a technical standard system for purity and authenticity identification of melon seeds. In this study, to develop the core set of SSR markers, 470 polymorphic SSRs were selected as candidate markers from 1219 SSRs using 20 representative melon varieties (lines). Eighteen SSR markers, evenly distributed across the genome and with the highest polymorphism information content (PIC), were identified as the core marker set for melon DNA fingerprinting analysis. Fingerprint codes for 471 melon varieties (lines) were established. Fifty-one materials were classified into 17 groups based on sharing the same fingerprint code, and field trait surveys showed that the plants within each group were synonyms with the same or similar field characters. Furthermore, DNA fingerprinting quick response (QR) codes of 471 melon varieties (lines) were constructed. Thanks to their fast readability and large storage capacity, QR-coded melon DNA fingerprints are convenient to read and well suited to commercial applications.

  18. Logarithmic learning for generalized classifier neural network.

    Science.gov (United States)

    Ozyildirim, Buse Melis; Avci, Mutlu

    2014-12-01

Generalized classifier neural network is introduced as an efficient classifier among the others. Unless the initial smoothing parameter value is close to the optimal one, the generalized classifier neural network suffers from a convergence problem and requires quite a long time to converge. In this work, to overcome this problem, a logarithmic learning approach is proposed. The proposed method uses a logarithmic cost function instead of squared error. Minimization of this cost function reduces the number of iterations needed to reach the minimum. The proposed method is tested on 15 different data sets, and the performance of the logarithmic learning generalized classifier neural network is compared with that of the standard one. Thanks to the operating range of the radial basis function used by the generalized classifier neural network, the proposed logarithmic cost function and its derivative take continuous values. This makes it possible to exploit the fast convergence of the logarithmic cost in the proposed learning method. Due to this fast convergence, training time is reduced by up to 99.2%. In addition to the decrease in training time, classification performance may also improve by up to 60%. According to the test results, the proposed method not only addresses the time requirement problem of the generalized classifier neural network but may also improve classification accuracy. Copyright © 2014 Elsevier Ltd. All rights reserved.
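
The effect of swapping a squared-error cost for a logarithmic one is easy to see in the gradients: when the output is far from the target, the logarithmic cost's gradient is much larger, so updates take bigger steps and fewer iterations are needed. A schematic comparison (scalar output, target 1; this is not the GCNN's exact cost function, only the general principle):

```python
def squared_grad(y, t=1.0):
    # Gradient of the squared error (y - t)**2 with respect to y.
    return 2.0 * (y - t)

def log_grad(y, t=1.0):
    # Gradient of the logarithmic cost -t*log(y) with respect to y.
    return -t / y

y = 0.01  # badly wrong output for target 1
print(abs(squared_grad(y)), abs(log_grad(y)))  # ~1.98 vs 100.0
```

Near the optimum (y close to 1) both gradients are small, so the larger far-from-optimum gradient translates into fewer iterations without sacrificing stability.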

  19. A CLASSIFIER SYSTEM USING SMOOTH GRAPH COLORING

    Directory of Open Access Journals (Sweden)

    JORGE FLORES CRUZ

    2017-01-01

Full Text Available Unsupervised classifiers allow clustering methods with little or no human intervention. Therefore it is desirable to group the set of items with less data processing. This paper proposes an unsupervised classifier system using the model of soft graph coloring. The method was tested on some classic instances from the literature, and the results were compared with classifications made with human intervention, yielding results as good as or better than supervised classifiers, sometimes providing alternative classifications that consider additional information that humans did not.
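
A plain greedy coloring conveys the idea (the paper's soft coloring relaxes this hard version): connect items that are too dissimilar to share a cluster, color the graph so neighbors get different colors, and read each color class as a cluster. The data and threshold graph below are invented for illustration:

```python
def greedy_coloring(adj):
    """Greedy graph coloring: adjacent (dissimilar) items receive different
    colors; each resulting color class is read as one cluster."""
    colors = {}
    for v in sorted(adj, key=lambda u: -len(adj[u])):  # largest degree first
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# Toy dissimilarity graph: an edge joins items whose distance exceeds
# some threshold (hypothetical data).
adj = {
    "a": {"c", "d"}, "b": {"c", "d"},   # a and b are similar to each other
    "c": {"a", "b"}, "d": {"a", "b"},   # c and d are similar to each other
}
print(greedy_coloring(adj))   # a,b share one color; c,d share another
```
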

  20. High dimensional classifiers in the imbalanced case

    DEFF Research Database (Denmark)

    Bak, Britta Anker; Jensen, Jens Ledet

We consider the binary classification problem in the imbalanced case where the numbers of samples from the two groups differ. The classification problem is considered in the high dimensional case where the number of variables is much larger than the number of samples, and where the imbalance leads to a bias in the classification. A theoretical analysis of the independence classifier reveals the origin of the bias, and based on this we suggest two new classifiers that can handle any imbalance ratio. The analytical results are supplemented by a simulation study, where the suggested classifiers in some...

  1. The structure of dual Grassmann codes

    DEFF Research Database (Denmark)

    Beelen, Peter; Pinero, Fernando

    2016-01-01

    In this article we study the duals of Grassmann codes, certain codes coming from the Grassmannian variety. Exploiting their structure, we are able to count and classify all their minimum weight codewords. In this classification the lines lying on the Grassmannian variety play a central role. Rela...

  2. Coding Labour

    Directory of Open Access Journals (Sweden)

    Anthony McCosker

    2014-03-01

    Full Text Available As well as introducing the Coding Labour section, the authors explore the diffusion of code across the material contexts of everyday life, through the objects and tools of mediation, the systems and practices of cultural production and organisational management, and in the material conditions of labour. Taking code beyond computation and software, their specific focus is on the increasingly familiar connections between code and labour with a focus on the codification and modulation of affect through technologies and practices of management within the contemporary work organisation. In the grey literature of spreadsheets, minutes, workload models, email and the like they identify a violence of forms through which workplace affect, in its constant flux of crisis and ‘prodromal’ modes, is regulated and governed.

  3. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

    Science.gov (United States)

    Hua, Wei; Wang, Jiasong; Zhao, Jian

    2014-01-01

Based on the study of the Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of the discrete Ramanujan transform and spectrum. Using the Voss numerical representation, a symbolic DNA strand is mapped to a numerical DNA sequence, and the discrete Ramanujan spectrum of the numerical sequence is deduced. It is well known that the discrete Fourier power spectrum of a protein coding sequence has an important feature, 3-base periodicity, which is widely used for DNA sequence analysis via the discrete Fourier transform; the analysis is performed by testing the signal-to-noise ratio at frequency N/3, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can be identified as a prominent spike of the discrete Ramanujan spectrum at period 3 only for protein coding regions. The signal-to-noise ratio for the discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing protein coding regions from noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested, and the histograms and tables from the computational results illustrate the reliability of our method. In addition, we have shown theoretically that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that using the discrete Ramanujan spectrum to classify different DNA sequences is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.
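
The classical DFT-based test that the Ramanujan spectrum is compared against can be sketched directly: map the sequence to four Voss indicator signals, sum their DFT powers, and test the signal-to-noise ratio at frequency N/3. A small pure-Python sketch (the repeated hexamer is an artificial stand-in for a coding region, not real data):

```python
import cmath

def voss(seq):
    """Voss representation: one 0/1 indicator sequence per nucleotide."""
    return {b: [1 if s == b else 0 for s in seq] for b in "ACGT"}

def power_at(seq, k):
    """Total DFT power of the four indicator sequences at frequency index k."""
    n = len(seq)
    total = 0.0
    for u in voss(seq).values():
        X = sum(u[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        total += abs(X) ** 2
    return total

def snr_period3(seq):
    """Power at frequency N/3 divided by the mean power: values well above 1
    indicate the 3-base periodicity typical of protein coding regions."""
    n = len(seq)
    mean_power = sum(power_at(seq, k) for k in range(1, n)) / (n - 1)
    return power_at(seq, n // 3) / mean_power

coding_like = "ATGGCC" * 20    # strong codon-position bias, length 120
print(snr_period3(coding_like) > 1.0)   # True
```

The Ramanujan-spectrum version replaces the complex exponentials with Ramanujan sums, giving an integer-arithmetic spike at period 3 instead of a peak at frequency N/3.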

  4. Arabic Handwriting Recognition Using Neural Network Classifier

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... an OCR using a Neural Network classifier preceded by a set of preprocessing .... Artificial Neural Networks (ANNs), which we adopt in this research, consist of ... advantages and disadvantages of each technique. In [9], Khemiri ...

  5. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha

    2013-11-25

Based on a dynamic programming approach, we design algorithms for sequential optimization of exact and approximate decision rules relative to length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers and study two questions: (i) which rules are better from the point of view of classification, exact or approximate; and (ii) which order of optimization gives better classifier performance: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than ordinary optimization (length or coverage).
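
The flavor of rule-based classification can be sketched as follows: each decision rule fires when all of its conditions hold, shorter rules are preferred (echoing optimization by length), and ties are resolved by majority vote. The rules and attributes below are invented for illustration, not taken from the paper:

```python
# Hypothetical decision rules: (conditions, label). A rule fires on an object
# when every attribute=value condition is satisfied.
rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"humidity": "normal"}, "yes"),
]

def classify(x, rules, default="yes"):
    """Prefer the shortest fired rules (length optimization), then vote."""
    fired = [(len(cond), label) for cond, label in rules
             if all(x.get(a) == v for a, v in cond.items())]
    if not fired:
        return default
    shortest = min(n for n, _ in fired)
    votes = [label for n, label in fired if n == shortest]
    return max(set(votes), key=votes.count)

print(classify({"outlook": "sunny", "humidity": "high"}, rules))  # no
```

Optimizing by coverage instead would prefer the rules matched by the most training objects; the paper's sequential schemes optimize one criterion and break ties with the other.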

  6. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha M.; Chikalov, Igor; Moshkov, Mikhail; Zielosko, Beata

    2013-01-01

Based on a dynamic programming approach, we design algorithms for sequential optimization of exact and approximate decision rules relative to length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers and study two questions: (i) which rules are better from the point of view of classification, exact or approximate; and (ii) which order of optimization gives better classifier performance: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than ordinary optimization (length or coverage).

  7. Combining multiple classifiers for age classification

    CSIR Research Space (South Africa)

    Van Heerden, C

    2009-11-01

    Full Text Available The authors compare several different classifier combination methods on a single task, namely speaker age classification. This task is well suited to combination strategies, since significantly different feature classes are employed. Support vector...

  8. Neural Network Classifiers for Local Wind Prediction.

    Science.gov (United States)

    Kretzschmar, Ralf; Eckert, Pierre; Cattani, Daniel; Eggimann, Fritz

    2004-05-01

    This paper evaluates the quality of neural network classifiers for wind speed and wind gust prediction with prediction lead times between +1 and +24 h. The predictions were realized based on local time series and model data. The selection of appropriate input features was initiated by time series analysis and completed by empirical comparison of neural network classifiers trained on several choices of input features. The selected input features involved day time, yearday, features from a single wind observation device at the site of interest, and features derived from model data. The quality of the resulting classifiers was benchmarked against persistence for two different sites in Switzerland. The neural network classifiers exhibited superior quality when compared with persistence judged on a specific performance measure, hit and false-alarm rates.
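
Hit and false-alarm rates, the performance measures used for benchmarking against persistence, can be computed directly from binary event forecasts. A small sketch; the observation and forecast vectors are invented for illustration:

```python
def hit_false_alarm(pred, obs):
    """Hit rate = detected events / observed events;
    false-alarm rate = false warnings / issued warnings."""
    hits = sum(p and o for p, o in zip(pred, obs))
    misses = sum((not p) and o for p, o in zip(pred, obs))
    false = sum(p and (not o) for p, o in zip(pred, obs))
    return hits / (hits + misses), false / (hits + false)

obs = [1, 1, 0, 0, 1, 0, 1, 0]           # gust events (hypothetical)
persistence = obs[-1:] + obs[:-1]        # "same as previous step" baseline
nn_pred = [1, 1, 0, 0, 1, 0, 0, 0]       # hypothetical classifier output
print(hit_false_alarm(nn_pred, obs), hit_false_alarm(persistence, obs))
```

Persistence simply repeats the previous observation, so it misses every onset and end of an event, which is why a skillful classifier can beat it on both measures.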

  9. Consistency Analysis of Nearest Subspace Classifier

    OpenAIRE

    Wang, Yi

    2015-01-01

    The Nearest subspace classifier (NSS) finds an estimation of the underlying subspace within each class and assigns data points to the class that corresponds to its nearest subspace. This paper mainly studies how well NSS can be generalized to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets, in comparison with some other linear model based classifiers. It is also ...
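
A minimal NSS sketch, assuming one-dimensional subspaces estimated per class from the top right singular vector of that class's data matrix (NumPy; the two toy classes are lines through the origin):

```python
import numpy as np

def fit_subspaces(X_by_class, dim=1):
    """Estimate a dim-dimensional subspace per class via the top right
    singular vectors of the class data matrix."""
    bases = {}
    for label, X in X_by_class.items():
        _, _, Vt = np.linalg.svd(np.asarray(X, float), full_matrices=False)
        bases[label] = Vt[:dim]          # orthonormal rows spanning the subspace
    return bases

def nss_predict(x, bases):
    """Assign x to the class whose subspace is nearest (smallest residual)."""
    x = np.asarray(x, float)
    def residual(B):
        proj = B.T @ (B @ x)             # orthogonal projection onto the subspace
        return np.linalg.norm(x - proj)
    return min(bases, key=lambda lbl: residual(bases[lbl]))

train = {
    "line_x": [[1, 0.0], [2, 0.0], [-1, 0.0]],   # points on the x-axis
    "line_y": [[0.0, 1], [0.0, 3], [0.0, -2]],   # points on the y-axis
}
bases = fit_subspaces(train)
print(nss_predict([5, 0.2], bases))   # line_x
```
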

  10. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
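
As a concrete instance of waveform coding, μ-law companding (the classic technique behind 8-bit PCM telephony, e.g. G.711) compresses samples logarithmically before quantization so that quiet passages keep fine resolution. A sketch; the sample value is arbitrary:

```python
import math

MU = 255  # mu-law parameter used in North American/Japanese PCM telephony

def mu_encode(x):
    """Compress a sample in [-1, 1] so quiet sounds get finer quantization."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_decode(y):
    """Exact inverse of mu_encode."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(y, bits=8):
    """Uniform scalar quantization of a value in [-1, 1]."""
    levels = 2 ** bits - 1
    return round((y + 1) / 2 * levels) / levels * 2 - 1

x = 0.01                                  # a quiet sample
err_mu = abs(mu_decode(quantize(mu_encode(x))) - x)   # compand, then quantize
err_uniform = abs(quantize(x) - x)                    # quantize directly
print(err_mu, err_uniform)
```

In this example the companded 8-bit path leaves roughly an order of magnitude less error on the quiet sample than uniform 8-bit quantization, which is the whole point of companding.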

  11. Optimal codes as Tanner codes with cyclic component codes

    DEFF Research Database (Denmark)

    Høholdt, Tom; Pinero, Fernando; Zeng, Peng

    2014-01-01

In this article we study a class of graph codes whose cyclic component codes are realized as affine variety codes. Within this class of Tanner codes we find some optimal binary codes. We use a particular subgraph of the point-line incidence plane of A(2,q) as the Tanner graph, and we are able to describe ...

  12. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA

    Science.gov (United States)

    Ma, Xiaoqi

    2015-01-01

A novel method is proposed to establish a pancreatic cancer classifier. Firstly, the concepts of quantum computing and the fruit fly optimization algorithm (FOA) are introduced. Then FOA is improved by quantum coding and quantum operations, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of a support vector machine (SVM), and the classifier is established with the optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods were chosen for comparison. The experimental results show that the proposed method improves classifier performance and costs less time. PMID:26543867
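
The fruit fly optimization algorithm at the heart of the method scatters candidate positions around the best one found so far and ranks them by "smell concentration" (essentially the inverse quality of the objective). A minimal, non-quantum sketch that tunes a single hypothetical SVM hyperparameter; the bowl-shaped objective is a stand-in, not real cross-validation error:

```python
import random

random.seed(1)

def foa_minimize(objective, iters=60, swarm=20, step=0.5):
    """Basic fruit fly optimization sketch: flies scatter uniformly around the
    best-known position and the best-smelling (lowest objective) one wins."""
    best_x, best_val = 1.0, objective(1.0)
    for _ in range(iters):
        for _ in range(swarm):
            x = best_x + random.uniform(-step, step)
            val = objective(x)
            if val < best_val:
                best_x, best_val = x, val
    return best_x, best_val

# Stand-in for SVM cross-validation error as a function of log10(C):
# a smooth bowl with its minimum at 2.0 (hypothetical).
obj = lambda x: (x - 2.0) ** 2 + 0.1
x_star, v = foa_minimize(obj)
print(round(x_star, 2), round(v, 2))
```

The paper's contribution is to replace the plain scatter step with quantum coding and quantum operations, and to redefine the smell concentration function.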

  13. Aztheca Code

    International Nuclear Information System (INIS)

    Quezada G, S.; Espinosa P, G.; Centeno P, J.; Sanchez M, H.

    2017-09-01

This paper presents the Aztheca code, which comprises mathematical models of neutron kinetics, power generation, heat transfer, core thermo-hydraulics, recirculation systems, dynamic pressure and level, and the control system. The Aztheca code is validated with plant data, as well as with predictions from the manufacturer, when the reactor operates in a stationary state. On the other hand, to demonstrate that the model is applicable during a transient, an event that occurred in a nuclear power plant with a BWR reactor is selected. The plant data are compared with the results obtained with RELAP-5 and the Aztheca model. The results show that both RELAP-5 and the Aztheca code are able to adequately predict the behavior of the reactor. (Author)

  14. Vocable Code

    DEFF Research Database (Denmark)

    Soon, Winnie; Cox, Geoff

    2018-01-01

a computational and poetic composition for two screens: on one of these, texts and voices are repeated and disrupted by mathematical chaos, together exploring the performativity of code and language; on the other is a mix of computer programming syntax and human language. In this sense queer code can be understood as both an object and subject of study that intervenes in the world's 'becoming' and in how material bodies are produced via human and nonhuman practices. Through mixing natural and computer language, this article presents a script in six parts from a performative lecture for two persons...

  15. NSURE code

    International Nuclear Information System (INIS)

    Rattan, D.S.

    1993-11-01

NSURE stands for Near-Surface Repository code. NSURE is a performance assessment code developed for the safety assessment of near-surface disposal facilities for low-level radioactive waste (LLRW). Part one of this report documents the NSURE model, the governing equations and formulation of the mathematical models, and their implementation under the SYVAC3 executive. The NSURE model simulates the release of nuclides from an engineered vault and their subsequent transport via the groundwater and surface water pathways to the biosphere, and predicts the resulting dose rate to a critical individual. Part two of this report consists of a User's manual describing simulation procedures, input data preparation, output, and example test cases.

  16. The Aster code; Code Aster

    Energy Technology Data Exchange (ETDEWEB)

    Delbecq, J.M

    1999-07-01

The Aster code is a 2D or 3D finite-element calculation code for structures developed by the R&D division of Electricité de France (EdF). This dossier presents a complete overview of the characteristics and uses of the Aster code: introduction of version 4; the context of Aster (organisation of the code development, versions, systems and interfaces, development tools, quality assurance, independent validation); static mechanics (linear thermo-elasticity, Euler buckling, cables, Zarka-Casier method); non-linear mechanics (material behaviour, large deformations, specific loads, unloading and loss of load proportionality indicators, global algorithm, contact and friction); rupture mechanics (G energy restitution level, restitution level in thermo-elasto-plasticity, 3D local energy restitution level, KI and KII stress intensity factors, calculation of limit loads for structures); specific treatments (fatigue, rupture, wear, error estimation); meshes and models (mesh generation, modeling, loads and boundary conditions, links between different modeling processes, resolution of linear systems, display of results, etc.); vibration mechanics (modal and harmonic analysis, dynamics with shocks, direct transient dynamics, seismic analysis and aleatory dynamics, non-linear dynamics, dynamical sub-structuring); fluid-structure interactions (internal acoustics, mass, rigidity and damping); linear and non-linear thermal analysis; steels and metal industry (structure transformations); coupled problems (internal chaining, internal thermo-hydro-mechanical coupling, chaining with other codes); products and services. (J.S.)

  17. The histone codes for meiosis.

    Science.gov (United States)

    Wang, Lina; Xu, Zhiliang; Khawar, Muhammad Babar; Liu, Chao; Li, Wei

    2017-09-01

Meiosis is a specialized process that produces haploid gametes from diploid cells by a single round of DNA replication followed by two successive cell divisions. It contains many special events, such as programmed DNA double-strand break (DSB) formation, homologous recombination, and crossover formation and resolution. These events are associated with dynamically regulated chromosomal structures; the dynamic transcriptional regulation and chromatin remodeling are mainly modulated by histone modifications, termed 'histone codes'. The purpose of this review is to summarize the histone codes that are required for meiosis during spermatogenesis and oogenesis, involving meiosis resumption, meiotic asymmetric division and other cellular processes. We not only systematically review the functional roles of histone codes in meiosis but also discuss future trends and perspectives in this field. © 2017 Society for Reproduction and Fertility.

  18. A Bayesian method for comparing and combining binary classifiers in the absence of a gold standard

    Directory of Open Access Journals (Sweden)

    Keith Jonathan M

    2012-07-01

Full Text Available Abstract Background Many problems in bioinformatics involve classification based on features such as sequence, structure or morphology. Given multiple classifiers, two crucial questions arise: how does their performance compare, and how can they best be combined to produce a better classifier? A classifier can be evaluated in terms of sensitivity and specificity using benchmark, or gold standard, data, that is, data for which the true classification is known. However, a gold standard is not always available. Here we demonstrate that a Bayesian model for comparing medical diagnostics without a gold standard can be successfully applied in the bioinformatics domain, to genomic scale data sets. We present a new implementation, which unlike previous implementations is applicable to any number of classifiers. We apply this model, for the first time, to the problem of finding the globally optimal logical combination of classifiers. Results We compared three classifiers of protein subcellular localisation, and evaluated our estimates of sensitivity and specificity against estimates obtained using a gold standard. The method overestimated sensitivity and specificity with only a small discrepancy, and correctly ranked the classifiers. Diagnostic tests for swine flu were then compared on a small data set. Lastly, classifiers for a genome-wide association study of macular degeneration with 541,094 SNPs were analysed. In all cases, run times were feasible and results precise. The optimal logical combination of classifiers was also determined for all three data sets. Code and data are available from http://bioinformatics.monash.edu.au/downloads/. Conclusions The examples demonstrate that the methods are suitable for both small and large data sets, applicable to the wide range of bioinformatics classification problems, and robust to dependence between classifiers.
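
Once sensitivity and specificity estimates are available for each classifier (whether from a gold standard or from the Bayesian model), candidate logical combinations can be scored directly. A toy sketch with two classifiers and invented outputs; the paper's Bayesian estimation without a gold standard is not reproduced here:

```python
# Hypothetical per-item outputs of two binary classifiers and the true labels.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
c1 =    [1, 0, 1, 0, 1, 0, 1, 0]
c2 =    [1, 1, 0, 0, 0, 1, 1, 0]

# Two simple logical combinations; a full search would enumerate all of them.
combos = {
    "c1 AND c2": [a & b for a, b in zip(c1, c2)],
    "c1 OR c2":  [a | b for a, b in zip(c1, c2)],
}

def sens_spec(pred, truth):
    """Sensitivity = TP / positives; specificity = TN / negatives."""
    tp = sum(p & t for p, t in zip(pred, truth))
    tn = sum((1 - p) & (1 - t) for p, t in zip(pred, truth))
    return tp / sum(truth), tn / (len(truth) - sum(truth))

for name, pred in combos.items():
    print(name, sens_spec(pred, truth))
```

As expected, AND trades sensitivity for specificity and OR does the reverse; the globally optimal combination is the one that best balances the two for the task at hand.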

  19. Coding Class

    DEFF Research Database (Denmark)

    Ejsing-Duun, Stine; Hansbøl, Mikala

    This report contains the evaluation and documentation of the Coding Class project. The Coding Class project was launched in the 2016/2017 school year by IT-Branchen in collaboration with a number of member companies, the City of Copenhagen, Vejle Municipality, the Danish Agency for IT and Learning (STIL) and the volunteer association Coding Pirates. The report was written by Mikala Hansbøl, Docent in digital learning resources and research coordinator of the research and development environment Digitalisering i Skolen (DiS) at the Institut for Skole og Læring, Professionshøjskolen Metropol; and Stine Ejsing-Duun, Associate Professor of learning technology, interaction design, design thinking and design pedagogy at Forskningslab: It og Læringsdesign (ILD-LAB), Department of Communication and Psychology, Aalborg University in Copenhagen. We followed the Coding Class project and carried out its evaluation and documentation from November 2016 to May 2017...

  20. Uplink Coding

    Science.gov (United States)

    Andrews, Ken; Divsalar, Dariush; Dolinar, Sam; Moision, Bruce; Hamkins, Jon; Pollara, Fabrizio

    2007-01-01

    This slide presentation reviews the objectives, meeting goals and overall NASA goals for the NASA Data Standards Working Group. The presentation includes information on the technical progress surrounding the objective, short LDPC codes, and the general results on the Pu-Pw tradeoff.

  1. ANIMAL code

    International Nuclear Information System (INIS)

    Lindemuth, I.R.

    1979-01-01

This report describes ANIMAL, a two-dimensional Eulerian magnetohydrodynamic computer code, and presents its physical model. Temporal and spatial finite-difference equations are formulated in a manner that facilitates implementation of the algorithm, and the functions of the algorithm's FORTRAN subroutines and variables are outlined.

  2. Network Coding

    Indian Academy of Sciences (India)

    Home; Journals; Resonance – Journal of Science Education; Volume 15; Issue 7. Network Coding. K V Rashmi Nihar B Shah P Vijay Kumar. General Article Volume 15 Issue 7 July 2010 pp 604-621. Fulltext. Click here to view fulltext PDF. Permanent link: https://www.ias.ac.in/article/fulltext/reso/015/07/0604-0621 ...

  3. MCNP code

    International Nuclear Information System (INIS)

    Cramer, S.N.

    1984-01-01

    The MCNP code is the major Monte Carlo coupled neutron-photon transport research tool at the Los Alamos National Laboratory, and it represents the most extensive Monte Carlo development program in the United States which is available in the public domain. The present code is the direct descendant of the original Monte Carlo work of Fermi, von Neumann, and Ulam at Los Alamos in the 1940s. Development has continued uninterrupted since that time, and the current version of MCNP (or its predecessors) has always included state-of-the-art methods in the Monte Carlo simulation of radiation transport, basic cross section data, geometry capability, variance reduction, and estimation procedures. The authors of the present code have oriented its development toward general user application. The documentation, though extensive, is presented in a clear and simple manner with many examples, illustrations, and sample problems. In addition to providing the desired results, the output listings give a wealth of detailed information (some optional) concerning each stage of the calculation. The code system is continually updated to take advantage of advances in computer hardware and software, including interactive modes of operation, diagnostic interrupts and restarts, and a variety of graphical and video aids

  4. Expander Codes

    Indian Academy of Sciences (India)

    Home; Journals; Resonance – Journal of Science Education; Volume 10; Issue 1. Expander Codes - The Sipser–Spielman Construction. Priti Shankar. General Article Volume 10 ... Author Affiliations. Priti Shankar1. Department of Computer Science and Automation, Indian Institute of Science Bangalore 560 012, India.

  5. TU-EF-304-10: Efficient Multiscale Simulation of the Proton Relative Biological Effectiveness (RBE) for DNA Double Strand Break (DSB) Induction and Bio-Effective Dose in the FLUKA Monte Carlo Radiation Transport Code

    Energy Technology Data Exchange (ETDEWEB)

    Moskvin, V; Tsiamas, P; Axente, M; Farr, J [St. Jude Children’s Research Hospital, Memphis, TN (United States); Stewart, R [University of Washington, Seattle, WA. (United States)

    2015-06-15

    Purpose: One of the more critical initiating events for reproductive cell death is the creation of a DNA double strand break (DSB). In this study, we present a computationally efficient way to determine spatial variations in the relative biological effectiveness (RBE) of proton therapy beams within the FLUKA Monte Carlo (MC) code. Methods: We used the independently tested Monte Carlo Damage Simulation (MCDS) developed by Stewart and colleagues (Radiat. Res. 176, 587–602, 2011) to estimate the RBE for DSB induction of monoenergetic protons, tritium, deuterium, helium-3, helium-4 ions and delta-electrons. The dose-weighted RBE coefficients were incorporated into FLUKA to determine the equivalent {sup 60}Co γ-ray dose for representative proton beams incident on cells in an aerobic and anoxic environment. Results: We found that the proton beam RBE for DSB induction at the tip of the Bragg peak, including primary and secondary particles, is close to 1.2. Furthermore, the RBE increases laterally to the beam axis in the area of the Bragg peak. At the distal edge, the RBE is in the range 1.3–1.4 for cells irradiated under aerobic conditions and may be as large as 1.5–1.8 for cells irradiated under anoxic conditions. Across the plateau region, the recorded RBE for DSB induction is 1.02 for aerobic cells and 1.05 for cells irradiated under anoxic conditions. The contribution to total effective dose from secondary heavy ions decreases with depth and is higher at shallow depths (e.g., at the surface of the skin). Conclusion: Multiscale simulation of the RBE for DSB induction provides useful insights into spatial variations in proton RBE within pristine Bragg peaks. This methodology is potentially useful for the biological optimization of proton therapy for the treatment of cancer. The study highlights the need to incorporate spatial variations in proton RBE into proton therapy treatment plans.
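
    The core bookkeeping described above, folding per-particle RBE coefficients into a gamma-equivalent dose, can be sketched as follows. This is a minimal illustration of the weighting step only; the RBE values in the table are placeholders, not the MCDS-derived coefficients used in the study, and the function names are ours.

```python
# Sketch: accumulate an RBE-weighted (Co-60 equivalent) dose from per-particle
# scoring steps, as a hook inside a Monte Carlo transport code might do.
# The RBE-for-DSB-induction values below are illustrative placeholders.
RBE_TABLE = {
    "proton": 1.2,
    "helium-4": 1.6,
    "electron": 1.0,
}

def equivalent_dose(steps):
    """Sum RBE-weighted dose over (particle_type, dose_Gy) scoring steps."""
    return sum(RBE_TABLE[particle] * dose for particle, dose in steps)

steps = [("proton", 1.0), ("helium-4", 0.1), ("electron", 0.05)]
physical = sum(dose for _, dose in steps)   # plain absorbed dose: 1.15 Gy
biological = equivalent_dose(steps)          # weighted: 1.2 + 0.16 + 0.05 = 1.41 Gy
```

    In a real transport code the table would be replaced by energy-dependent coefficients looked up per step, but the weighted sum is the same.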

  6. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available One of the widely used methods for classification, a decision-making process, is artificial immune systems. Artificial immune systems, based on the natural immune system, can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with supervised/unsupervised artificial immune systems, a negative selection classifier, and a resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed new method.

  7. Classifier Fusion With Contextual Reliability Evaluation.

    Science.gov (United States)

    Liu, Zhunga; Pan, Quan; Dezert, Jean; Han, Jun-Wei; He, You

    2018-05-01

    Classifier fusion is an efficient strategy to improve classification performance on complex pattern recognition problems. In practice, the multiple classifiers to combine can have different reliabilities, and proper reliability evaluation plays an important role in the fusion process for getting the best classification performance. We propose a new method for classifier fusion with contextual reliability evaluation (CF-CRE) based on inner reliability and relative reliability concepts. The inner reliability, represented by a matrix, characterizes the probability of the object belonging to one class when it is classified to another class. The elements of this matrix are estimated from the k-nearest neighbors of the object. A cautious discounting rule is developed under the belief functions framework to revise the classification result according to the inner reliability. The relative reliability is evaluated based on a new incompatibility measure which makes it possible to reduce the level of conflict between the classifiers by applying the classical evidence discounting rule to each classifier before their combination. The inner reliability and relative reliability capture different aspects of the classification reliability. The discounted classification results are combined with Dempster-Shafer's rule for the final class decision making support. The performance of CF-CRE has been evaluated and compared with those of the main classical fusion methods using real data sets. The experimental results show that CF-CRE can produce substantially higher accuracy than other fusion methods in general. Moreover, CF-CRE is robust to changes in the number of nearest neighbors chosen for estimating the reliability matrix, which is appealing for applications.
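
    Two of the building blocks named in the abstract, classical evidence discounting and Dempster-Shafer combination, can be sketched on a two-class frame. This is our own minimal illustration (the paper's cautious discounting rule and incompatibility measure are more involved); the mass values and reliability factors are made up.

```python
# Shafer discounting and Dempster's rule of combination on the frame {A, B}.
# Focal sets are represented as frozensets; THETA is total ignorance.
A, B = frozenset("A"), frozenset("B")
THETA = A | B

def discount(mass, alpha):
    """Classical discounting: keep fraction alpha of each focal mass,
    transfer the remainder to the whole frame (ignorance)."""
    out = {s: alpha * v for s, v in mass.items()}
    out[THETA] = out.get(THETA, 0.0) + (1.0 - alpha)
    return out

def combine(m1, m2):
    """Dempster's rule: conjunctive combination, conflict renormalized away."""
    acc, conflict = {}, 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                acc[inter] = acc.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    k = 1.0 - conflict
    return {s: v / k for s, v in acc.items()}

m1 = discount({A: 0.9, B: 0.1}, alpha=0.8)  # reliable classifier favoring A
m2 = discount({A: 0.2, B: 0.8}, alpha=0.5)  # unreliable classifier favoring B
fused = combine(m1, m2)  # the reliable classifier dominates: fused[A] > fused[B]
```

    Discounting the second classifier more heavily (lower alpha) pushes its evidence toward ignorance, so the conflict between the two sources is reduced before combination, exactly the role the abstract assigns to relative reliability.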

  8. Classifying sows' activity types from acceleration patterns

    DEFF Research Database (Denmark)

    Cornou, Cecile; Lundbye-Christensen, Søren

    2008-01-01

    An automated method of classifying sow activity using acceleration measurements would allow the individual sow's behavior to be monitored throughout the reproductive cycle; applications for detecting behaviors characteristic of estrus and farrowing or to monitor illness and welfare can be foreseen....... This article suggests a method of classifying five types of activity exhibited by group-housed sows. The method involves the measurement of acceleration in three dimensions. The five activities are: feeding, walking, rooting, lying laterally and lying sternally. Four time series of acceleration (the three...

  9. Data characteristics that determine classifier performance

    CSIR Research Space (South Africa)

    Van der Walt, Christiaan M

    2006-11-01

    Full Text Available available at [11]. The kNN uses a LinearNN nearest neighbour search algorithm with an Euclidean distance metric [8]. The optimal k value is determined by performing 10-fold cross-validation. An optimal k value between 1 and 10 is used for Experiments 1... classifiers. 10-fold cross-validation is used to evaluate and compare the performance of the classifiers on the different data sets. 3.1. Artificial data generation Multivariate Gaussian distributions are used to generate artificial data sets. We use d...
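
    The protocol in the excerpt, a Euclidean-distance kNN whose k is searched over 1 to 10 by 10-fold cross-validation, can be sketched with a self-contained pure-Python stand-in for the toolkit cited as [8]/[11]. The data generation here is our own two-Gaussian example, not the multivariate Gaussian sets of the paper.

```python
# Minimal kNN with Euclidean distance; k in 1..10 chosen by 10-fold CV.
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k Euclidean-nearest training points."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_accuracy(data, k, folds=10):
    """Mean accuracy of kNN over `folds` interleaved folds."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]                                  # one fold held out
        train = [p for i, p in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)

random.seed(0)
data = ([((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(50)]
        + [((random.gauss(3, 1), random.gauss(3, 1)), 1) for _ in range(50)])
random.shuffle(data)
best_k = max(range(1, 11), key=lambda k: cv_accuracy(data, k))  # optimal k in 1..10
```

    With well-separated classes almost any k scores highly; the cross-validation loop simply picks the best of the ten candidates, as described.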

  10. A Customizable Text Classifier for Text Mining

    Directory of Open Access Journals (Sweden)

    Yun-liang Zhang

    2007-12-01

    Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine such a collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard by which to choose the texts, based on demand and abundance of materials. The performance of the classifier varies with the user's choice.

  11. A survey of decision tree classifier methodology

    Science.gov (United States)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-state classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
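
    The "collection of simpler decisions" idea reduces, at any single internal node, to choosing one feature and one threshold. A minimal sketch of that node-level design step, an exhaustive Gini-impurity search for the best axis-aligned split, follows; it is our own illustration of the general principle, not a method taken from the survey.

```python
# Find the single best axis-aligned split of a labeled dataset by
# minimizing the weighted Gini impurity of the two children.
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (feature_index, threshold) minimizing weighted child impurity."""
    n = len(y)
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

X = [[1.0, 5.0], [2.0, 4.0], [8.0, 1.0], [9.0, 2.0]]
y = [0, 0, 1, 1]
feature, threshold = best_split(X, y)   # first feature separates the classes cleanly
```

    A full DTC applies this search recursively to each child; the interpretability the survey highlights comes from each node being exactly such a one-feature test.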

  12. Orthopedics coding and funding.

    Science.gov (United States)

    Baron, S; Duclos, C; Thoreux, P

    2014-02-01

    The French tarification à l'activité (T2A) prospective payment system is a financial system in which a health-care institution's resources are based on performed activity. Activity is described via the PMSI medical information system (programme de médicalisation du système d'information). The PMSI classifies hospital cases by clinical and economic categories known as diagnosis-related groups (DRG), each with an associated price tag. Coding a hospital case involves giving as realistic a description as possible so as to categorize it in the right DRG and thus ensure appropriate payment. For this, it is essential to understand what determines the pricing of inpatient stay: namely, the code for the surgical procedure, the patient's principal diagnosis (reason for admission), codes for comorbidities (everything that adds to management burden), and the management of the length of inpatient stay. The PMSI is used to analyze the institution's activity and dynamism: change on previous year, relation to target, and comparison with competing institutions based on indicators such as the mean length of stay performance indicator (MLS PI). The T2A system improves overall care efficiency. Quality of care, however, is not presently taken account of in the payment made to the institution, as there are no indicators for this; work needs to be done on this topic. Copyright © 2014. Published by Elsevier Masson SAS.

  13. 75 FR 37253 - Classified National Security Information

    Science.gov (United States)

    2010-06-28

    ... ``Secret.'' (3) Each interior page of a classified document shall be marked at the top and bottom either... ``(TS)'' for Top Secret, ``(S)'' for Secret, and ``(C)'' for Confidential will be used. (2) Portions... from the informational text. (1) Conspicuously place the overall classification at the top and bottom...

  14. 75 FR 707 - Classified National Security Information

    Science.gov (United States)

    2010-01-05

    ... classified at one of the following three levels: (1) ``Top Secret'' shall be applied to information, the... exercise this authority. (2) ``Top Secret'' original classification authority may be delegated only by the... official has been delegated ``Top Secret'' original classification authority by the agency head. (4) Each...

  15. Neural Network Classifier Based on Growing Hyperspheres

    Czech Academy of Sciences Publication Activity Database

    Jiřina Jr., Marcel; Jiřina, Marcel

    2000-01-01

    Roč. 10, č. 3 (2000), s. 417-428 ISSN 1210-0552. [Neural Network World 2000. Prague, 09.07.2000-12.07.2000] Grant - others:MŠMT ČR(CZ) VS96047; MPO(CZ) RP-4210 Institutional research plan: AV0Z1030915 Keywords : neural network * classifier * hyperspheres * big-dimensional data Subject RIV: BA - General Mathematics

  16. Histogram deconvolution - An aid to automated classifiers

    Science.gov (United States)

    Lorre, J. J.

    1983-01-01

    It is shown that N-dimensional histograms are convolved by the addition of noise in the picture domain. Three methods are described which provide the ability to deconvolve such noise-affected histograms. The purpose of the deconvolution is to provide automated classifiers with a higher quality N-dimensional histogram from which to obtain classification statistics.
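
    The claim that additive noise convolves the histogram can be checked numerically in one dimension: the density of (signal + independent noise) is the convolution of the two densities, so in particular the variances add. A quick forward check of that effect (no deconvolution, which is the harder inverse step the abstract addresses):

```python
# Verify that adding independent noise to a two-level "image" broadens its
# histogram so that var(noisy) ≈ var(signal) + var(noise).
import random

random.seed(1)
signal = [random.choice([10.0, 20.0]) for _ in range(20000)]  # two-class pixel values
noise = [random.gauss(0.0, 2.0) for _ in range(20000)]        # sensor noise
noisy = [s + n for s, n in zip(signal, noise)]

def var(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# var(signal) = 25 (two equiprobable spikes 5 apart from the mean),
# var(noise) = 4, so var(noisy) should be close to 29.
```

    Deconvolving the measured histogram, the subject of the abstract, amounts to undoing this broadening so the class spikes are sharpened before classification statistics are extracted.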

  17. Panda code

    International Nuclear Information System (INIS)

    Altomare, S.; Minton, G.

    1975-02-01

    PANDA is a new two-group one-dimensional (slab/cylinder) neutron diffusion code designed to replace and extend the FAB series. PANDA allows for the nonlinear effects of xenon, enthalpy and Doppler. Fuel depletion is allowed. PANDA has a completely general search facility which will seek criticality, maximize reactivity, or minimize peaking. Any single parameter may be varied in a search. PANDA is written in FORTRAN IV, and as such is nearly machine independent. However, PANDA has been written with the present limitations of the Westinghouse CDC-6600 system in mind. Most computation loops are very short, and the code is less than half the useful 6600 memory size so that two jobs can reside in the core at once. (auth)

  18. CANAL code

    International Nuclear Information System (INIS)

    Gara, P.; Martin, E.

    1983-01-01

    The CANAL code presented here optimizes a realistic iron free extraction channel which has to provide a given transversal magnetic field law in the median plane: the current bars may be curved, have finite lengths and cooling ducts and move in a restricted transversal area; terminal connectors may be added, images of the bars in pole pieces may be included. A special option optimizes a real set of circular coils [fr

  19. DNA-based watermarks using the DNA-Crypt algorithm

    Science.gov (United States)

    Heider, Dominik; Barnekow, Angelika

    2007-01-01

    Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which can occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms. PMID:17535434

  20. DNA-based watermarks using the DNA-Crypt algorithm

    Directory of Open Access Journals (Sweden)

    Barnekow Angelika

    2007-05-01

    Full Text Available Abstract Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which can occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms.

  1. DNA-based watermarks using the DNA-Crypt algorithm.

    Science.gov (United States)

    Heider, Dominik; Barnekow, Angelika

    2007-05-29

    The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which can occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms.
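
    The "least significant base" principle shared by DNA-Crypt and image steganography can be illustrated with a toy scheme: hide one bit per codon by rewriting the third (wobble) position, which for many codons leaves the encoded amino acid unchanged. This is a simplified stand-in of our own, not the actual DNA-Crypt algorithm, and it ignores which substitutions are truly synonymous and omits the mutation-correction codes.

```python
# Toy least-significant-base watermark: one bit per codon, written into
# the wobble (third) position. Encoding/decoding tables are illustrative.
BIT_TO_BASE = {0: "A", 1: "G"}
BASE_TO_BIT = {"A": 0, "C": 0, "G": 1, "T": 1}

def embed(dna, bits):
    """Write bits[i] into the wobble position of codon i."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for i, b in enumerate(bits):
        codons[i] = codons[i][:2] + BIT_TO_BASE[b]
    return "".join(codons)

def extract(dna, nbits):
    """Read the watermark back out of the wobble positions."""
    return [BASE_TO_BIT[dna[3 * i + 2]] for i in range(nbits)]

marked = embed("ATGGCTTTAGAA", [1, 0, 1, 0])
recovered = extract(marked, 4)   # round-trips the watermark bits
```

    A real scheme layers an error-correcting code (e.g., a Hamming code) over the extracted bits so that occasional point mutations in the carrier sequence do not destroy the message, which is exactly the role of the fuzzy-controller-selected correction codes above.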

  2. Classifying features in CT imagery: accuracy for some single- and multiple-species classifiers

    Science.gov (United States)

    Daniel L. Schmoldt; Jing He; A. Lynn Abbott

    1998-01-01

    Our current approach to automatically label features in CT images of hardwood logs classifies each pixel of an image individually. These feature classifiers use a back-propagation artificial neural network (ANN) and feature vectors that include a small, local neighborhood of pixels and the distance of the target pixel to the center of the log. Initially, this type of...

  3. Disassembly and Sanitization of Classified Matter

    International Nuclear Information System (INIS)

    Stockham, Dwight J.; Saad, Max P.

    2008-01-01

    The Disassembly Sanitization Operation (DSO) process was implemented to support weapon disassembly and disposition by using recycling and waste minimization measures. This process was initiated by treaty agreements and reconfigurations within both the DOD and DOE Complexes. The DOE is faced with disassembling and disposing of a huge inventory of retired weapons, components, training equipment, spare parts, weapon maintenance equipment, and associated material. In addition, regulations have caused a dramatic increase in the need for information required to support the handling and disposition of these parts and materials. In the past, huge inventories of classified weapon components were required to have long-term storage at Sandia and at many other locations throughout the DOE Complex. These materials are placed in onsite storage units due to classification issues, and they may also contain radiological and/or hazardous components. Since no disposal options exist for this material, the only choice was long-term storage. Long-term storage is costly and somewhat problematic, requiring a secured storage area, monitoring, auditing, and presenting the potential for loss or theft of the material. Overall recycling rates for materials sent through the DSO process have enabled 70 to 80% of these components to be recycled. These components are made of high quality materials, and once this material has been sanitized, the demand for the component metals for recycling efforts is very high. The DSO process for NGPF classified components established the credibility of this technique for addressing the long-term storage requirements of the classified weapons component inventory. The success of this application has generated interest from other Sandia organizations and other locations throughout the complex. Other organizations are requesting the help of the DSO team, and the DSO is responding to these requests by expanding its scope to include Work-for-Other projects.

  4. Comparing cosmic web classifiers using information theory

    International Nuclear Information System (INIS)

    Leclercq, Florent; Lavaux, Guilhem; Wandelt, Benjamin; Jasche, Jens

    2016-01-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  5. Design of Robust Neural Network Classifiers

    DEFF Research Database (Denmark)

    Larsen, Jan; Andersen, Lars Nonboe; Hintz-Madsen, Mads

    1998-01-01

    This paper addresses a new framework for designing robust neural network classifiers. The network is optimized using the maximum a posteriori technique, i.e., the cost function is the sum of the log-likelihood and a regularization term (prior). In order to perform robust classification, we present...... a modified likelihood function which incorporates the potential risk of outliers in the data. This leads to the introduction of a new parameter, the outlier probability. Designing the neural classifier involves optimization of network weights as well as outlier probability and regularization parameters. We...... suggest to adapt the outlier probability and regularisation parameters by minimizing the error on a validation set, and a simple gradient descent scheme is derived. In addition, the framework allows for constructing a simple outlier detector. Experiments with artificial data demonstrate the potential...
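
    The modified likelihood with an outlier probability can be sketched as a mixture: with probability eps the observed label is drawn uniformly over the C classes rather than from the network's softmax output. The symbols and function below are our own simplification; the paper's exact parameterization and its joint optimization with the regularization term may differ.

```python
# Outlier-robust log-likelihood for one labeled example:
# log[(1 - eps) * p_model(label) + eps / C], where eps is the outlier
# probability and C the number of classes.
import math

def robust_log_likelihood(probs, label, eps):
    """probs: model class probabilities (sum to 1); label: observed class index."""
    C = len(probs)
    return math.log((1.0 - eps) * probs[label] + eps / C)

# A confidently wrong prediction is penalized far less once eps > 0,
# which is what keeps outliers from dominating the training cost:
probs = [0.98, 0.01, 0.01]
plain = robust_log_likelihood(probs, 2, eps=0.0)    # log 0.01, heavy penalty
robust = robust_log_likelihood(probs, 2, eps=0.1)   # mixture floors the penalty
```

    Because the eps/C term bounds the likelihood away from zero, gradients from mislabeled or outlying examples are damped, and eps itself can be tuned on a validation set as the abstract suggests.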

  6. Comparing cosmic web classifiers using information theory

    Energy Technology Data Exchange (ETDEWEB)

    Leclercq, Florent [Institute of Cosmology and Gravitation (ICG), University of Portsmouth, Dennis Sciama Building, Burnaby Road, Portsmouth PO1 3FX (United Kingdom); Lavaux, Guilhem; Wandelt, Benjamin [Institut d' Astrophysique de Paris (IAP), UMR 7095, CNRS – UPMC Université Paris 6, Sorbonne Universités, 98bis boulevard Arago, F-75014 Paris (France); Jasche, Jens, E-mail: florent.leclercq@polytechnique.org, E-mail: lavaux@iap.fr, E-mail: j.jasche@tum.de, E-mail: wandelt@iap.fr [Excellence Cluster Universe, Technische Universität München, Boltzmannstrasse 2, D-85748 Garching (Germany)

    2016-08-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  7. Detection of Fundus Lesions Using Classifier Selection

    Science.gov (United States)

    Nagayoshi, Hiroto; Hiramatsu, Yoshitaka; Sako, Hiroshi; Himaga, Mitsutoshi; Kato, Satoshi

    A system for detecting fundus lesions caused by diabetic retinopathy from fundus images is being developed. The system can screen the images in advance in order to reduce the inspection workload on doctors. One of the difficulties that must be addressed in completing this system is how to remove false positives (which tend to arise near blood vessels) without decreasing the detection rate of lesions in other areas. To overcome this difficulty, we developed classifier selection according to the position of a candidate lesion, and we introduced new features that can distinguish true lesions from false positives. A system incorporating classifier selection and these new features was tested in experiments using 55 fundus images with some lesions and 223 images without lesions. The results of the experiments confirm the effectiveness of the proposed system, namely, degrees of sensitivity and specificity of 98% and 81%, respectively.

  8. Classifying objects in LWIR imagery via CNNs

    Science.gov (United States)

    Rodger, Iain; Connor, Barry; Robertson, Neil M.

    2016-10-01

    The aim of the presented work is to demonstrate enhanced target recognition and improved false alarm rates for a mid to long range detection system, utilising a Long Wave Infrared (LWIR) sensor. By exploiting high quality thermal image data and recent techniques in machine learning, the system can provide automatic target recognition capabilities. A Convolutional Neural Network (CNN) is trained and the classifier achieves an overall accuracy of > 95% for 6 object classes related to land defence. While the highly accurate CNN struggles to recognise long range target classes, due to low signal quality, robust target discrimination is achieved for challenging candidates. The overall performance of the methodology presented is assessed using human ground truth information, generating classifier evaluation metrics for thermal image sequences.

  9. Learning for VMM + WTA Embedded Classifiers

    Science.gov (United States)

    2016-03-31

    Learning for VMM + WTA Embedded Classifiers Jennifer Hasler and Sahil Shah Electrical and Computer Engineering Georgia Institute of Technology...enabling correct classification of each novel acoustic signal (generator, idle car, and idle truck). The classification structure requires, after...measured on our SoC FPAA IC. The test input is composed of signals from an urban environment for 3 objects (generator, idle car, and idle truck

  10. Bayes classifiers for imbalanced traffic accidents datasets.

    Science.gov (United States)

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This, however, poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the different imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
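
    In their simplest random form, the balancing strategies above reduce to resampling rows. A minimal sketch of random oversampling (the study may well use more elaborate variants, e.g., synthetic minority oversampling; the function and data below are ours):

```python
# Random oversampling: duplicate random minority-class rows until every
# class has as many rows as the largest class.
import random

def oversample(data, label_of):
    by_class = {}
    for row in data:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(rows) for rows in by_class.values())
    out = []
    for rows in by_class.values():
        out += rows + [random.choice(rows) for _ in range(target - len(rows))]
    return out

random.seed(0)
accidents = [("slight",)] * 90 + [("severe",)] * 10   # 9:1 imbalance
balanced = oversample(accidents, label_of=lambda r: r[0])  # now 90:90
```

    Under-sampling is the mirror image (randomly drop majority rows down to the minority count), and the mixed technique moves both classes toward an intermediate size.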

  11. A Bayesian classifier for symbol recognition

    OpenAIRE

    Barrat , Sabine; Tabbone , Salvatore; Nourrissier , Patrick

    2007-01-01

URL : http://www.buyans.com/POL/UploadedFile/134_9977.pdf; International audience; We present in this paper an original adaptation of Bayesian networks to the symbol recognition problem. More precisely, we present a descriptor combination method which significantly improves the recognition rate compared with the rates obtained by each descriptor alone. In this perspective, we use a simple Bayesian classifier, called naive Bayes. In fact, probabilistic graphical models, more spec...

  12. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

This article describes processing methods used for the classification of short amino acid sequences. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets, each containing samples labeled as reacting or not reacting with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase, the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data are used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with the best performance-measure values.
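The correlation-based ranking phase of the pipeline above can be illustrated with a small sketch (function names are hypothetical, and the paper's Gram-Schmidt orthogonalization step is omitted here): each feature column is scored by the absolute value of its Pearson correlation with the class label.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy) if vx and vy else 0.0

def rank_features(X, y):
    """Rank feature indices by |correlation| with the class label, descending."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data: feature 0 tracks the label, feature 1 is a constant
X = [[0.1, 5.0], [0.9, 5.0], [0.2, 5.0], [0.8, 5.0]]
y = [0, 1, 0, 1]
print(rank_features(X, y))  # [0, 1]
```

In the paper's pipeline, the top-ranked features would then be orthogonalized before training the LS-SVM.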

  13. SVM classifier on chip for melanoma detection.

    Science.gov (United States)

    Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak

    2017-07-01

Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a low-cost medical handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented on a recent FPGA platform, using the latest design methodology, to be embedded into the proposed device for realizing efficient online melanoma detection on a single system-on-chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 over an equivalent software implementation on an embedded processor, with 34% resource utilization and 2 W power consumption. Consequently, the implemented system meets crucial embedded-system constraints of high performance and low cost, resource utilization and power consumption, while achieving high classification accuracy.

  14. From concatenated codes to graph codes

    DEFF Research Database (Denmark)

    Justesen, Jørn; Høholdt, Tom

    2004-01-01

    We consider codes based on simple bipartite expander graphs. These codes may be seen as the first step leading from product type concatenated codes to more complex graph codes. We emphasize constructions of specific codes of realistic lengths, and study the details of decoding by message passing...

  15. miRNAting control of DNA methylation

    Indian Academy of Sciences (India)

    DNA methylation is a type of epigenetic modification where a methyl group is added to the cytosine or adenine residue of a given DNA sequence. It has been observed that DNA methylation is achieved by some collaborative agglomeration of certain proteins and non-coding RNAs. The assembly of IDN2 and its ...

  16. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    Directory of Open Access Journals (Sweden)

    Shehzad Khalid

    2014-01-01

Full Text Available We present a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes whilst identifying and filtering noisy training data. The noise-free data are further used to learn models for other classifiers such as GMM and SVM. A weight-learning method is then introduced that learns weights on each class for the different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method over its competitors, especially in the presence of class label noise and imbalanced classes.
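The per-class weighted voting step can be sketched minimally as follows (classifier names, labels, and weights are illustrative; in the framework above the weight vector is found by a genetic algorithm):

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine classifier outputs using per-class weights.

    predictions: {classifier_name: predicted_class}
    weights:     {classifier_name: {class_label: weight}}
    Each classifier adds its weight for the class it predicted; the class
    with the highest accumulated score wins.
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name].get(label, 0.0)
    return max(scores, key=scores.get)

preds = {"mMediods": "walk", "GMM": "run", "SVM": "run"}
w = {
    "mMediods": {"walk": 0.9, "run": 0.4},
    "GMM":      {"walk": 0.3, "run": 0.2},
    "SVM":      {"walk": 0.5, "run": 0.3},
}
print(weighted_vote(preds, w))  # "walk": 0.9 outweighs run's 0.2 + 0.3
```

Because the weights are indexed by class, a classifier that is reliable for one class but weak for another contributes accordingly, which is the point of learning weights at class level rather than per classifier.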

  17. The Protection of Classified Information: The Legal Framework

    National Research Council Canada - National Science Library

    Elsea, Jennifer K

    2006-01-01

    Recent incidents involving leaks of classified information have heightened interest in the legal framework that governs security classification, access to classified information, and penalties for improper disclosure...

  18. Detecting non-coding selective pressure in coding regions

    Directory of Open Access Journals (Sweden)

    Blanchette Mathieu

    2007-02-01

Full Text Available Abstract Background Comparative genomics approaches, where orthologous DNA regions are compared and inter-species conserved regions are identified, have proven extremely powerful for identifying non-coding regulatory regions located in intergenic or intronic regions. However, non-coding functional elements can also be located within coding regions, as is common for exonic splicing enhancers, some transcription factor binding sites, and RNA secondary structure elements affecting mRNA stability, localization, or translation. Since these functional elements are located in regions that are themselves highly conserved because they code for a protein, they generally escape detection by comparative genomics approaches. Results We introduce a comparative genomics approach for detecting non-coding functional elements located within coding regions. Codon evolution is modeled as a mixture of codon substitution models, where each component of the mixture describes the evolution of codons under a specific type of coding selective pressure. We show how to compute the posterior distribution of the entropy and parsimony scores under this null model of codon evolution. The method is applied to a set of growth hormone 1 orthologous mRNA sequences and a known exonic splicing element is detected. The analysis of a set of CORTBP2 orthologous genes reveals a region of several hundred base pairs under strong non-coding selective pressure whose function remains unknown. Conclusion Non-coding functional elements, in particular those involved in post-transcriptional regulation, are likely to be much more prevalent than is currently known. With the numerous genome sequencing projects underway, comparative genomics approaches like the one proposed here are likely to become increasingly powerful at detecting such elements.

  19. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

Smoking is the largest preventable cause of death and disease in the developed world, and advances in modern electronics and machine learning can help us deliver real-time interventions to smokers in novel ways. In this paper, we examine different machine learning approaches that use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated in terms of sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with a classification accuracy of up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time, by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Classifying spaces of degenerating polarized Hodge structures

    CERN Document Server

    Kato, Kazuya

    2009-01-01

    In 1970, Phillip Griffiths envisioned that points at infinity could be added to the classifying space D of polarized Hodge structures. In this book, Kazuya Kato and Sampei Usui realize this dream by creating a logarithmic Hodge theory. They use the logarithmic structures begun by Fontaine-Illusie to revive nilpotent orbits as a logarithmic Hodge structure. The book focuses on two principal topics. First, Kato and Usui construct the fine moduli space of polarized logarithmic Hodge structures with additional structures. Even for a Hermitian symmetric domain D, the present theory is a refinem

  1. Gearbox Condition Monitoring Using Advanced Classifiers

    Directory of Open Access Journals (Sweden)

    P. Večeř

    2010-01-01

Full Text Available New efficient and reliable methods for gearbox diagnostics are needed in the automotive industry because of the growing demand for production quality. This paper presents the application of two different classifiers for gearbox diagnostics – Kohonen Neural Networks and the Adaptive-Network-based Fuzzy Inference System (ANFIS). Two practical applications are presented. In the first, the tested gearboxes are separated into two classes according to their condition indicators. In the second, ANFIS is applied to label the tested gearboxes with a Quality Index according to the condition indicators. In both applications, the condition indicators were computed from the vibration of the gearbox housing.

  2. Cubical sets as a classifying topos

    DEFF Research Database (Denmark)

    Spitters, Bas

Coquand’s cubical set model for homotopy type theory provides the basis for a computational interpretation of the univalence axiom and some higher inductive types, as implemented in the cubical proof assistant. We show that the underlying cube category is the opposite of the Lawvere theory of De Morgan algebras. The topos of cubical sets itself classifies the theory of ‘free De Morgan algebras’. This provides us with a topos with an internal ‘interval’. Using this interval we construct a model of type theory following van den Berg and Garner. We are currently investigating the precise relation...

  3. Double Ramp Loss Based Reject Option Classifier

    Science.gov (United States)

    2015-05-22

of convex (DC) functions. To minimize it, we use a DC programming approach [1]. The proposed method has the following advantages: (1) the proposed loss L_DR does not put any restriction on ρ for it to be an upper bound of the loss L_{0-d-1}. A risk formulation using L_DR is given for a training set S = {(x_n, ... A classifier learnt using the L_DR-based approach (C = 100, μ = 1, d = 0.2) is illustrated, with filled circles and triangles representing the support vectors, followed by experimental results.

  4. Compilation of the abstracts of nuclear computer codes available at CPD/IPEN

    International Nuclear Information System (INIS)

    Granzotto, A.; Gouveia, A.S. de; Lourencao, E.M.

    1981-06-01

A compilation of all computer codes available at IPEN in São Paulo is presented. These computer codes are classified according to the Argonne National Laboratory and Nuclear Energy Agency scheme. (E.G.) [pt

  5. A systematic comparison of supervised classifiers.

    Directory of Open Access Journals (Sweden)

    Diego Raphael Amancio

Full Text Available Pattern recognition has been employed in a myriad of industrial, commercial and academic applications. Many techniques have been devised to tackle such a diversity of applications. Despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, as many techniques as possible should be considered in high-accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. On many occasions, however, researchers who are not experts in the field of machine learning have to deal with practical classification tasks without in-depth knowledge of the underlying parameters. The adequate choice of classifiers and parameters in such practical circumstances constitutes a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of the parameter configurations on the accuracy. The default configuration of parameters in Weka was found to provide near-optimal performance for most cases, the exceptions including methods such as the support vector machine (SVM). In addition, the k-nearest neighbor method frequently achieved the best accuracy. In certain conditions, it was possible to improve the quality of SVM by more than 20% with respect to its default parameter configuration.
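As a reference point for the k-nearest-neighbor result mentioned above, a minimal k-NN classifier needs only a distance sort and a majority vote. This is an illustrative sketch with hypothetical names, not the Weka implementation used in the study:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of ((feature, ...), label) pairs.
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((1.0, 1.0), "B"),
         ((0.9, 1.1), "B"), ((0.2, 0.1), "A")]
print(knn_predict(train, (0.15, 0.15)))  # "A"
print(knn_predict(train, (0.95, 1.0)))   # "B"
```

The single hyperparameter k is one reason the method performs well "out of the box": the default is often close to optimal, consistent with the study's finding about default configurations.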

  6. STATISTICAL TOOLS FOR CLASSIFYING GALAXY GROUP DYNAMICS

    International Nuclear Information System (INIS)

    Hou, Annie; Parker, Laura C.; Harris, William E.; Wilman, David J.

    2009-01-01

The dynamical state of galaxy groups at intermediate redshifts can provide information about the growth of structure in the universe. We examine three goodness-of-fit tests, the Anderson-Darling (A-D), Kolmogorov, and χ² tests, in order to determine which statistical tool is best able to distinguish between groups that are relaxed and those that are dynamically complex. We perform Monte Carlo simulations of these three tests and show that the χ² test is profoundly unreliable for groups with fewer than 30 members. Power studies of the Kolmogorov and A-D tests are conducted to test their robustness for various sample sizes. We then apply these tests to a sample of the second Canadian Network for Observational Cosmology Redshift Survey (CNOC2) galaxy groups and find that the A-D test is far more reliable and powerful at detecting real departures from an underlying Gaussian distribution than the more commonly used χ² and Kolmogorov tests. We use this statistic to classify a sample of the CNOC2 groups and find that 34 of 106 groups are inconsistent with an underlying Gaussian velocity distribution, and thus do not appear relaxed. In addition, we compute velocity dispersion profiles (VDPs) for all groups with more than 20 members and compare the overall features of the Gaussian and non-Gaussian groups, finding that the VDPs of the non-Gaussian groups are distinct from those classified as Gaussian.
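The A-D statistic itself is straightforward to compute. The sketch below evaluates A² against a normal distribution with mean and standard deviation estimated from the sample; in practice one would use a library routine such as `scipy.stats.anderson`, which also supplies the critical values (corrected for estimated parameters) needed to decide whether a group departs from Gaussianity.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling(xs):
    """A² statistic against a normal distribution with estimated mean/stddev.

    Larger values indicate a stronger departure from Gaussianity; the
    decision threshold depends on the sample size.
    """
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    z = sorted((x - mean) / sd for x in xs)
    s = sum((2 * i + 1) * (math.log(normal_cdf(z[i]))
                           + math.log(1.0 - normal_cdf(z[n - 1 - i])))
            for i in range(n))
    return -n - s / n

sample = [-1.2, -0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.7, 1.1, 1.4]
print(round(anderson_darling(sample), 3))  # small A² for near-Gaussian data
```

Because A² weights the tails of the distribution more heavily than the Kolmogorov statistic, it is more sensitive to the kinds of departures (infalling substructure, mergers) discussed in the abstract.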

  7. Mercury⊕: An evidential reasoning image classifier

    Science.gov (United States)

    Peddle, Derek R.

    1995-12-01

MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. The design and implementation of this software package are described for improving the classification and analysis of multisource digital image data necessary for addressing advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets which may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error which cannot be handled using conventional Bayesian approaches. The software uses a nonparametric, supervised approach to classification, and provides a more objective and flexible interface to the evidential reasoning framework using a frequency-based method for computing support values from training data. The MERCURY⊕ software package has been implemented efficiently in the C programming language, with extensive use made of dynamic memory allocation procedures and compound linked list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under the Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating systems. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.
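Dempster's rule of combination, the core operation of such an evidential reasoning classifier, can be sketched as follows (class names and mass values are illustrative, not taken from MERCURY⊕):

```python
def dempster_combine(m1, m2):
    """Combine two basic mass assignments over frozenset focal elements
    using Dempster's rule: sum products over intersecting focal elements,
    then normalize by 1 - K, where K is the mass falling on empty
    (conflicting) intersections."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict; sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Two evidence sources over land-cover classes {forest, tundra}
frame = frozenset({"forest", "tundra"})
m1 = {frozenset({"forest"}): 0.6, frame: 0.4}
m2 = {frozenset({"forest"}): 0.5, frozenset({"tundra"}): 0.3, frame: 0.2}
m = dempster_combine(m1, m2)
print(round(m[frozenset({"forest"})], 3))  # → 0.756
```

Unlike a Bayesian prior, mass assigned to the whole frame expresses ignorance rather than equal belief, which is why the approach suits disparate data sources with different levels of error.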

  8. The use of hyperspectral data for tree species discrimination: Combining binary classifiers

    CSIR Research Space (South Africa)

    Dastile, X

    2010-11-01

Full Text Available For 5-nearest neighbour classification, the new sample is assigned to class 1. Given a learning task {(x1,t1), (x2,t2), …, (xp,tp)}, with feature vectors xi ∈ Rn and labels ti ∈ {1, …, c}... A review on the combination of binary classifiers in multiclass problems. Springer Science and Business Media B.V. [7] Dietterich T.G. and Bakiri G. (1995). Solving Multiclass Learning Problems via Error-Correcting Output Codes. AI Access Foundation...
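The error-correcting output codes approach cited above (Dietterich & Bakiri) reduces a multiclass problem to several binary ones: each class is assigned a binary code word, and decoding picks the class whose code word is nearest in Hamming distance to the vector of binary classifier outputs. A sketch with made-up code words and class names:

```python
def ecoc_decode(bits, codebook):
    """Assign the class whose code word has the smallest Hamming distance
    to the vector of binary classifier outputs."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(codebook, key=lambda cls: hamming(codebook[cls], bits))

# Hypothetical 5-bit code words for three tree species
codebook = {
    "pine":   [0, 0, 1, 1, 0],
    "oak":    [1, 1, 0, 0, 1],
    "acacia": [1, 0, 1, 0, 1],
}
# One binary classifier flipped a bit; decoding still recovers "oak"
print(ecoc_decode([1, 1, 0, 1, 1], codebook))  # → oak
```

The error-correcting property comes from choosing code words with large pairwise Hamming distance, so a few binary classifier mistakes do not change the decoded class.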

  9. 36 CFR 1256.46 - National security-classified information.

    Science.gov (United States)

    2010-07-01

... 36 Parks, Forests, and Public Property, Vol. 3 (2010-07-01). National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National Security...

  10. Automatic coding method of the ACR Code

    International Nuclear Information System (INIS)

    Park, Kwi Ae; Ihm, Jong Sool; Ahn, Woo Hyun; Baik, Seung Kook; Choi, Han Yong; Kim, Bong Gi

    1993-01-01

The authors developed a computer program for automatic coding of the ACR (American College of Radiology) code. Automatic coding of the ACR code is essential for computerization of the data in the department of radiology. The program was written in the FoxBASE language and has been used for automatic coding of diagnoses in the Department of Radiology, Wallace Memorial Baptist, since May 1992. The ACR dictionary files consisted of 11 files, one for the organ code and the others for the pathology code. The organ code was obtained by typing the organ name or the code number itself among the upper- and lower-level codes of the selected one, which were simultaneously displayed on the screen. According to the first digit of the selected organ code, the corresponding pathology code file was chosen automatically. In a similar fashion to the organ code selection, the proper pathology code was obtained. An example of an obtained ACR code is '131.3661'. This procedure was reproducible regardless of the number of fields of data. Because the program was written in 'User's Defined Function' form, decoding of the stored ACR code was achieved by the same program, and the program could be incorporated into other data-processing programs. The program has the merits of simple operation, accurate and detailed coding, and easy adaptation to other programs. Therefore, it can be used to automate routine work in the department of radiology

  11. Error-correction coding

    Science.gov (United States)

    Hinds, Erold W. (Principal Investigator)

    1996-01-01

This report describes the progress made towards the completion of a specific task on error-correcting coding. The proposed research consisted of investigating the use of modulation block codes as the inner code of a concatenated coding system in order to improve the overall space-link communications performance. The study proposed to identify and analyze candidate codes that would complement the performance of the overall coding system, which uses the interleaved RS (255,223) code as the outer code.

  12. Dynamic Shannon Coding

    OpenAIRE

    Gagie, Travis

    2005-01-01

    We present a new algorithm for dynamic prefix-free coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.
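Shannon coding, the basis of the algorithm above, assigns each symbol a code word of length ⌈-log₂ p⌉. The static version is easy to sketch: compute those lengths from symbol frequencies and check the Kraft inequality, which guarantees that a prefix-free code with exactly these lengths exists. (This is only the static illustration; the paper's contribution is the dynamic variant, which is not reproduced here.)

```python
import math
from collections import Counter

def shannon_lengths(text):
    """Shannon code-word length ceil(-log2 p(s)) for each symbol s.

    Huffman lengths are never longer, but Shannon lengths are simpler to
    maintain dynamically, which motivates dynamic Shannon coding.
    """
    counts = Counter(text)
    total = len(text)
    return {s: math.ceil(-math.log2(c / total)) for s, c in counts.items()}

lengths = shannon_lengths("abracadabra")
print(lengths)  # {'a': 2, 'b': 3, 'r': 3, 'c': 4, 'd': 4}
kraft = sum(2 ** -l for l in lengths.values())
print(kraft <= 1)  # True: the lengths admit a prefix-free code
```

The most frequent symbol ('a', probability 5/11) receives the shortest code word, and the slack in the Kraft sum (0.625 here) reflects the redundancy relative to an optimal Huffman code.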

  13. Fundamentals of convolutional coding

    CERN Document Server

    Johannesson, Rolf

    2015-01-01

Fundamentals of Convolutional Coding, Second Edition, regarded as a bible of convolutional coding, brings you a clear and comprehensive discussion of the basic principles of this field * Two new chapters on low-density parity-check (LDPC) convolutional codes and iterative coding * Viterbi, BCJR, BEAST, list, and sequential decoding of convolutional codes * Distance properties of convolutional codes * Includes a downloadable solutions manual
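The basic principle can be illustrated with the classic rate-1/2, constraint-length-3 encoder with generator polynomials (7, 5) in octal, a standard textbook example (this sketch is not code from the book itself):

```python
def conv_encode(bits, g1=0b111, g2=0b101, k=3):
    """Rate-1/2 feed-forward convolutional encoder, generators (7,5) octal.

    Each input bit is shifted into a k-bit register; two output bits are
    produced as modulo-2 sums of the register taps selected by g1 and g2.
    Zero tail bits flush the register so the trellis ends in state 0.
    """
    state = 0
    out = []
    for b in bits + [0] * (k - 1):
        state = ((state << 1) | b) & ((1 << k) - 1)
        out.append(bin(state & g1).count("1") % 2)
        out.append(bin(state & g2).count("1") % 2)
    return out

print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]
```

Decoding this code with the Viterbi algorithm searches the 4-state trellis implied by the 2-bit memory, which is exactly the machinery the book's decoding chapters develop.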

  14. Codes Over Hyperfields

    Directory of Open Access Journals (Sweden)

    Atamewoue Surdive

    2017-12-01

Full Text Available In this paper, we define linear codes and cyclic codes over a finite Krasner hyperfield and characterize these codes by their generator matrices and parity-check matrices. We also demonstrate that codes over finite Krasner hyperfields are more interesting for coding theory than codes over classical finite fields.

  15. Two channel EEG thought pattern classifier.

    Science.gov (United States)

    Craig, D A; Nguyen, H T; Burchey, H A

    2006-01-01

This paper presents a real-time electro-encephalogram (EEG) identification system with the goal of achieving hands-free control. With two EEG electrodes placed on the scalp of the user, EEG signals are amplified and digitised directly using a ProComp+ encoder and transferred to the host computer through the RS232 interface. Using a real-time multilayer neural network, the actual classification for the control of a powered wheelchair has a very fast response: it can detect changes in the user's thought pattern in 1 second. Using only two EEG electrodes, at positions O(1) and C(4), the system can classify three mental commands (forward, left and right) with an accuracy of more than 79%.

  16. Classifying Drivers' Cognitive Load Using EEG Signals.

    Science.gov (United States)

    Barua, Shaibal; Ahmed, Mobyen Uddin; Begum, Shahina

    2017-01-01

A growing traffic safety issue is the effect of cognitively loading activities on traffic safety and driving performance. To monitor a driver's mental state, it is important to understand cognitive load, since performing cognitively loading secondary tasks while driving, for example talking on the phone, can affect performance in the primary task, i.e. driving. Electroencephalography (EEG) is one of the reliable measures of cognitive load that can detect changes in instantaneous load and the effect of a cognitively loading secondary task. In this driving simulator study, a 1-back task is carried out while the driver performs three different simulated driving scenarios. This paper presents an EEG-based approach to classify a driver's level of cognitive load using Case-Based Reasoning (CBR). The results show that for each individual scenario, as well as using data combined from the different scenarios, the CBR-based system achieved over 70% classification accuracy.

  17. Classifying prion and prion-like phenomena.

    Science.gov (United States)

    Harbi, Djamel; Harrison, Paul M

    2014-01-01

The universe of prion and prion-like phenomena has expanded significantly in the past several years. Here, we overview the challenges in classifying this data informatically, given that terms such as "prion-like", "prion-related" or "prion-forming" do not have a stable meaning in the scientific literature. We examine the spectrum of proteins that have been described in the literature as forming prions, and discuss how "prion" can have a range of meanings, with the strictest definition requiring demonstration of infection with in vitro-derived recombinant prions. We suggest that although prion/prion-like phenomena can largely be apportioned into a small number of broad groups depending on the type of transmissibility evidence for them, as new phenomena are discovered in the coming years, a detailed ontological approach might be necessary that allows for subtle definitions of different "flavors" of prion/prion-like phenomena.

  18. Hybrid Neuro-Fuzzy Classifier Based On Nefclass Model

    Directory of Open Access Journals (Sweden)

    Bogdan Gliwa

    2011-01-01

Full Text Available The paper presents a hybrid neuro-fuzzy classifier based on a modified NEFCLASS model. The presented classifier was compared to popular classifiers – neural networks and k-nearest neighbours. The efficiency of the modifications was compared with the learning methods used in the original NEFCLASS model. The accuracy of the classifier was tested using 3 datasets from the UCI Machine Learning Repository: iris, wine and breast cancer Wisconsin. Moreover, the influence of ensemble classification methods on classification accuracy was presented.

  19. Classifying Transition Behaviour in Postural Activity Monitoring

    Directory of Open Access Journals (Sweden)

    James BRUSEY

    2009-10-01

Full Text Available A few accelerometers positioned on different parts of the body can be used to accurately classify steady-state behaviour, such as walking, running, or sitting. Such systems are usually built using supervised learning approaches. Transitions between postures are, however, difficult to deal with using the posture classification systems proposed to date, since there is no label set for intermediary postures and the exact point at which the transition occurs can sometimes be hard to pinpoint. The usual workaround when using supervised learning to train such systems is to discard a section of the dataset around each transition. This leads to poorer classification performance when the systems are deployed out of the laboratory and used on-line, particularly if the regimes monitored involve fast-paced activity changes. Time-based filtering that takes advantage of sequential patterns is a potential mechanism for improving posture classification accuracy in such real-life applications. Such filtering should also reduce the number of event messages that need to be sent across a wireless network to track posture remotely, hence extending the system’s life. To support time-based filtering, understanding transitions, which are the major event generators in a classification system, is key. This work examines three approaches to post-processing the output of a posture classifier using time-based filtering: a naïve voting scheme, an exponentially weighted voting scheme, and a Bayes filter. The best performance is obtained from the exponentially weighted voting scheme, although it is suspected that a more sophisticated treatment of the Bayes filter might yield better results.
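The exponentially weighted voting scheme can be sketched as a simple recursive filter over the classifier's raw per-frame outputs (function name, labels, and the decay parameter are illustrative, not the paper's exact formulation):

```python
def ew_filter(raw_labels, classes, alpha=0.3):
    """Smooth a stream of per-frame posture labels with exponentially
    weighted votes: every class score decays by (1 - alpha) each frame,
    the raw label's class gains alpha, and the argmax is emitted."""
    scores = {c: 0.0 for c in classes}
    smoothed = []
    for label in raw_labels:
        for c in scores:
            scores[c] *= (1.0 - alpha)
        scores[label] += alpha
        smoothed.append(max(scores, key=scores.get))
    return smoothed

# A spurious single-frame "stand" during a walk is filtered out
raw = ["walk", "walk", "stand", "walk", "walk"]
print(ew_filter(raw, ["walk", "stand"]))  # all "walk": the glitch is suppressed
```

With a small alpha, a single misclassified frame during a transition cannot overturn the accumulated evidence for the current posture; the same mechanism also reduces the number of label-change events sent over the wireless network.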

  20. Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.

    Science.gov (United States)

    Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José

    2015-05-01

    Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Just-in-time adaptive classifiers-part II: designing the classifier.

    Science.gov (United States)

    Alippi, Cesare; Roveri, Manuel

    2008-12-01

Aging effects, environmental changes, thermal drifts, and soft and hard faults affect physical systems by changing their nature and behavior over time. To cope with process evolution, adaptive solutions must be envisaged to track its dynamics; in this direction, adaptive classifiers are generally designed by assuming the stationarity hypothesis for the process generating the data, with very few results addressing nonstationary environments. This paper proposes a methodology based on k-nearest neighbor (NN) classifiers for designing adaptive classification systems able to react to changing conditions just-in-time (JIT), i.e., exactly when needed. k-NN classifiers have been selected for their computation-free training phase, the possibility of easily estimating the model complexity k, and of keeping the computational complexity of the classifier under control through suitable data reduction mechanisms. A JIT classifier requires a temporal detection of a (possible) process deviation (an aspect tackled in a companion paper) followed by adaptive management of the knowledge base (KB) of the classifier to cope with the process change. The novelty of the proposed approach resides in the general framework supporting the real-time update of the KB of the classification system in response to novel information coming from the process, both in stationary conditions (accuracy improvement) and in nonstationary ones (process tracking), and in providing a suitable estimate of k. It is shown that the classification system grants consistency once the change brings the process generating the data into a new stationary state, as is the case in many real applications.

  2. Continuous speech recognition with sparse coding

    CSIR Research Space (South Africa)

    Smit, WJ

    2009-04-01

    Full Text Available generative model. The spike train is classified by making use of a spike train model and dynamic programming. It is computationally expensive to find a sparse code. We use an iterative subset selection algorithm with quadratic programming for this process...

  3. Profiling DNA damage response following mitotic perturbations

    DEFF Research Database (Denmark)

    Pedersen, Ronni Sølvhøi; Karemore, Gopal; Gudjonsson, Thorkell

    2016-01-01

    that a broad spectrum of mitotic errors correlates with increased DNA breakage in daughter cells. Unexpectedly, we find that only a subset of these correlations are functionally linked. We identify the genuine mitosis-born DNA damage events and sub-classify them according to penetrance of the observed...

  4. Lectin cDNA and transgenic plants derived therefrom

    Science.gov (United States)

    Raikhel, Natasha V.

    2000-10-03

    Transgenic plants containing cDNA encoding Gramineae lectin are described. The plants preferably contain cDNA coding for barley lectin and store the lectin in the leaves. The transgenic plants, particularly the leaves exhibit insecticidal and fungicidal properties.

  5. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    Science.gov (United States)

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
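For reference, a plain maximum likelihood classifier with per-class Gaussian models can be sketched in a few lines. This is a generic textbook MLC on 1-D data, not the combined SVM-MLC scheme of the paper.

```python
# Sketch: maximum likelihood classifier with per-class Gaussian models.
# Generic illustration of MLC, not the paper's combined SVM-MLC scheme.
import math

def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def log_likelihood(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def mlc_predict(x, models):
    # Assign x to the class whose fitted Gaussian gives it the highest likelihood.
    return max(models, key=lambda c: log_likelihood(x, *models[c]))

models = {"pos": fit_gaussian([2.9, 3.0, 3.1]), "neg": fit_gaussian([-1.1, -1.0, -0.9])}
print(mlc_predict(2.5, models))  # -> pos
```

The paper's observation that SVM can coincide with MLC under certain data distributions can be probed with sketches like this by comparing decision boundaries on Gaussian versus non-Gaussian samples.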

  6. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    Directory of Open Access Journals (Sweden)

    Yi Zhang

    2015-01-01

    Full Text Available Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  7. Sequence analysis of mitochondrial DNA hypervariable region III of ...

    African Journals Online (AJOL)

    Aghomotsegin

    2015-07-01

    Jul 1, 2015 ... population genetics research, studies based on mitochondrial DNA (mtDNA) and Y-chromosome DNA are an excellent way of illustrating population structure .... avoid landing investigators in serious situations of medical genetic privacy and ethics, especially for the mtDNA coding region, whose mutations often ...

  8. Vector Network Coding Algorithms

    OpenAIRE

    Ebrahimi, Javad; Fragouli, Christina

    2010-01-01

    We develop new algebraic algorithms for scalar and vector network coding. In vector network coding, the source multicasts information by transmitting vectors of length L, while intermediate nodes process and combine their incoming packets by multiplying them with L x L coding matrices that play a similar role as coding coefficients in scalar coding. Our algorithms for scalar network coding jointly optimize the employed field size while selecting the coding coefficients. Similarly, for vector coding, our algori...

  9. Immunization with DNA plasmids coding for crimean-congo hemorrhagic fever virus capsid and envelope proteins and/or virus-like particles induces protection and survival in challenged mice

    DEFF Research Database (Denmark)

    Hinkula, Jorma; Devignot, Stéphanie; Åkerström, Sara

    2017-01-01

    , there was no correlation with the neutralizing antibody titers alone, which were higher in the tc-VLP-vaccinated mice. However, the animals with a lower neutralizing titer, but a dominant cell-mediated Th1 response and a balanced Th2 response, resisted the CCHFV challenge. Moreover, we found that in challenged mice...... with a Th1 response (immunized by DNA/DNA and boosted by tc-VLPs), the immune response changed to Th2 at day 9 postchallenge. In addition, we were able to identify new linear B-cell epitope regions that are highly conserved between CCHFV strains. Altogether, our results suggest that a predominantly Th1-type...

  10. ECLogger: Cross-Project Catch-Block Logging Prediction Using Ensemble of Classifiers

    Directory of Open Access Journals (Sweden)

    Sangeeta Lal

    2017-01-01

    Full Text Available Background: Software developers insert log statements in the source code to record program execution information. However, optimizing the number of log statements in the source code is challenging. Machine-learning-based within-project logging prediction tools, proposed in previous studies, may not be suitable for new or small software projects. For such software projects, we can use cross-project logging prediction. Aim: The aim of the study presented here is to investigate cross-project logging prediction methods and techniques. Method: The proposed method is ECLogger, a novel, ensemble-based, cross-project, catch-block logging prediction model. Nine base classifiers were used and combined using ensemble techniques. The performance of ECLogger was evaluated on three open-source Java projects: Tomcat, CloudStack and Hadoop. Results: ECLogger_Bagging, ECLogger_AverageVote, and ECLogger_MajorityVote show a considerable improvement in the average Logged F-measure (LF) on 3, 5, and 4 source -> target project pairs, respectively, compared to the baseline classifiers. ECLogger_AverageVote performs best and shows improvements of 3.12% (average LF) and 6.08% (average ACC, accuracy). Conclusion: Classifiers based on ensemble techniques, such as bagging, average vote, and majority vote, outperform the baseline classifier. Overall, the ECLogger_AverageVote model performs best. The results show that the CloudStack project is more generalizable than the other projects.
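The majority-vote combination rule can be sketched generically. The base classifiers below are toy stand-ins keyed on invented catch-block features, not the nine classifiers used in the study.

```python
# Sketch: majority-vote ensemble over arbitrary base classifiers.
# The lambdas below are toy stand-ins for the paper's nine base classifiers,
# and the feature names ("try_depth", "has_throw") are invented.
from collections import Counter

def majority_vote(classifiers, x):
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

base = [
    lambda x: "log" if x["try_depth"] > 2 else "no-log",
    lambda x: "log" if x["has_throw"] else "no-log",
    lambda x: "no-log",  # a pessimistic baseline voter
]
print(majority_vote(base, {"try_depth": 3, "has_throw": True}))  # -> log
```

Average vote works the same way but averages each classifier's class probabilities instead of counting hard votes, which is presumably why it degrades more gracefully when individual voters are poorly calibrated.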

  11. Classifying Adverse Events in the Dental Office.

    Science.gov (United States)

    Kalenderian, Elsbeth; Obadan-Udoh, Enihomo; Maramaldi, Peter; Etolue, Jini; Yansane, Alfa; Stewart, Denice; White, Joel; Vaderhobli, Ram; Kent, Karla; Hebballi, Nutan B; Delattre, Veronique; Kahn, Maria; Tokede, Oluwabunmi; Ramoni, Rachel B; Walji, Muhammad F

    2017-06-30

    Dentists strive to provide safe and effective oral healthcare. However, some patients may encounter an adverse event (AE) defined as "unnecessary harm due to dental treatment." In this research, we propose and evaluate two systems for categorizing the type and severity of AEs encountered at the dental office. Several existing medical AE type and severity classification systems were reviewed and adapted for dentistry. Using data collected in previous work, two initial dental AE type and severity classification systems were developed. Eight independent reviewers performed focused chart reviews, and AEs identified were used to evaluate and modify these newly developed classifications. A total of 958 charts were independently reviewed. Among the reviewed charts, 118 prospective AEs were found and 101 (85.6%) were verified as AEs through a consensus process. At the end of the study, a final AE type classification comprising 12 categories, and an AE severity classification comprising 7 categories emerged. Pain and infection were the most common AE types representing 73% of the cases reviewed (56% and 17%, respectively) and 88% were found to cause temporary, moderate to severe harm to the patient. Adverse events found during the chart review process were successfully classified using the novel dental AE type and severity classifications. Understanding the type of AEs and their severity are important steps if we are to learn from and prevent patient harm in the dental office.

  12. Is it important to classify ischaemic stroke?

    LENUS (Irish Health Repository)

    Iqbal, M

    2012-02-01

    Thirty-five percent of all ischaemic events remain classified as cryptogenic. This study was conducted to ascertain the accuracy of diagnosis of ischaemic stroke based on the information given in the medical notes, tested by applying the clinical information to the TOAST criteria. One hundred and five patients presented with acute stroke between Jan-Jun 2007. Data were collected on 90 patients. The male to female ratio was 39:51, with an age range of 47-93 years. Sixty (67%) patients had total/partial anterior circulation stroke; 5 (5.6%) had a lacunar stroke; and in 25 (28%) the mechanism of stroke could not be identified. Four (4.4%) patients with small vessel disease were anticoagulated; 5 (5.6%) with atrial fibrillation received antiplatelet therapy; and 2 (2.2%) patients with atrial fibrillation underwent CEA. This study revealed deficiencies in the clinical assessment of patients, and treatment was not tailored to the mechanism of stroke in some patients.

  13. Stress fracture development classified by bone scintigraphy

    International Nuclear Information System (INIS)

    Zwas, S.T.; Elkanovich, R.; Frank, G.; Aharonson, Z.

    1985-01-01

    There is no consensus on classifying stress fractures (SF) appearing on bone scans. The authors present a system of classification based on grading the severity and development of bone lesions by visual inspection, according to three main scintigraphic criteria: focality and size, intensity of uptake compared to adjacent bone, and local medullary extension. Four grades of development (I-IV) were ranked, ranging from ill-defined, slightly increased cortical uptake to well-defined regions with markedly increased uptake extending transversely and bicortically. 310 male subjects aged 19-2, suffering for several weeks from leg pains occurring during intensive physical training, underwent bone scans of the pelvis and lower extremities using Tc-99m-MDP. 76% of the scans were positive, with 354 lesions, of which 88% were in the mild (I-II) grades and 12% in the moderate (III) and severe (IV) grades. Post-treatment scans were obtained in 65 cases having 78 lesions, during 1- to 6-month intervals. Complete resolution was found after 1-2 months in 36% of the mild lesions but in only 12% of the moderate and severe ones, and after 3-6 months in 55% of the mild lesions and 15% of the severe ones. 75% of the moderate and severe lesions showed residual uptake in various stages throughout the follow-up period. Early recognition and treatment of mild SF lesions in this study prevented protracted disability and progression of the lesions, and facilitated complete healing.

  14. DNA methylation in obesity

    Directory of Open Access Journals (Sweden)

    Małgorzata Pokrywka

    2014-11-01

    Full Text Available The number of overweight and obese people is increasing at an alarming rate, especially in developed and developing countries. Obesity is a major risk factor for diabetes, cardiovascular disease, and cancer, and in consequence for premature death. The development of obesity results from the interplay of both genetic and environmental factors, which include a sedentary lifestyle and abnormal eating habits. In the past few years a number of events accompanying obesity, affecting expression of genes which are not directly connected with the DNA base sequence (e.g. epigenetic changes), have been described. Epigenetic processes include DNA methylation, histone modifications such as acetylation, methylation, phosphorylation, ubiquitination, and sumoylation, as well as non-coding micro-RNA (miRNA) synthesis. In this review, the known changes in the profile of DNA methylation as a factor affecting obesity and its complications are described.

  15. Diagnostic Coding for Epilepsy.

    Science.gov (United States)

    Williams, Korwyn; Nuwer, Marc R; Buchhalter, Jeffrey R

    2016-02-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue.

  16. Coding of Neuroinfectious Diseases.

    Science.gov (United States)

    Barkley, Gregory L

    2015-12-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue.

  17. Modeling DNA

    Science.gov (United States)

    Robertson, Carol

    2016-01-01

    Deoxyribonucleic acid (DNA) is life's most amazing molecule. It carries the genetic instructions that almost every organism needs to develop and reproduce. In the human genome alone, there are some three billion DNA base pairs. The most difficult part of teaching DNA structure, however, may be getting students to visualize something as small as a…

  18. Discriminative sparse coding on multi-manifolds

    KAUST Repository

    Wang, J.J.-Y.; Bensmail, H.; Yao, N.; Gao, Xin

    2013-01-01

    Sparse coding has been popularly used as an effective data representation method in various applications, such as computer vision, medical imaging and bioinformatics. However, the conventional sparse coding algorithms and their manifold-regularized variants (graph sparse coding and Laplacian sparse coding), learn codebooks and codes in an unsupervised manner and neglect class information that is available in the training set. To address this problem, we propose a novel discriminative sparse coding method based on multi-manifolds, that learns discriminative class-conditioned codebooks and sparse codes from both data feature spaces and class labels. First, the entire training set is partitioned into multiple manifolds according to the class labels. Then, we formulate the sparse coding as a manifold-manifold matching problem and learn class-conditioned codebooks and codes to maximize the manifold margins of different classes. Lastly, we present a data sample-manifold matching-based strategy to classify the unlabeled data samples. Experimental results on somatic mutations identification and breast tumor classification based on ultrasonic images demonstrate the efficacy of the proposed data representation and classification approach. © 2013 The Authors. All rights reserved.
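The generic sparse coding step, approximating a sample as a sparse combination of dictionary atoms, can be sketched with greedy matching pursuit. This illustrates plain sparse coding only, not the discriminative multi-manifold formulation proposed here.

```python
# Sketch: one-atom-at-a-time matching pursuit -- a generic greedy sparse
# coding step, not the paper's discriminative multi-manifold formulation.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(x, dictionary, n_atoms=2):
    """Greedily pick the atoms most correlated with the residual."""
    residual = list(x)
    code = {}
    for _ in range(n_atoms):
        j = max(range(len(dictionary)), key=lambda k: abs(dot(residual, dictionary[k])))
        coef = dot(residual, dictionary[j])  # atoms assumed unit-norm
        code[j] = code.get(j, 0.0) + coef
        residual = [r - coef * d for r, d in zip(residual, dictionary[j])]
    return code  # sparse code: atom index -> coefficient

D = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]  # toy unit-norm dictionary
print(matching_pursuit((3.0, 4.0), D, n_atoms=1))
```

The discriminative variant in the paper additionally constrains which codebook a sample may use according to its class manifold; the greedy selection step itself is unchanged.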

  19. Discriminative sparse coding on multi-manifolds

    KAUST Repository

    Wang, J.J.-Y.

    2013-09-26

    Sparse coding has been popularly used as an effective data representation method in various applications, such as computer vision, medical imaging and bioinformatics. However, the conventional sparse coding algorithms and their manifold-regularized variants (graph sparse coding and Laplacian sparse coding), learn codebooks and codes in an unsupervised manner and neglect class information that is available in the training set. To address this problem, we propose a novel discriminative sparse coding method based on multi-manifolds, that learns discriminative class-conditioned codebooks and sparse codes from both data feature spaces and class labels. First, the entire training set is partitioned into multiple manifolds according to the class labels. Then, we formulate the sparse coding as a manifold-manifold matching problem and learn class-conditioned codebooks and codes to maximize the manifold margins of different classes. Lastly, we present a data sample-manifold matching-based strategy to classify the unlabeled data samples. Experimental results on somatic mutations identification and breast tumor classification based on ultrasonic images demonstrate the efficacy of the proposed data representation and classification approach. © 2013 The Authors. All rights reserved.

  20. 41 CFR 105-62.102 - Authority to originally classify.

    Science.gov (United States)

    2010-07-01

    ... originally classify. (a) Top secret, secret, and confidential. The authority to originally classify information as Top Secret, Secret, or Confidential may be exercised only by the Administrator and is delegable...

  1. Naive Bayesian classifiers for multinomial features: a theoretical analysis

    CSIR Research Space (South Africa)

    Van Dyk, E

    2007-11-01

    Full Text Available The authors investigate the use of naive Bayesian classifiers for multinomial feature spaces and derive error estimates for these classifiers. The error analysis is done by developing a mathematical model to estimate the probability density...
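A minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing, the kind of classifier analysed above, can be sketched as follows. The data are toy examples, and this is a generic textbook implementation, not the authors' error model.

```python
# Sketch: multinomial naive Bayes with Laplace (add-one) smoothing.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (token_list, label). Returns (log priors, token log probs, vocab)."""
    counts, class_n = defaultdict(Counter), Counter()
    for tokens, label in docs:
        counts[label].update(tokens)
        class_n[label] += 1
    vocab = {t for c in counts.values() for t in c}
    n_docs = sum(class_n.values())
    priors = {c: math.log(class_n[c] / n_docs) for c in class_n}
    log_probs = {
        c: {t: math.log((counts[c][t] + 1) / (sum(counts[c].values()) + len(vocab)))
            for t in vocab}
        for c in counts
    }
    return priors, log_probs, vocab

def predict_nb(tokens, priors, log_probs, vocab):
    # Sum log probabilities of in-vocabulary tokens under each class.
    scores = {c: priors[c] + sum(log_probs[c][t] for t in tokens if t in vocab)
              for c in priors}
    return max(scores, key=scores.get)

docs = [(["free", "win", "cash"], "spam"), (["meeting", "notes"], "ham"),
        (["win", "prize"], "spam"), (["project", "meeting"], "ham")]
model = train_nb(docs)
print(predict_nb(["win", "cash"], *model))  # -> spam
```

The error estimates derived in the paper concern exactly this decision rule: how often the argmax over summed log probabilities picks the wrong class under a multinomial feature model.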

  2. Ensemble of classifiers based network intrusion detection system performance bound

    CSIR Research Space (South Africa)

    Mkuzangwe, Nenekazi NP

    2017-11-01

    Full Text Available This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently researchers rely on implementing the ensemble of classifiers based NIDS before they can determine the performance...

  3. Haben repetitive DNA-Sequenzen biologische Funktionen?

    Science.gov (United States)

    John, Maliyakal E.; Knöchel, Walter

    1983-05-01

    By DNA reassociation kinetics it is known that the eucaryotic genome consists of non-repetitive DNA, middle-repetitive DNA and highly repetitive DNA. Whereas the majority of protein-coding genes is located on non-repetitive DNA, repetitive DNA forms a constitutive part of eucaryotic DNA and its amount in most cases equals or even substantially exceeds that of non-repetitive DNA. During the past years a large body of data on repetitive DNA has accumulated and these have prompted speculations ranging from specific roles in the regulation of gene expression to that of a selfish entity with inconsequential functions. The following article summarizes recent findings on structural, transcriptional and evolutionary aspects and, although by no means being proven, some possible biological functions are discussed.

  4. Fast Most Similar Neighbor (MSN) classifiers for Mixed Data

    OpenAIRE

    Hernández Rodríguez, Selene

    2010-01-01

    The k nearest neighbor (k-NN) classifier has been extensively used in Pattern Recognition because of its simplicity and its good performance. However, in large datasets applications, the exhaustive k-NN classifier becomes impractical. Therefore, many fast k-NN classifiers have been developed; most of them rely on metric properties (usually the triangle inequality) to reduce the number of prototype comparisons. Hence, the existing fast k-NN classifiers are applicable only when the comparison f...

  5. Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Amy L Bauer

    2010-11-01

    Full Text Available An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.

  6. Three data partitioning strategies for building local classifiers (Chapter 14)

    NARCIS (Netherlands)

    Zliobaite, I.; Okun, O.; Valentini, G.; Re, M.

    2011-01-01

    Divide-and-conquer approach has been recognized in multiple classifier systems aiming to utilize local expertise of individual classifiers. In this study we experimentally investigate three strategies for building local classifiers that are based on different routines of sampling data for training.

  7. Recognition of pornographic web pages by classifying texts and images.

    Science.gov (United States)

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
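The fusion step, combining the outputs of the image and text classifiers via Bayes' theorem, can be sketched under a conditional-independence assumption. This is an illustration of the general principle, not the paper's exact algorithm, and the function name is mine.

```python
# Sketch: fusing two classifiers' posteriors under a naive (conditional
# independence) Bayes assumption -- an illustration of the fusion idea,
# not the paper's exact algorithm.
def fuse(p_text, p_image, prior=0.5):
    """p_text, p_image: each classifier's P(target class | its own evidence)."""
    # Convert each posterior back to an odds ratio, multiply the two
    # likelihood ratios, divide out the prior counted twice, renormalize.
    odds = (p_text / (1 - p_text)) * (p_image / (1 - p_image)) / (prior / (1 - prior))
    return odds / (1 + odds)

print(round(fuse(0.8, 0.7), 3))  # -> 0.903
```

Two moderately confident classifiers thus fuse into a more confident joint decision, which is the behavior the paper reports: the fused result outperforms either individual classifier.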

  8. 32 CFR 2400.28 - Dissemination of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Dissemination of classified information. 2400.28... SECURITY PROGRAM Safeguarding § 2400.28 Dissemination of classified information. Heads of OSTP offices... originating official may prescribe specific restrictions on dissemination of classified information when...

  9. Francis Crick, DNA, and the Central Dogma

    Science.gov (United States)

    Olby, Robert

    1970-01-01

    This essay describes how Francis Crick, ex-physicist, entered the field of biology and discovered the structure of DNA. Emphasis is upon the double helix, the sequence hypothesis, the central dogma, and the genetic code. (VW)

  10. Vector Network Coding

    OpenAIRE

    Ebrahimi, Javad; Fragouli, Christina

    2010-01-01

    We develop new algebraic algorithms for scalar and vector network coding. In vector network coding, the source multicasts information by transmitting vectors of length L, while intermediate nodes process and combine their incoming packets by multiplying them with L x L coding matrices that play a similar role as coding coefficients in scalar coding. Our algorithms for scalar network coding jointly optimize the employed field size while selecting the coding coefficients. Similarly, for vector co...
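Scalar network coding is often illustrated with the butterfly network, where the bottleneck node forwards a linear combination of the two source packets; over GF(2) that combination is simply XOR. A minimal sketch (the vector case replaces these scalars with length-L vectors and L x L coding matrices):

```python
# Sketch: the classic butterfly example of (scalar) network coding over GF(2).
# The bottleneck node forwards x1 XOR x2; each sink recovers the packet it
# did not receive directly by XOR-ing with the one it did.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

x1, x2 = b"\x12\x34", b"\xab\xcd"
coded = xor_bytes(x1, x2)                 # sent over the shared bottleneck link
sink1_recovers_x2 = xor_bytes(coded, x1)  # sink 1 already has x1
sink2_recovers_x1 = xor_bytes(coded, x2)  # sink 2 already has x2
print(sink1_recovers_x2 == x2, sink2_recovers_x1 == x1)  # -> True True
```

The algorithmic question the paper addresses is how to choose such coding coefficients (or matrices) and the field size jointly so that every sink's received combinations remain invertible.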

  11. Entropy Coding in HEVC

    OpenAIRE

    Sze, Vivienne; Marpe, Detlev

    2014-01-01

    Context-Based Adaptive Binary Arithmetic Coding (CABAC) is a method of entropy coding first introduced in H.264/AVC and now used in the latest High Efficiency Video Coding (HEVC) standard. While it provides high coding efficiency, the data dependencies in H.264/AVC CABAC make it challenging to parallelize and thus limit its throughput. Accordingly, during the standardization of entropy coding for HEVC, both aspects of coding efficiency and throughput were considered. This chapter describes th...
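The context-adaptive probability modeling at the heart of CABAC can be illustrated by replacing the arithmetic coder with an ideal code-length count. This toy model is a simplification of the idea, not the H.264/HEVC probability state machine.

```python
# Sketch: context-adaptive binary probability modeling, the "CA" part of
# CABAC, with the arithmetic coder replaced by an ideal code-length count.
# This is a toy frequency-count model, not the standard's state machine.
import math

class AdaptiveBitModel:
    def __init__(self):
        self.ones = 1
        self.total = 2  # Laplace-style initialization: P(1) = 0.5

    def cost_and_update(self, bit):
        p = self.ones / self.total if bit else 1 - self.ones / self.total
        self.ones += bit
        self.total += 1
        return -math.log2(p)  # ideal arithmetic-code length for this bit

model = AdaptiveBitModel()
bits = [1, 1, 1, 1, 0, 1, 1, 1]
total_bits = sum(model.cost_and_update(b) for b in bits)
print(round(total_bits, 2))  # -> 6.17, i.e. fewer than 8 bits once the model adapts
```

As the model's estimate of P(1) rises toward the true bias of the source, each likely bit costs well under one bit, which is where the compression gain of adaptive binary arithmetic coding comes from.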

  12. Generalized concatenated quantum codes

    International Nuclear Information System (INIS)

    Grassl, Markus; Shor, Peter; Smith, Graeme; Smolin, John; Zeng Bei

    2009-01-01

    We discuss the concept of generalized concatenated quantum codes. This generalized concatenation method provides a systematical way for constructing good quantum codes, both stabilizer codes and nonadditive codes. Using this method, we construct families of single-error-correcting nonadditive quantum codes, in both binary and nonbinary cases, which not only outperform any stabilizer codes for finite block length but also asymptotically meet the quantum Hamming bound for large block length.

  13. Rateless feedback codes

    DEFF Research Database (Denmark)

    Sørensen, Jesper Hemming; Koike-Akino, Toshiaki; Orlik, Philip

    2012-01-01

    This paper proposes a concept called rateless feedback coding. We redesign the existing LT and Raptor codes, by introducing new degree distributions for the case when a few feedback opportunities are available. We show that incorporating feedback to LT codes can significantly decrease both...... the coding overhead and the encoding/decoding complexity. Moreover, we show that, at the price of a slight increase in the coding overhead, linear complexity is achieved with Raptor feedback coding....
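The peeling decoder underlying LT and Raptor codes can be sketched as follows. The coded symbols below are hand-built for determinism; a real encoder draws each symbol's neighbor set from a designed degree distribution, and the paper's contribution (feedback-aware distributions) is not modeled here.

```python
# Sketch: the peeling decoder used for LT-style rateless codes. Each coded
# symbol is the XOR of a subset ("neighbor set") of source packets; any
# symbol whose neighbor set shrinks to a single unknown packet releases
# that packet, and decoding iterates until no symbol makes progress.
def lt_decode(coded, n_source):
    decoded = [None] * n_source
    progress = True
    while progress:
        progress = False
        for idxs, payload in coded:
            unknown = [i for i in idxs if decoded[i] is None]
            if len(unknown) == 1:  # peeling step: one unknown neighbor left
                known_xor = 0
                for i in idxs:
                    if decoded[i] is not None:
                        known_xor ^= decoded[i]
                decoded[unknown[0]] = payload ^ known_xor
                progress = True
    return decoded

src = [0x11, 0x22, 0x33, 0x44]
coded = [({1}, 0x22), ({0, 1}, 0x11 ^ 0x22),
         ({1, 2}, 0x22 ^ 0x33), ({2, 3}, 0x33 ^ 0x44)]
print(lt_decode(coded, 4) == src)  # -> True
```

Feedback, as proposed in the paper, lets the encoder avoid wasting symbols on packets the decoder has already peeled, which is what reduces the coding overhead and the decoding work.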

  14. Peat classified as slowly renewable biomass fuel

    International Nuclear Information System (INIS)

    2001-01-01

    thousands of years. The report also states that peat should be classified as a biomass fuel rather than with biofuels, such as wood, or fossil fuels, such as coal. According to the report, peat is a renewable biomass fuel like biofuels, but due to its slow accumulation it should be considered a slowly renewable fuel. The report estimates that the binding of carbon in both virgin and forest-drained peatlands is so high that it can compensate for the emissions formed in the combustion of energy peat

  15. Isolation and characterization of a cDNA clone coding for a glutathione S-transferase class delta enzyme from the biting midge Culicoides variipennis sonorensis Wirth and Jones.

    Science.gov (United States)

    Abdallah, M A; Pollenz, R S; Droog, F N; Nunamaker, R A; Tabachnick, W J; Murphy, K E

    2000-12-01

    Culicoides variipennis sonorensis is the primary vector of bluetongue viruses in North America. Glutathione S-transferases (GSTs) are enzymes that catalyze nucleophilic substitutions, converting reactive lipophilic molecules into soluble conjugates. Increased GST activity is associated with development of insecticide resistance. Described here is the isolation of the first cDNA encoding a C. variipennis GST. The clone consists of 720 translated bases encoding a protein with a M(r) of approximately 24,800 composed of 219 amino acids. The deduced amino acid sequence is similar (64%-74%) to class Delta (previously named Theta) GSTs from the dipteran genera Musca, Drosophila, Lucilia and Anopheles. The cDNA was subcloned into pET-11b, expressed in Epicurian coli BL21 (DE3) and has a specific activity of approximately 28,000 units/mg for the substrate 1-chloro-2,4-dinitrobenzene.

  16. Advanced video coding systems

    CERN Document Server

    Gao, Wen

    2015-01-01

    This comprehensive and accessible text/reference presents an overview of the state of the art in video coding technology. Specifically, the book introduces the tools of the AVS2 standard, describing how AVS2 can help to achieve a significant improvement in coding efficiency for future video networks and applications by incorporating smarter coding tools such as scene video coding. Topics and features: introduces the basic concepts in video coding, and presents a short history of video coding technology and standards; reviews the coding framework, main coding tools, and syntax structure of AV

  17. Coding for dummies

    CERN Document Server

    Abraham, Nikhil

    2015-01-01

    Hands-on exercises help you learn to code like a pro. No coding experience is required for Coding For Dummies, your one-stop guide to building a foundation of knowledge in writing computer code for web, application, and software development. It doesn't matter if you've dabbled in coding or never written a line of code, this book guides you through the basics. Using foundational web development languages like HTML, CSS, and JavaScript, it explains in plain English how coding works and why it's needed. Online exercises developed by Codecademy, a leading online code training site, help hone coding skill

  18. The Classification of Complementary Information Set Codes of Lengths 14 and 16

    OpenAIRE

    Freibert, Finley

    2012-01-01

    In the paper "A new class of codes for Boolean masking of cryptographic computations," Carlet, Gaborit, Kim, and Solé defined a new class of rate one-half binary codes called complementary information set (or CIS) codes. The authors then classified all CIS codes of length less than or equal to 12. CIS codes have relations to classical coding theory as they are a generalization of self-dual codes. As stated in the paper, CIS codes also have important practical applications as they m...

  19. Computation of the Genetic Code

    Science.gov (United States)

    Kozlov, Nicolay N.; Kozlova, Olga N.

    2018-03-01

    One of the problems in the development of a mathematical theory of the genetic code (summarized in [1], detailed in [2]) is the calculation of the genetic code itself. Similar problems were previously unknown and could be posed only in the 21st century. This work is devoted to one approach to solving this problem. For the first time, a detailed description is provided of the method of calculation of the genetic code, the idea of which was first published earlier [3], and the choice of one of the most important sets for the calculation was based on [4]. Such a set of amino acids corresponds to a complete set of representations of the plurality of overlapping triplet genes belonging to the same DNA strand. A separate issue was the starting point triggering the iterative search over all codes represented by the initial data. Mathematical analysis showed that the said set contains some ambiguities, which were found thanks to our proposed compressed representation of the set. As a result, the developed method of calculation was limited to two main stages of research, where at the first stage only part of the area was used in the calculations. The proposed approach significantly reduces the amount of computation at each step in this complex discrete structure.
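The notion of overlapping triplets on a single DNA strand can be illustrated by enumerating codons in the three reading frames of one strand. The codon table below is deliberately partial, listing only a few well-known entries; a real translation uses all 64 codons.

```python
# Sketch: codons of the three overlapping reading frames of one DNA strand.
# CODON_TABLE is deliberately partial -- just a few standard entries for
# illustration; a real translation table covers all 64 codons.
CODON_TABLE = {"ATG": "Met", "TGG": "Trp", "TTT": "Phe", "AAA": "Lys",
               "TAA": "Stop", "TAG": "Stop", "TGA": "Stop"}

def codons(strand, frame):
    """Complete triplets of `strand` read in frame 0, 1, or 2."""
    return [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]

strand = "ATGAAATGA"
for frame in range(3):
    cs = codons(strand, frame)
    print(frame, cs, [CODON_TABLE.get(c, "?") for c in cs])
```

A single mutation in such a strand simultaneously changes codons in every overlapping gene, which is what makes the combinatorics of admissible amino acid sets, and hence the calculation described above, so constrained.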

  20. Transcription of repetitive DNA in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K; Chaudhuri, R K

    1975-01-01

    Repeated DNA sequences of Neurospora crassa were isolated and characterized. Approximately 10 to 12 percent of the N. crassa DNA sequences were repeated, of which 7.3 percent were found to be transcribed in the mid-log phase of mycelial growth, as measured by DNA:RNA hybridization. It is suggested that part of the repetitive DNA transcripts in N. crassa were mitochondrial and part were nuclear. Most of the nuclear repeated DNA, however, codes for rRNA and tRNA in N. crassa. (auth)

  1. Discussion on LDPC Codes and Uplink Coding

    Science.gov (United States)

    Andrews, Ken; Divsalar, Dariush; Dolinar, Sam; Moision, Bruce; Hamkins, Jon; Pollara, Fabrizio

    2007-01-01

    This slide presentation reviews the progress of the workgroup on Low-Density Parity-Check (LDPC) codes for space link coding. The workgroup is tasked with developing and recommending new error-correcting codes for near-Earth, Lunar, and deep space applications. Included in the presentation is a summary of the technical progress of the workgroup. Charts showing the LDPC decoder's sensitivity to symbol scaling errors are reviewed, as well as a chart comparing the performance of several frame synchronizer algorithms with that of some good codes, and LDPC decoder tests at ESTL. Also reviewed are a study on Coding, Modulation, and Link Protocol (CMLP) and the recommended codes. A design for the pseudo-randomizer with LDPC decoder and CRC is also reviewed, along with a chart summarizing the three proposed coding systems.

  2. Locating and classifying defects using an hybrid data base

    Energy Technology Data Exchange (ETDEWEB)

    Luna-Aviles, A; Diaz Pineda, A [Tecnologico de Estudios Superiores de Coacalco. Av. 16 de Septiembre 54, Col. Cabecera Municipal. C.P. 55700 (Mexico); Hernandez-Gomez, L H; Urriolagoitia-Calderon, G; Urriolagoitia-Sosa, G [Instituto Politecnico Nacional. ESIME-SEPI. Unidad Profesional ' Adolfo Lopez Mateos' Edificio 5, 30 Piso, Colonia Lindavista. Gustavo A. Madero. 07738 Mexico D.F. (Mexico); Durodola, J F [School of Technology, Oxford Brookes University, Headington Campus, Gipsy Lane, Oxford OX3 0BP (United Kingdom); Beltran Fernandez, J A, E-mail: alelunaav@hotmail.com, E-mail: luishector56@hotmail.com, E-mail: jdurodola@brookes.ac.uk

    2011-07-19

    A computational inverse technique was used in the localization and classification of defects. Postulated voids of two different sizes (2 mm and 4 mm diameter) were introduced in PMMA bars with and without a notch. The bar dimensions are 200x20x5 mm. One half of them were plain and the other half had a notch (3 mm x 4 mm) close to the defect area (19 mm x 16 mm). This analysis was done with an Artificial Neural Network (ANN), and its optimization was done with an Adaptive Neuro-Fuzzy Procedure (ANFIS). A hybrid database was developed with numerical and experimental results. Synthetic data was generated with the finite element method using the SOLID95 element of the ANSYS code. A parametric analysis was carried out. Only one defect in each bar was taken into account, and the first five natural frequencies were calculated. 460 cases were evaluated; half of them were plain and the other half had a notch. All the input data was classified into two groups, each with 230 cases corresponding to one of the two sizes of voids mentioned above. On the other hand, experimental analysis was carried out with PMMA specimens of the same size. The first two natural frequencies of 40 cases with one void were obtained experimentally; the other three frequencies were obtained numerically. 20 of these bars were plain and the others had a notch. These experimental results were introduced into the synthetic database. 400 cases were taken randomly and, with this information, the ANN was trained with the backpropagation algorithm. The accuracy of the results was tested with the 100 cases that were left. In the next stage of this work, the ANN output was optimized with ANFIS. Previous papers showed that the accuracy of localization and classification of defects was reduced as notches were introduced in such bars. In the case of this paper, improved results were obtained when a hybrid database was used.

  3. CLASSIFYING X-RAY BINARIES: A PROBABILISTIC APPROACH

    International Nuclear Information System (INIS)

    Gopalan, Giri; Bornn, Luke; Vrtilek, Saeqa Dil

    2015-01-01

    In X-ray binary star systems consisting of a compact object that accretes material from an orbiting secondary star, there is no straightforward means to decide whether the compact object is a black hole or a neutron star. To assist in this process, we develop a Bayesian statistical model that makes use of the fact that X-ray binary systems appear to cluster based on their compact object type when viewed from a three-dimensional coordinate system derived from X-ray spectral data, where the first coordinate is the ratio of counts in the mid- to low-energy band (color 1), the second coordinate is the ratio of counts in the high- to low-energy band (color 2), and the third coordinate is the sum of counts in all three bands. We use this model to estimate the probabilities of an X-ray binary system containing a black hole, non-pulsing neutron star, or pulsing neutron star. In particular, we utilize a latent variable model in which the latent variables follow a Gaussian process prior distribution, and hence we are able to induce the spatial correlation which we believe exists between systems of the same type. The utility of this approach is demonstrated by the accurate prediction of system types using Rossi X-ray Timing Explorer All Sky Monitor data, but it is not flawless. In particular, non-pulsing neutron star systems containing “bursters” that are close to the boundary demarcating systems containing black holes tend to be classified as black hole systems. As a byproduct of our analyses, we provide the astronomer with public R code which can be used to predict the compact object type of XRBs given training data.

  4. Locating and classifying defects using an hybrid data base

    Science.gov (United States)

    Luna-Avilés, A.; Hernández-Gómez, L. H.; Durodola, J. F.; Urriolagoitia-Calderón, G.; Urriolagoitia-Sosa, G.; Beltrán Fernández, J. A.; Díaz Pineda, A.

    2011-07-01

    A computational inverse technique was used in the localization and classification of defects. Postulated voids of two different sizes (2 mm and 4 mm diameter) were introduced in PMMA bars with and without a notch. The bar dimensions are 200×20×5 mm. One half of them were plain and the other half had a notch (3 mm × 4 mm) close to the defect area (19 mm × 16 mm). This analysis was done with an Artificial Neural Network (ANN), and its optimization was done with an Adaptive Neuro-Fuzzy Procedure (ANFIS). A hybrid database was developed with numerical and experimental results. Synthetic data was generated with the finite element method using the SOLID95 element of the ANSYS code. A parametric analysis was carried out. Only one defect in each bar was taken into account, and the first five natural frequencies were calculated. 460 cases were evaluated; half of them were plain and the other half had a notch. All the input data was classified into two groups, each with 230 cases corresponding to one of the two sizes of voids mentioned above. On the other hand, experimental analysis was carried out with PMMA specimens of the same size. The first two natural frequencies of 40 cases with one void were obtained experimentally; the other three frequencies were obtained numerically. 20 of these bars were plain and the others had a notch. These experimental results were introduced into the synthetic database. 400 cases were taken randomly and, with this information, the ANN was trained with the backpropagation algorithm. The accuracy of the results was tested with the 100 cases that were left. In the next stage of this work, the ANN output was optimized with ANFIS. Previous papers showed that the accuracy of localization and classification of defects was reduced as notches were introduced in such bars. In the case of this paper, improved results were obtained when a hybrid database was used.

  5. Locating and classifying defects using an hybrid data base

    International Nuclear Information System (INIS)

    Luna-Aviles, A; Diaz Pineda, A; Hernandez-Gomez, L H; Urriolagoitia-Calderon, G; Urriolagoitia-Sosa, G; Durodola, J F; Beltran Fernandez, J A

    2011-01-01

    A computational inverse technique was used in the localization and classification of defects. Postulated voids of two different sizes (2 mm and 4 mm diameter) were introduced in PMMA bars with and without a notch. The bar dimensions are 200x20x5 mm. One half of them were plain and the other half had a notch (3 mm x 4 mm) close to the defect area (19 mm x 16 mm). This analysis was done with an Artificial Neural Network (ANN), and its optimization was done with an Adaptive Neuro-Fuzzy Procedure (ANFIS). A hybrid database was developed with numerical and experimental results. Synthetic data was generated with the finite element method using the SOLID95 element of the ANSYS code. A parametric analysis was carried out. Only one defect in each bar was taken into account, and the first five natural frequencies were calculated. 460 cases were evaluated; half of them were plain and the other half had a notch. All the input data was classified into two groups, each with 230 cases corresponding to one of the two sizes of voids mentioned above. On the other hand, experimental analysis was carried out with PMMA specimens of the same size. The first two natural frequencies of 40 cases with one void were obtained experimentally; the other three frequencies were obtained numerically. 20 of these bars were plain and the others had a notch. These experimental results were introduced into the synthetic database. 400 cases were taken randomly and, with this information, the ANN was trained with the backpropagation algorithm. The accuracy of the results was tested with the 100 cases that were left. In the next stage of this work, the ANN output was optimized with ANFIS. Previous papers showed that the accuracy of localization and classification of defects was reduced as notches were introduced in such bars. In the case of this paper, improved results were obtained when a hybrid database was used.

  6. Locally orderless registration code

    DEFF Research Database (Denmark)

    2012-01-01

    This is the code for the TPAMI paper "Locally Orderless Registration". The code requires Intel Threading Building Blocks to be installed and is provided for 64-bit Mac, Linux, and Windows.

  7. Decoding Codes on Graphs

    Indian Academy of Sciences (India)

    Among the earliest discovered codes that approach the Shannon limit of the channel were the low-density parity-check (LDPC) codes. The term "low density" arises from a property of the parity-check matrix defining the code. We will now define this matrix and the role that it plays in decoding.
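    The role the parity-check matrix plays in decoding can be made concrete with a small sketch. For brevity it uses the classic (7,4) Hamming code rather than an LDPC code (a real LDPC matrix is large and sparse), but the "zero syndrome means valid codeword" logic is the same; all data here is illustrative.

    ```python
    # Sketch: a parity-check matrix H defines a linear code, and the syndrome
    # H·word (mod 2) drives decoding. For the (7,4) Hamming code, column i of H
    # is the binary representation of i+1, so a nonzero syndrome, read as a
    # binary number, is exactly the position of a single flipped bit.

    H = [
        [1, 0, 1, 0, 1, 0, 1],
        [0, 1, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1, 1],
    ]

    def syndrome(word):
        """H times the received word over GF(2); all zeros means a valid codeword."""
        return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

    def correct_single_error(word):
        """Flip the bit whose 1-indexed position equals the syndrome value."""
        s = syndrome(word)
        pos = s[0] + 2 * s[1] + 4 * s[2]
        if pos:  # nonzero syndrome: correct the offending bit
            word = word[:]
            word[pos - 1] ^= 1
        return word

    received = [0, 1, 1, 0, 1, 1, 1]       # codeword 0110011 with bit 5 flipped
    print(correct_single_error(received))  # -> [0, 1, 1, 0, 0, 1, 1]
    ```

    Iterative LDPC decoders generalize this: instead of a lookup, they pass probabilistic messages between the bits and the parity checks of H until every check is satisfied.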

  8. Manually operated coded switch

    International Nuclear Information System (INIS)

    Barnette, J.H.

    1978-01-01

    The disclosure relates to a manually operated recodable coded switch in which a code may be inserted, tried, and used to actuate a lever controlling an external device. After attempting a code, the switch's code wheels must be returned to their zero positions before another try is made.

  9. DNA Camouflage

    Science.gov (United States)

    2016-01-08

    Supplementary information for "DNA Camouflage" by Bijan Zakeri, Timothy K. Lu, and Peter A. Carr (Distribution A: Public Release). Only figure-caption fragments survive in this excerpt: Supplementary Figure 1, "DNA camouflage with the 2-state device. (a) In the presence of Cre, DSD-2[α..."; Supplementary Figure 3, "DNA camouflage with a switchable ..."

  10. Long Non-coding RNAs in Response to Genotoxic Stress

    Institute of Scientific and Technical Information of China (English)

    Xiaoman Li; Dong Pan; Baoquan Zhao; Burong Hu

    2016-01-01

    Long non-coding RNAs (lncRNAs) are increasingly involved in diverse biological processes. Upon DNA damage, the DNA damage response (DDR) elicits a complex signaling cascade, which includes the induction of lncRNAs. LncRNA-mediated DDR is involved in non-canonical and canonical manners. DNA-damage-induced lncRNAs contribute to the regulation of the cell cycle, apoptosis, and DNA repair, thereby playing a key role in maintaining genome stability. This review summarizes the emerging role of lncRNAs in DNA damage and repair.

  11. Coding in Muscle Disease.

    Science.gov (United States)

    Jones, Lyell K; Ney, John P

    2016-12-01

    Accurate coding is critically important for clinical practice and research. Ongoing changes to diagnostic and billing codes require the clinician to stay abreast of coding updates. Payment for health care services, data sets for health services research, and reporting for medical quality improvement all require accurate administrative coding. This article provides an overview of administrative coding for patients with muscle disease and includes a case-based review of diagnostic and Evaluation and Management (E/M) coding principles in patients with myopathy. Procedural coding for electrodiagnostic studies and neuromuscular ultrasound is also reviewed.

  12. Characterization of the porcine carboxypeptidase E cDNA

    DEFF Research Database (Denmark)

    Hreidarsdôttir, G.E.; Cirera, Susanna; Fredholm, Merete

    2007-01-01

    The sequence of the cDNA for the porcine CPE gene, including the entire coding region and the 3'-UTR, was generated. Comparisons with bovine, human, mouse, and rat CPE cDNA sequences showed that the coding regions of the gene are highly conserved at both the nucleotide and the amino acid level. A very low...

  13. The DNA of prophetic speech

    African Journals Online (AJOL)

    2014-03-04

    Mar 4, 2014 … It is expected that people will be drawn into the reality of God by authentic prophetic speech … strands of the DNA molecule show themselves to be arranged … explains, chemical patterns act like the letters of a code … viewing the self-reflection regarding the ministry of renewal from the … Irresistible force.

  14. Classifying aging as a disease in the context of ICD-11.

    Science.gov (United States)

    Zhavoronkov, Alex; Bhullar, Bhupinder

    2015-01-01

    Aging is a complex, continuous, multifactorial process leading to loss of function and crystallizing into the many age-related diseases. Here, we explore the arguments for classifying aging as a disease in the context of the upcoming World Health Organization's 11th International Statistical Classification of Diseases and Related Health Problems (ICD-11), expected to be finalized in 2018. We hypothesize that classifying aging as a disease with a "non-garbage" set of codes will result in new approaches and business models for addressing aging as a treatable condition, which will lead to both economic and healthcare benefits for all stakeholders. Actionable classification of aging as a disease may lead to more efficient allocation of resources by enabling funding bodies and other stakeholders to use quality-adjusted life years (QALYs) and healthy-years equivalent (HYE) as metrics when evaluating both research and clinical programs. We propose forming a Task Force to interface with the WHO in order to develop a multidisciplinary framework for classifying aging as a disease with multiple disease codes, facilitating therapeutic interventions and preventative strategies.

  15. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    Science.gov (United States)

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear, and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters involved in the definition of a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can help apply complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for integration in data analysis flows in labs where the machine learning expertise required to tune traditional
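    The basic shape of a rejection option can be illustrated with a minimal sketch. This is not the paper's R implementation or its empirical decision rules; it is a generic posterior-threshold reject rule in Python, with invented class names and probabilities.

    ```python
    # Minimal sketch of a reject option on top of any probabilistic classifier:
    # a sample is assigned to the arg-max class only when the top posterior
    # clears a confidence threshold; otherwise it is rejected as "unknown",
    # as one would want when a novel pathology appears.

    def classify_with_rejection(posteriors, threshold=0.8):
        """posteriors: dict mapping class name -> estimated probability."""
        best_class = max(posteriors, key=posteriors.get)
        if posteriors[best_class] < threshold:
            return "reject"  # sample does not fit the trained model well enough
        return best_class

    # A confident profile is labelled; an ambiguous one is rejected.
    print(classify_with_rejection({"ALL": 0.93, "AML": 0.05, "MLL": 0.02}))  # -> ALL
    print(classify_with_rejection({"ALL": 0.40, "AML": 0.35, "MLL": 0.25}))  # -> reject
    ```

    The paper's contribution lies in choosing such thresholds and rules empirically (via an evolutionary strategy) rather than fixing them by hand as done here.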

  16. QR Codes 101

    Science.gov (United States)

    Crompton, Helen; LaFrance, Jason; van 't Hooft, Mark

    2012-01-01

    A QR (quick-response) code is a two-dimensional scannable code, similar in function to a traditional bar code that one might find on a product at the supermarket. The main difference between the two is that, while a traditional bar code can hold a maximum of only 20 digits, a QR code can hold up to 7,089 characters, so it can contain much more…

  17. DNA AND ITS METAPHORS

    Directory of Open Access Journals (Sweden)

    Jan Domaradzki

    2015-04-01

    Full Text Available The aim of the present paper is to describe the main metaphors present in genetic discourse: DNA as text, information, language, book, code, project/blueprint, map, computer, music, and cooking. It also analyses the social implications of these metaphors. The author argues that metaphors are double-edged swords: while they illuminate difficult and abstract genetic concepts, they can also lead to misunderstanding and misinterpretation of reality. The reason for this is that most of these metaphors are deterministic, reductionist, and fatalistic in character. Consequently, they shift attention away from the complexity of genetic processes. Moreover, as they appeal to emotions, aesthetics, and morality, they may involve exaggeration: while they bring hope, they also create an atmosphere of fear over the misuse of genetic knowledge. The author states that genetic metaphors do not simply reflect social ideas about DNA, but also shape our understanding of genetics and our imagination about the social applications of genetic knowledge. For this reason, DNA should be understood not only as a biological code, but as a cultural one as well.

  18. DNA glue

    DEFF Research Database (Denmark)

    Filichev, Vyacheslav V; Astakhova, Irina V.; Malakhov, Andrei D.

    2008-01-01

    Significant alterations in thermal stability of parallel DNA triplexes and antiparallel duplexes were observed upon changing the attachment of ethynylpyrenes from para to ortho in the structure of phenylmethylglycerol inserted as a bulge into DNA (TINA). Insertions of two ortho-TINAs as a pseudo...

  19. Hyperstretching DNA

    NARCIS (Netherlands)

    Schakenraad, Koen; Biebricher, Andreas S.; Sebregts, Maarten; Ten Bensel, Brian; Peterman, Erwin J.G.; Wuite, Gijs J L; Heller, Iddo; Storm, Cornelis; Van Der Schoot, Paul

    2017-01-01

    The three-dimensional structure of DNA is highly susceptible to changes by mechanical and biochemical cues in vivo and in vitro. In particular, large increases in base pair spacing compared to regular B-DNA are effected by mechanical (over)stretching and by intercalation of compounds that are widely

  20. Phylogenetic reconstruction in the order Nymphaeales: ITS2 secondary structure analysis and in silico testing of maturase k (matK) as a potential marker for DNA bar coding.

    Science.gov (United States)

    Biswal, Devendra Kumar; Debnath, Manish; Kumar, Shakti; Tandon, Pramod

    2012-01-01

    The Nymphaeales (waterlilies and relatives) lineage diverged as the second branch of the basal angiosperms and comprises two families: Cabombaceae and Nymphaeaceae. The classification of Nymphaeales and their phylogeny within the flowering plants are quite intriguing, as several systems (the Thorne, Dahlgren, Cronquist, Takhtajan, and APG III (Angiosperm Phylogeny Group III) systems) have attempted to redefine Nymphaeales taxonomy. There are also fossil records, consisting especially of seeds, pollen, stems, leaves, and flowers, from as early as the lower Cretaceous. Here we present an in silico study of the order Nymphaeales taking maturase K (matK) and the internal transcribed spacer (ITS2) as biomarkers for phylogeny reconstruction (using character-based methods and a Bayesian approach) and for identification of motifs for DNA barcoding. The Maximum Likelihood (ML) and Bayesian approaches yielded congruent, fully resolved, and well-supported trees using a concatenated (ITS2 + matK) supermatrix aligned dataset. The taxon sampling corroborates the monophyly of Cabombaceae. Nuphar emerges as a monophyletic clade in the family Nymphaeaceae, while there are slight discrepancies in the monophyletic nature of the genus Nymphaea owing to Victoria-Euryale and Ondinea grouping in the same node of Nymphaeaceae. ITS2 secondary structure alignments corroborate the primary sequence analysis. Hydatellaceae emerged as a sister clade to Nymphaeaceae and had a basal lineage amongst the water lily clades. Species from Cycas and Ginkgo were taken as outgroups and were used to root the overall tree topology from the various methods. MatK genes are fast-evolving, highly variant regions of plant chloroplast DNA that can serve as potential biomarkers for DNA barcoding and also in generating primers for angiosperms, with identification of unique motif regions.
    We have reported unique genus-specific motif regions in the order Nymphaeales from the matK dataset, which can be further validated for
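    As a toy illustration of what a "genus-specific motif" search might look like, the sketch below reports k-mers shared by every sequence of one genus and absent from all others. The sequences and genus assignments are invented; this is not the authors' pipeline.

    ```python
    # Hypothetical sketch: find k-mers that are present in all sequences of a
    # genus but in no sequence of any other genus, as candidate barcode motifs.

    def kmers(seq, k):
        """All substrings of length k in seq."""
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def genus_specific_motifs(groups, k=4):
        specific = {}
        for genus, seqs in groups.items():
            shared = set.intersection(*(kmers(s, k) for s in seqs))
            others = set().union(*(kmers(s, k) for g, ss in groups.items()
                                   if g != genus for s in ss))
            specific[genus] = shared - others  # shared within, absent elsewhere
        return specific

    groups = {  # made-up matK-like fragments, for illustration only
        "Nymphaea": ["ATGGCGTACG", "TTGGCGTACA"],
        "Nuphar":   ["ATGCATCATG", "GGGCATCATT"],
    }
    print(sorted(genus_specific_motifs(groups)["Nymphaea"]))
    # -> ['CGTA', 'GCGT', 'GGCG', 'GTAC', 'TGGC']
    ```

    A real analysis would also need to confirm such motifs against sequences outside the sampled taxa before using them as barcode primers.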

  1. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains

    Directory of Open Access Journals (Sweden)

    Eils Roland

    2006-06-01

    Full Text Available Abstract Background The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location of a given protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information only, further research is needed to improve the accuracy of prediction. Results A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six different datasets, among them a Gram-negative bacteria dataset, a dataset for discriminating outer membrane proteins, and an apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the prediction accuracy for classes with few sequences in training and is therefore useful for datasets with an imbalanced distribution of classes. Conclusion This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with previously reported approaches in terms of prediction accuracy, as empirical results indicate. The code for the software is available upon request.
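    The core modelling idea can be sketched in a few lines. This is not the HensBC algorithm itself (which adds the recursive hierarchical ensemble); it is the base component: each class is represented by a Laplace-smoothed first-order Markov chain over residues, and a query sequence goes to the class with the highest log-likelihood. The alphabet and training sequences below are invented for illustration.

    ```python
    # Toy sketch of a Bayesian classifier based on Markov chain models:
    # per-class transition probabilities are estimated from training sequences,
    # and classification picks the class maximizing the sequence log-likelihood.
    from collections import defaultdict
    from math import log

    ALPHABET = "LAVKDE"  # tiny illustrative residue alphabet

    def train_markov(sequences, alpha=1.0):
        counts = defaultdict(lambda: defaultdict(float))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                counts[a][b] += 1
        model = {}
        for a in ALPHABET:  # Laplace-smoothed transition log-probabilities
            total = sum(counts[a].values()) + alpha * len(ALPHABET)
            model[a] = {b: log((counts[a][b] + alpha) / total) for b in ALPHABET}
        return model

    def loglik(model, seq):
        return sum(model[a][b] for a, b in zip(seq, seq[1:]))

    def classify(models, seq):
        return max(models, key=lambda c: loglik(models[c], seq))

    models = {  # invented training data for two "locations"
        "membrane":  train_markov(["LLLVVLLAVL", "VLLLAVVLLL"]),  # hydrophobic runs
        "cytoplasm": train_markov(["KDEEKDKDEE", "EEKDKDEEKD"]),  # charged residues
    }
    print(classify(models, "LLAVLLVVLL"))  # -> membrane
    ```

    HensBC builds on such base classifiers by recursively splitting off well-classified subsets and stacking the resulting classifiers into a hierarchy.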

  2. Codes and curves

    CERN Document Server

    Walker, Judy L

    2000-01-01

    When information is transmitted, errors are likely to occur. Coding theory examines efficient ways of packaging data so that these errors can be detected, or even corrected. The traditional tools of coding theory have come from combinatorics and group theory. Lately, however, coding theorists have added techniques from algebraic geometry to their toolboxes. In particular, by re-interpreting the Reed-Solomon codes, one can see how to define new codes based on divisors on algebraic curves. For instance, using modular curves over finite fields, Tsfasman, Vladut, and Zink showed that one can define a sequence of codes with asymptotically better parameters than any previously known codes. This monograph is based on a series of lectures the author gave as part of the IAS/PCMI program on arithmetic algebraic geometry. Here, the reader is introduced to the exciting field of algebraic geometric coding theory. Presenting the material in the same conversational tone of the lectures, the author covers linear codes, inclu...

  3. On the classification of long non-coding RNAs

    KAUST Repository

    Ma, Lina

    2013-06-01

    Long non-coding RNAs (lncRNAs) have been found to perform various functions in a wide variety of important biological processes. To ease interpretation of lncRNA functionality and to enable deep mining of these transcribed sequences, it is convenient to classify lncRNAs into different groups. Here, we summarize classification methods for lncRNAs according to their four major features, namely, genomic location and context, effect exerted on DNA sequences, mechanism of functioning, and targeting mechanism. In combination with the presently available function annotations, we explore potential relationships between different classification categories, and generalize and compare biological features of different lncRNAs within each category. Finally, we present our view on potential further studies. We believe that the classifications of lncRNAs as indicated above are of fundamental importance for lncRNA studies, helpful for further investigation of specific lncRNAs, for formulation of new hypotheses based on different features of lncRNAs, and for exploration of the underlying lncRNA functional mechanisms. © 2013 Landes Bioscience.

  4. Genetic coding and gene expression - new Quadruplet genetic coding model

    Science.gov (United States)

    Shankar Singh, Rama

    2012-07-01

    The successful demonstration of the Human Genome Project has opened the door not only for developing personalized medicine and cures for genetic diseases, but it may also answer the complex and difficult question of the origin of life. It may make the 21st century a century of the Biological Sciences as well. Based on the central dogma of biology, genetic codons in conjunction with tRNA play a key role in translating RNA bases into a sequence of amino acids, leading to a synthesized protein. This is the most critical step in synthesizing the right protein needed for personalized medicine and curing genetic diseases. So far, only triplet codons involving three bases of RNA, transcribed from DNA bases, have been used. Since this approach has several inconsistencies and limitations, even the promise of personalized medicine has not been realized. The new Quadruplet genetic coding model proposed and developed here involves all four RNA bases, which in conjunction with tRNA will synthesize the right protein. The transcription and translation process used will be the same, but the Quadruplet codons will help overcome most of the inconsistencies and limitations of the triplet codes. Details of this new Quadruplet genetic coding model and its subsequent potential applications, including relevance to the origin of life, will be presented.
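    For context, the standard triplet decoding step that the abstract refers to can be sketched as follows. Only a handful of the 64 standard codons are included, and the proposed quadruplet scheme itself is not specified in enough detail in the abstract to implement.

    ```python
    # Background sketch: RNA bases are read three at a time and mapped to amino
    # acids via the standard genetic code (small subset shown; '*' marks stop).

    CODON_TABLE = {
        "AUG": "M", "UUU": "F", "UUC": "F", "GGC": "G",
        "AAA": "K", "GAU": "D", "UAA": "*", "UAG": "*", "UGA": "*",
    }

    def translate(rna):
        protein = []
        for i in range(0, len(rna) - 2, 3):       # step through the codons
            aa = CODON_TABLE.get(rna[i:i + 3], "?")
            if aa == "*":                          # a stop codon ends translation
                break
            protein.append(aa)
        return "".join(protein)

    print(translate("AUGUUUGGCAAAUAA"))  # -> MFGK
    ```

    A quadruplet model would change the reading step (four bases per codon, hence a much larger table); how its table is populated is exactly what the proposed model would have to define.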

  5. The Intertwined Roles of DNA Damage and Transcription

    OpenAIRE

    Di Palo, Giacomo

    2016-01-01

    DNA damage and transcription are two interconnected events. Transcription can induce damage and scheduled DNA damage can be required for transcription. Here, we analyzed genome-wide distribution of 8oxodG-marked oxidative DNA damage obtained by OxiDIP-Seq, and we found a correlation with transcription of protein coding genes.

  6. 18 CFR 3a.12 - Authority to classify official information.

    Science.gov (United States)

    2010-04-01

    ... efficient administration. (b) The authority to classify information or material originally as Top Secret is... classify information or material originally as Secret is exercised only by: (1) Officials who have Top... information or material originally as Confidential is exercised by officials who have Top Secret or Secret...

  7. Using Neural Networks to Classify Digitized Images of Galaxies

    Science.gov (United States)

    Goderya, S. N.; McGuire, P. C.

    2000-12-01

    Automated classification of galaxies into Hubble types is of paramount importance for studying the large-scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present it is not possible to find robust and efficient artificial intelligence based galaxy classifiers. In this study we summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher order probabilistic network, the multilayer perceptron neural network, and the Support Vector Machine classifier. The performance of any machine classifier is dependent on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant moment based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach reduces the dimensionality of the classifier considerably, removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification we develop the concept of Matthews coefficients for the galaxy classification community. Matthews coefficients are single numbers that quantify classifier performance even with unequal prior probabilities of the classes.
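    For the two-class case, the coefficient referred to here (commonly known as the Matthews correlation coefficient) is computed directly from the confusion matrix, which is what makes it robust to unequal class priors. A minimal sketch:

    ```python
    # Matthews correlation coefficient from two-class confusion-matrix counts:
    # +1 for perfect prediction, 0 for chance-level, -1 for total disagreement.
    from math import sqrt

    def matthews_coefficient(tp, tn, fp, fn):
        denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0

    print(matthews_coefficient(50, 50, 0, 0))    # perfect classifier -> 1.0
    print(matthews_coefficient(25, 25, 25, 25))  # chance-level classifier -> 0.0
    ```

    Unlike raw accuracy, the coefficient stays near zero for a classifier that merely predicts the majority class on an imbalanced dataset.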

  8. Fisher classifier and its probability of error estimation

    Science.gov (United States)

    Chittineni, C. B.

    1979-01-01

    Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
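    The two-class construction summarized here can be sketched in two dimensions: the Fisher direction is w = S_w^{-1}(m1 - m2), where S_w is the within-class scatter matrix, and samples are classified by thresholding their projection onto w. The data below is invented, and the threshold is taken as the midpoint of the projected class means (a simplification; the paper derives the optimal threshold).

    ```python
    # Minimal two-class Fisher discriminant in 2-D (illustrative sketch).

    def mean(xs):
        n = len(xs)
        return [sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n]

    def scatter(xs, m):
        """Within-class scatter: sum of outer products of deviations from the mean."""
        s = [[0.0, 0.0], [0.0, 0.0]]
        for x in xs:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    def fisher_direction(c1, c2):
        m1, m2 = mean(c1), mean(c2)
        s1, s2 = scatter(c1, m1), scatter(c2, m2)
        sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
        det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
        inv = [[sw[1][1] / det, -sw[0][1] / det],
               [-sw[1][0] / det, sw[0][0] / det]]
        dm = [m1[0] - m2[0], m1[1] - m2[1]]
        w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
             inv[1][0] * dm[0] + inv[1][1] * dm[1]]  # w = Sw^{-1} (m1 - m2)
        t = 0.5 * (w[0] * (m1[0] + m2[0]) + w[1] * (m1[1] + m2[1]))  # midpoint
        return w, t

    c1 = [(2.0, 2.1), (2.5, 1.9), (3.0, 2.4), (2.2, 2.6)]  # invented class 1
    c2 = [(0.0, 0.2), (0.5, -0.1), (-0.3, 0.4), (0.1, 0.0)]  # invented class 2
    w, t = fisher_direction(c1, c2)
    print(w[0] * 2.4 + w[1] * 2.2 > t)  # point near class-1 mean -> True
    print(w[0] * 0.1 + w[1] * 0.1 < t)  # point near class-2 mean -> True
    ```

    The leave-one-out error estimate discussed in the abstract would then be obtained by refitting this direction with each sample held out in turn and counting misclassifications of the held-out points.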

  9. Performance of classification confidence measures in dynamic classifier systems

    Czech Academy of Sciences Publication Activity Database

    Štefka, D.; Holeňa, Martin

    2013-01-01

    Roč. 23, č. 4 (2013), s. 299-319 ISSN 1210-0552 R&D Projects: GA ČR GA13-17187S Institutional support: RVO:67985807 Keywords : classifier combining * dynamic classifier systems * classification confidence Subject RIV: IN - Informatics, Computer Science Impact factor: 0.412, year: 2013

  10. 32 CFR 2400.30 - Reproduction of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be strictly...

  11. Classifying spaces with virtually cyclic stabilizers for linear groups

    DEFF Research Database (Denmark)

    Degrijse, Dieter Dries; Köhl, Ralf; Petrosyan, Nansen

    2015-01-01

    We show that every discrete subgroup of GL(n, ℝ) admits a finite-dimensional classifying space with virtually cyclic stabilizers. Applying our methods to SL(3, ℤ), we obtain a four-dimensional classifying space with virtually cyclic stabilizers and a decomposition of the algebraic K-theory of its...

  12. Dynamic integration of classifiers in the space of principal components

    NARCIS (Netherlands)

    Tsymbal, A.; Pechenizkiy, M.; Puuronen, S.; Patterson, D.W.; Kalinichenko, L.A.; Manthey, R.; Thalheim, B.; Wloka, U.

    2003-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the

  13. The materiality of Code

    DEFF Research Database (Denmark)

    Soon, Winnie

    2014-01-01

    This essay studies the source code of an artwork from a software studies perspective. By examining code that comes close to the approach of critical code studies (Marino, 2006), I trace the network artwork, Pupufu (Lin, 2009) to understand various real-time approaches to social media platforms (MSN......, Twitter and Facebook). The focus is not to investigate the functionalities and efficiencies of the code, but to study and interpret the program level of code in order to trace the use of various technological methods such as third-party libraries and platforms’ interfaces. These are important...... to understand the socio-technical side of a changing network environment. Through the study of code, including but not limited to source code, technical specifications and other materials in relation to the artwork production, I would like to explore the materiality of code that goes beyond technical...

  14. Coding for optical channels

    CERN Document Server

    Djordjevic, Ivan; Vasic, Bane

    2010-01-01

    This unique book provides a coherent and comprehensive introduction to the fundamentals of optical communications, signal processing and coding for optical channels. It is the first to integrate the fundamentals of coding theory and optical communication.

  15. SEVERO code - user's manual

    International Nuclear Information System (INIS)

    Sacramento, A.M. do.

    1989-01-01

    This user's manual contains all the necessary information concerning the use of the SEVERO code. This computer code is related to the statistics of extremes: extreme winds, extreme precipitation and flooding hazard risk analysis. (A.C.A.S.)

  16. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    International Nuclear Information System (INIS)

    Blanco, A; Rodriguez, R; Martinez-Maranon, I

    2014-01-01

    Mackerel is an undervalued fish captured by European fishing vessels. Value can be added to this species by classifying it according to sex. Colour measurements were performed on gonads extracted from Mackerel females and males (fresh and defrozen) to obtain differences between the sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity

  17. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    Science.gov (United States)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. Value can be added to this species by classifying it according to sex. Colour measurements were performed on gonads extracted from Mackerel females and males (fresh and defrozen) to obtain differences between the sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
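
    The combination scheme in the two records above, base classifiers built on complementary dissimilarities and combined by voting, can be sketched with a 1-NN base rule. The three dissimilarities and the toy data are illustrative assumptions, not the authors' exact setup:

```python
# Ensemble of dissimilarity-based 1-NN classifiers: each base classifier
# uses a different dissimilarity measure and the ensemble takes a majority
# vote. Toy 2-D feature data; not the paper's colour measurements.
import math
from collections import Counter

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def nn_predict(dissim, train, query):
    # 1-NN under the given dissimilarity
    return min(train, key=lambda item: dissim(item[0], query))[1]

def ensemble_predict(dissims, train, query):
    votes = [nn_predict(d, train, query) for d in dissims]
    return Counter(votes).most_common(1)[0][0]

train = [((0.2, 0.1), "male"), ((0.3, 0.2), "male"),
         ((0.8, 0.9), "female"), ((0.9, 0.7), "female")]
label = ensemble_predict([euclidean, manhattan, chebyshev], train, (0.25, 0.15))
```

    Diversity comes from the base classifiers disagreeing on borderline patterns; the vote then recovers cases a single dissimilarity misclassifies.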

  18. Just-in-time classifiers for recurrent concepts.

    Science.gov (United States)

    Alippi, Cesare; Boracchi, Giacomo; Roveri, Manuel

    2013-04-01

    Just-in-time (JIT) classifiers operate in evolving environments by classifying instances and reacting to concept drift. In stationary conditions, a JIT classifier improves its accuracy over time by exploiting additional supervised information coming from the field. In nonstationary conditions, however, the classifier reacts as soon as concept drift is detected; the current classification setup is discarded and a suitable one activated to keep the accuracy high. We present a novel generation of JIT classifiers able to deal with recurrent concept drift by means of a practical formalization of the concept representation and the definition of a set of operators working on such representations. The concept-drift detection activity, which is crucial in promptly reacting to changes exactly when needed, is advanced by considering change-detection tests monitoring both input and class distributions.
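
    The change-detection step described above can be illustrated with a deliberately simple test: monitor a feature stream and flag drift when a sliding window's mean departs from the training mean by more than k standard errors. This is a generic stand-in, not the authors' actual change-detection tests:

```python
# Minimal drift detector: compare each sliding window's mean against the
# mean of an initial reference (training) segment. Flags the first index
# where the deviation exceeds k standard errors. Illustrative only.
import statistics

def detect_drift(stream, train_size=50, window=20, k=4.0):
    ref = stream[:train_size]
    mu = statistics.fmean(ref)
    se = statistics.stdev(ref) / (window ** 0.5)
    for i in range(train_size, len(stream) - window + 1):
        if abs(statistics.fmean(stream[i:i + window]) - mu) > k * se:
            return i  # index where drift is flagged
    return None

stable = [0.0, 1.0] * 50            # mean 0.5 throughout, no drift
drifted = stable[:60] + [5.0] * 40  # abrupt change after sample 60
```

    A JIT classifier would react at the flagged index by discarding or re-fitting its current setup; here the detector merely reports the index.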

  19. Synthesizing Certified Code

    OpenAIRE

    Whalen, Michael; Schumann, Johann; Fischer, Bernd

    2002-01-01

    Code certification is a lightweight approach for formally demonstrating software quality. Its basic idea is to require code producers to provide formal proofs that their code satisfies certain quality properties. These proofs serve as certificates that can be checked independently. Since code certification uses the same underlying technology as program verification, it requires detailed annotations (e.g., loop invariants) to make the proofs possible. However, manually adding annotations to th...

  20. FERRET data analysis code

    International Nuclear Information System (INIS)

    Schmittroth, F.

    1979-09-01

    A documentation of the FERRET data analysis code is given. The code provides a way to combine related measurements and calculations in a consistent evaluation. Basically a very general least-squares code, it is oriented towards problems frequently encountered in nuclear data and reactor physics. A strong emphasis is on the proper treatment of uncertainties and correlations and in providing quantitative uncertainty estimates. Documentation includes a review of the method, structure of the code, input formats, and examples
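
    The core least-squares operation such an evaluation code performs, combining related correlated measurements with a proper covariance treatment, can be illustrated for the simplest case of two correlated measurements of one quantity. The numbers are illustrative; this is not FERRET's formalism or input format:

```python
# Combine two correlated measurements x1, x2 of the same quantity by
# least squares using the full 2x2 covariance matrix. This is the
# elementary operation a generalized least-squares evaluation code
# applies to whole measurement sets.

def combine(x1, x2, v1, v2, cov):
    """Best linear unbiased estimate and its variance.

    v1, v2 are the variances of the two measurements, cov their
    covariance; the weights follow from minimizing the chi-square.
    """
    denom = v1 + v2 - 2.0 * cov
    est = ((v2 - cov) * x1 + (v1 - cov) * x2) / denom
    var = (v1 * v2 - cov * cov) / denom
    return est, var

est, var = combine(x1=10.0, x2=12.0, v1=1.0, v2=4.0, cov=0.5)
```

    The combined variance (0.9375 here) falls below the better measurement's variance, and a nonzero covariance shifts the weights away from the naive 1/variance rule.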

  1. Stylize Aesthetic QR Code

    OpenAIRE

    Xu, Mingliang; Su, Hao; Li, Yafei; Li, Xi; Liao, Jing; Niu, Jianwei; Lv, Pei; Zhou, Bing

    2018-01-01

    With the continued proliferation of smart mobile devices, Quick Response (QR) code has become one of the most-used types of two-dimensional code in the world. Aiming at beautifying the appearance of QR codes, existing works have developed a series of techniques to make the QR code more visual-pleasant. However, these works still leave much to be desired, such as visual diversity, aesthetic quality, flexibility, universal property, and robustness. To address these issues, in this paper, we pro...

  2. Enhancing QR Code Security

    OpenAIRE

    Zhang, Linfan; Zheng, Shuang

    2015-01-01

    Quick Response code opens possibility to convey data in a unique way yet insufficient prevention and protection might lead into QR code being exploited on behalf of attackers. This thesis starts by presenting a general introduction of background and stating two problems regarding QR code security, which followed by a comprehensive research on both QR code itself and related issues. From the research a solution taking advantages of cloud and cryptography together with an implementation come af...

  3. Sorting fluorescent nanocrystals with DNA

    Energy Technology Data Exchange (ETDEWEB)

    Gerion, Daniele; Parak, Wolfgang J.; Williams, Shara C.; Zanchet, Daniela; Micheel, Christine M.; Alivisatos, A. Paul

    2001-12-10

    Semiconductor nanocrystals with narrow and tunable fluorescence are covalently linked to oligonucleotides. These biocompounds retain the properties of both nanocrystals and DNA. Therefore, different sequences of DNA can be coded with nanocrystals and still preserve their ability to hybridize to their complements. We report the case where four different sequences of DNA are linked to four nanocrystal samples having different colors of emission in the range of 530-640 nm. When the DNA-nanocrystal conjugates are mixed together, it is possible to sort each type of nanoparticle using hybridization on a defined micrometer-size surface containing the complementary oligonucleotide. Detection of sorting requires only a single excitation source and an epifluorescence microscope. The possibility of directing fluorescent nanocrystals towards specific biological targets and detecting them, combined with their superior photo-stability compared to organic dyes, opens the way to improved biolabeling experiments, such as gene mapping on a nanometer scale or multicolor microarray analysis.

  4. Opening up codings?

    DEFF Research Database (Denmark)

    Steensig, Jakob; Heinemann, Trine

    2015-01-01

    doing formal coding and when doing more “traditional” conversation analysis research based on collections. We are more wary, however, of the implication that coding-based research is the end result of a process that starts with qualitative investigations and ends with categories that can be coded...

  5. Gauge color codes

    DEFF Research Database (Denmark)

    Bombin Palomo, Hector

    2015-01-01

    Color codes are topological stabilizer codes with unusual transversality properties. Here I show that their group of transversal gates is optimal and only depends on the spatial dimension, not the local geometry. I also introduce a generalized, subsystem version of color codes. In 3D they allow...

  6. Refactoring test code

    NARCIS (Netherlands)

    A. van Deursen (Arie); L.M.F. Moonen (Leon); A. van den Bergh; G. Kok

    2001-01-01

    textabstractTwo key aspects of extreme programming (XP) are unit testing and merciless refactoring. Given the fact that the ideal test code / production code ratio approaches 1:1, it is not surprising that unit tests are being refactored. We found that refactoring test code is different from

  7. DNA methylation

    DEFF Research Database (Denmark)

    Williams, Kristine; Christensen, Jesper; Helin, Kristian

    2012-01-01

    DNA methylation is involved in key cellular processes, including X-chromosome inactivation, imprinting and transcriptional silencing of specific genes and repetitive elements. DNA methylation patterns are frequently perturbed in human diseases such as imprinting disorders and cancer. The recent...... discovery that the three members of the TET protein family can convert 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC) has provided a potential mechanism leading to DNA demethylation. Moreover, the demonstration that TET2 is frequently mutated in haematopoietic tumours suggests that the TET...... proteins are important regulators of cellular identity. Here, we review the current knowledge regarding the function of the TET proteins, and discuss various mechanisms by which they contribute to transcriptional control. We propose that the TET proteins have an important role in regulating DNA methylation...

  8. DNA data

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Raw DNA chromatogram data produced by the ABI 373, 377, 3130 and 3730 automated sequencing machines in ABI format. These are from fish (primarily Sebastes spp.,...

  9. DNA nanotechnology

    Science.gov (United States)

    Seeman, Nadrian C.; Sleiman, Hanadi F.

    2018-01-01

    DNA is the molecule that stores and transmits genetic information in biological systems. The field of DNA nanotechnology takes this molecule out of its biological context and uses its information to assemble structural motifs and then to connect them together. This field has had a remarkable impact on nanoscience and nanotechnology, and has been revolutionary in our ability to control molecular self-assembly. In this Review, we summarize the approaches used to assemble DNA nanostructures and examine their emerging applications in areas such as biophysics, diagnostics, nanoparticle and protein assembly, biomolecule structure determination, drug delivery and synthetic biology. The introduction of orthogonal interactions into DNA nanostructures is discussed, and finally, a perspective on the future directions of this field is presented.

  10. Remote-Handled Transuranic Content Codes

    International Nuclear Information System (INIS)

    2001-01-01

    The Remote-Handled Transuranic (RH-TRU) Content Codes (RH-TRUCON) document represents the development of a uniform content code system for RH-TRU waste to be transported in the 72-B cask. It will be used to convert existing waste form numbers, content codes, and site-specific identification codes into a system that is uniform across the U.S. Department of Energy (DOE) sites. The existing waste codes at the sites can be grouped under uniform content codes without any loss of waste characterization information. The RH-TRUCON document provides an all-encompassing description for each content code and compiles this information for all DOE sites. Compliance with waste generation, processing, and certification procedures at the sites (outlined in this document for each content code) ensures that prohibited waste forms are not present in the waste. The content code gives an overall description of the RH-TRU waste material in terms of processes and packaging, as well as the generation location. This helps to provide cradle-to-grave traceability of the waste material so that the various actions required to assess its qualification as payload for the 72-B cask can be performed. The content codes also impose restrictions and requirements on the manner in which a payload can be assembled. The RH-TRU Waste Authorized Methods for Payload Control (RH-TRAMPAC), Appendix 1.3.7 of the 72-B Cask Safety Analysis Report (SAR), describes the current governing procedures applicable for the qualification of waste as payload for the 72-B cask. The logic for this classification is presented in the 72-B Cask SAR. Together, these documents (RH-TRUCON, RH-TRAMPAC, and relevant sections of the 72-B Cask SAR) present the foundation and justification for classifying RH-TRU waste into content codes. Only content codes described in this document can be considered for transport in the 72-B cask. Revisions to this document will be made as additional waste qualifies for transport.
Each content code uniquely

  11. Software Certification - Coding, Code, and Coders

    Science.gov (United States)

    Havelund, Klaus; Holzmann, Gerard J.

    2011-01-01

    We describe a certification approach for software development that has been adopted at our organization. JPL develops robotic spacecraft for the exploration of the solar system. The flight software that controls these spacecraft is considered to be mission critical. We argue that the goal of a software certification process cannot be the development of "perfect" software, i.e., software that can be formally proven to be correct under all imaginable and unimaginable circumstances. More realistically, the goal is to guarantee a software development process that is conducted by knowledgeable engineers, who follow generally accepted procedures to control known risks, while meeting agreed upon standards of workmanship. We target three specific issues that must be addressed in such a certification procedure: the coding process, the code that is developed, and the skills of the coders. The coding process is driven by standards (e.g., a coding standard) and tools. The code is mechanically checked against the standard with the help of state-of-the-art static source code analyzers. The coders, finally, are certified in on-site training courses that include formal exams.

  12. From nonspecific DNA-protein encounter complexes to the prediction of DNA-protein interactions.

    Directory of Open Access Journals (Sweden)

    Mu Gao

    2009-03-01

    Full Text Available DNA-protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA-protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA-protein interaction modes without knowing its specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA-protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA-protein interaction modes exhibit some similarity to specific DNA-protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Cα deviation from their native structures is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA-protein interaction mode and nonspecific interaction modes may reflect an important sampling step in search of its specific DNA targets by a DNA-binding protein.

  13. The EB factory project. I. A fast, neural-net-based, general purpose light curve classifier optimized for eclipsing binaries

    International Nuclear Information System (INIS)

    Paegert, Martin; Stassun, Keivan G.; Burger, Dan M.

    2014-01-01

    We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general purpose, and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together with a chi-square parameter to capture higher-order morphology information results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (∼60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that include the full range of light curve quality to which the classifier will be expected to perform; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use, are provided.
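
    The dimensionality-reduction idea above, encoding the overall light-curve shape with a handful of numbers rather than raw samples, can be illustrated by folding a light curve on its period and taking one median magnitude per phase bin. This binning scheme is a generic stand-in; the paper's geometric representation differs in detail:

```python
# Fold a light curve on its period and reduce it to one median magnitude
# per phase bin -- a low-dimensional shape encoding in the spirit of the
# classifier's geometric representation. Synthetic data below; magnitudes
# increase when the star gets fainter, as in an eclipse.
import statistics

def shape_vector(times, mags, period, n_bins=8):
    """Median magnitude in each of n_bins phase bins of the folded curve."""
    bins = [[] for _ in range(n_bins)]
    for t, m in zip(times, mags):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(m)
    return [statistics.median(b) if b else 0.0 for b in bins]

# synthetic curve with period 1.0 and a fading dip over phases 0.50-0.62
times = [i / 100 for i in range(1000)]
mags = [11.0 if 50 <= (i % 100) < 63 else 10.0 for i in range(1000)]
sv = shape_vector(times, mags, period=1.0)  # the dip lands in bin 4
```

    Eight bin medians replace a thousand raw samples, which is what keeps neural-net training over massive survey data tractable.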

  14. DNA expressions - A formal notation for DNA

    NARCIS (Netherlands)

    Vliet, Rudy van

    2015-01-01

    We describe a formal notation for DNA molecules that may contain nicks and gaps. The resulting DNA expressions denote formal DNA molecules. Different DNA expressions may denote the same molecule. Such DNA expressions are called equivalent. We examine which DNA expressions are minimal, which

  15. The network code

    International Nuclear Information System (INIS)

    1997-01-01

    The Network Code defines the rights and responsibilities of all users of the natural gas transportation system in the liberalised gas industry in the United Kingdom. This report describes the operation of the Code, what it means, how it works and its implications for the various participants in the industry. The topics covered are: development of the competitive gas market in the UK; key points in the Code; gas transportation charging; impact of the Code on producers upstream; impact on shippers; gas storage; supply point administration; impact of the Code on end users; the future. (20 tables; 33 figures) (UK)

  16. Coding for Electronic Mail

    Science.gov (United States)

    Rice, R. F.; Lee, J. J.

    1986-01-01

    Scheme for coding facsimile messages promises to reduce data transmission requirements to one-tenth current level. Coding scheme paves way for true electronic mail in which handwritten, typed, or printed messages or diagrams sent virtually instantaneously - between buildings or between continents. Scheme, called Universal System for Efficient Electronic Mail (USEEM), uses unsupervised character recognition and adaptive noiseless coding of text. Image quality of resulting delivered messages improved over messages transmitted by conventional coding. Coding scheme compatible with direct-entry electronic mail as well as facsimile reproduction. Text transmitted in this scheme automatically translated to word-processor form.

  17. Class-specific Error Bounds for Ensemble Classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Prenger, R; Lemmond, T; Varshney, K; Chen, B; Hanley, W

    2009-10-06

    The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.
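
    The ROC curve the bounds are stated over can be traced directly from classifier scores by sweeping a threshold and recording (false-alarm rate, detection rate) pairs. A minimal sketch with illustrative scores, not the paper's experiments:

```python
# Trace a ROC curve from raw classifier scores: sort by score descending
# and lower the decision threshold one sample at a time, logging
# (false-alarm rate, detection rate). Scores and labels are illustrative.

def roc_points(scores, labels):
    """ROC points (fpr, tpr); labels are 1 for the positive class."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    tp = fp = 0
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
```

    Under unequal error costs, the operating point is chosen along this curve rather than at a fixed threshold, which is why a bound on the whole curve is more useful than a single error-rate bound.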

  18. Frog sound identification using extended k-nearest neighbor classifier

    Science.gov (United States)

    Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

    2017-09-01

    Frog sound identification based on vocalization is important for biological research and environmental monitoring. As a result, different types of feature extraction and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with an Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest-neighbor and mutual neighborhood concepts, with the aim of improving classification performance. It makes a prediction based both on which training samples are the nearest neighbors of the test sample and on which training samples consider the test sample among their own nearest neighbors. To evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifiers, k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN) and Mutual k-Nearest Neighbor (MKNN), on the recorded sounds of 15 frog species obtained in Malaysian forests. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation, and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features are extracted by Mel Frequency Cepstrum Coefficients (MFCC). The experimental results show that the EKNN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers, KNN, FKNN, KGNN and MKNN, for all cases.
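
    The mutual-neighborhood idea can be sketched as follows: vote with the query's k nearest training samples plus every training sample that would count the query among its own k nearest points. This is a simplified reading; the published EKNN rule differs in detail, and the toy 2-D data stand in for MFCC feature vectors:

```python
# Extended-neighborhood sketch: the query's k nearest neighbours vote,
# and so does any training sample for which the query would rank within
# its own k nearest points (mutual neighbourhood). Assumes distinct
# training points; toy data, not real frog-call features.
import math
from collections import Counter

def eknn_predict(train, query, k=3):
    # k nearest neighbours of the query
    near = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    voters = list(near)
    # training samples that consider the query among their k nearest points
    for x, y in train:
        others = sorted(math.dist(x, p) for p, _ in train if p != x)
        if math.dist(x, query) <= others[k - 1]:
            voters.append((x, y))
    return Counter(label for _, label in voters).most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
         ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B")]
species = eknn_predict(train, (0.5, 0.5), k=3)
```

    The extra mutual voters damp the influence of isolated points that happen to lie near the query.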

  19. NAGRADATA. Code key. Geology

    International Nuclear Information System (INIS)

    Mueller, W.H.; Schneider, B.; Staeuble, J.

    1984-01-01

    This reference manual provides users of the NAGRADATA system with comprehensive keys to the coding/decoding of geological and technical information to be stored in or retrieved from the databank. Emphasis has been placed on input data coding. When data is retrieved, the translation of stored coded information into plain language is done automatically by computer. Three keys list the complete set of currently defined codes for the NAGRADATA system, with appropriate definitions, arranged: (1) according to subject matter (thematically), (2) codes alphabetically, and (3) definitions alphabetically. Additional explanation is provided for the proper application of the codes and the logic behind the creation of new codes to be used within the NAGRADATA system. NAGRADATA makes use of codes instead of plain language for data storage; this offers the following advantages: speed of data processing, mainly data retrieval; economy in storage requirements; and standardisation of terminology. The thesaurus-like nature of this 'key to codes' makes it impossible either to establish a final form or to cover the entire spectrum of requirements. Therefore, this first issue of codes for NAGRADATA must be considered to represent the current state of progress of a living system, and future editions will be issued in a loose-leaf ring binder which can be updated by an organised updating service. (author)

  20. XSOR codes users manual

    International Nuclear Information System (INIS)

    Jow, Hong-Nian; Murfin, W.B.; Johnson, J.D.

    1993-11-01

    This report describes the source term estimation codes, XSORs. The codes are written for three pressurized water reactors (Surry, Sequoyah, and Zion) and two boiling water reactors (Peach Bottom and Grand Gulf). The ensemble of codes has been named ''XSOR''. The purpose of the XSOR codes is to estimate the source terms which would be released to the atmosphere in severe accidents. A source term includes the release fractions of several radionuclide groups, the timing and duration of releases, the rates of energy release, and the elevation of releases. The codes have been developed by Sandia National Laboratories for the US Nuclear Regulatory Commission (NRC) in support of the NUREG-1150 program. The XSOR codes are fast-running parametric codes and are used as surrogates for detailed mechanistic codes. The XSOR codes also provide the capability to explore the phenomena and their uncertainty which are not currently modeled by the mechanistic codes. The uncertainty distributions of input parameters may be used by an XSOR code to estimate the uncertainty of source terms.

  1. Reactor lattice codes

    International Nuclear Information System (INIS)

    Kulikowska, T.

    1999-01-01

    The main goal of the present lecture is to show how transport lattice calculations are realised in a standard computer code. This is illustrated on the example of the WIMSD code, one of the most popular tools for reactor calculations. Most of the approaches discussed here can be easily adapted to any other lattice code. The description of the code assumes basic knowledge of the reactor lattice, at the level given in the lecture on 'Reactor lattice transport calculations'. For a more advanced explanation of the WIMSD code the reader is directed to the detailed descriptions of the code cited in the References. The discussion of the methods and models included in the code is followed by the generally used homogenisation procedure and several numerical examples of discrepancies in calculated multiplication factors based on different sources of library data. (author)

  2. DLLExternalCode

    Energy Technology Data Exchange (ETDEWEB)

    2014-05-14

    DLLExternalCode is a general dynamic-link library (DLL) interface for linking GoldSim (www.goldsim.com) with external codes. The overall concept is to use GoldSim as the top-level modeling software, with interfaces to external codes for specific calculations. The DLLExternalCode DLL that performs the linking function is designed to take a list of code inputs from GoldSim, create an input file for the external application, run the external code, and return a list of outputs, read from files created by the external application, back to GoldSim. Instructions for creating the input file, running the external code, and reading the output are contained in an instructions file that is read and interpreted by the DLL.
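
    The write-inputs / run / read-outputs cycle described above can be sketched with Python's subprocess module. The file names ("external.in"/"external.out") and the command below are illustrative placeholders, not GoldSim's or the DLL's actual interface:

```python
# The coupling pattern the DLL implements: write an input file for the
# external application, run it to completion, then read its output file
# back. File names and command are hypothetical placeholders.
import subprocess
from pathlib import Path

def run_external(inputs, workdir=".", command=("./external_app",)):
    # 1. write inputs where the external code expects them
    Path(workdir, "external.in").write_text(
        "\n".join(str(v) for v in inputs) + "\n")
    # 2. run the external code and wait for it to finish
    subprocess.run(command, cwd=workdir, check=True)
    # 3. read back the outputs it produced
    out = Path(workdir, "external.out").read_text()
    return [float(line) for line in out.split()]
```

    In the real coupling, the equivalent of `command` and the file layout come from the instructions file the DLL interprets.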

  3. Essential and non-essential DNA replication genes in the model halophilic Archaeon, Halobacterium sp. NRC-1

    Directory of Open Access Journals (Sweden)

    DasSarma Shiladitya

    2007-06-01

    Full Text Available Abstract Background Information transfer systems in Archaea, including many components of the DNA replication machinery, are similar to those found in eukaryotes. Functional assignments of archaeal DNA replication genes have been primarily based upon sequence homology and biochemical studies of replisome components, but few genetic studies have been conducted thus far. We have developed a tractable genetic system for knockout analysis of genes in the model halophilic archaeon, Halobacterium sp. NRC-1, and used it to determine which DNA replication genes are essential. Results Using a directed in-frame gene knockout method in Halobacterium sp. NRC-1, we examined nineteen genes predicted to be involved in DNA replication. Preliminary bioinformatic analysis of the large haloarchaeal Orc/Cdc6 family, related to eukaryotic Orc1 and Cdc6, showed five distinct clades of Orc/Cdc6 proteins conserved in all sequenced haloarchaea. Of ten orc/cdc6 genes in Halobacterium sp. NRC-1, only two were found to be essential, orc10, on the large chromosome, and orc2, on the minichromosome, pNRC200. Of the three replicative-type DNA polymerase genes, two were essential: the chromosomally encoded B family, polB1, and the chromosomally encoded euryarchaeal-specific D family, polD1/D2 (formerly called polA1/polA2 in the Halobacterium sp. NRC-1 genome sequence). The pNRC200-encoded B family polymerase, polB2, was non-essential. Accessory genes for DNA replication initiation and elongation factors, including the putative replicative helicase, mcm, the eukaryotic-type DNA primase, pri1/pri2, the DNA polymerase sliding clamp, pcn, and the flap endonuclease, rad2, were all essential. Targeted genes were classified as non-essential if knockouts were obtained, and essential based on statistical analysis and/or by demonstrating the inability to isolate chromosomal knockouts except in the presence of a complementing plasmid copy of the gene. 
Conclusion The results showed that ten

  4. The PARTRAC code: Status and recent developments

    Science.gov (United States)

    Friedland, Werner; Kundrat, Pavel

    Biophysical modeling is of particular value for predictions of radiation effects due to manned space missions. PARTRAC is an established tool for Monte Carlo-based simulations of radiation track structures, damage induction in cellular DNA and its repair [1]. Dedicated modules describe interactions of ionizing particles with the traversed medium, the production and reactions of reactive species, and score DNA damage determined by overlapping track structures with multi-scale chromatin models. The DNA repair module describes the repair of DNA double-strand breaks (DSB) via the non-homologous end-joining pathway; the code explicitly simulates the spatial mobility of individual DNA ends in parallel with their processing by major repair enzymes [2]. To simulate the yields and kinetics of radiation-induced chromosome aberrations, the repair module has been extended by tracking the information on the chromosome origin of ligated fragments as well as the presence of centromeres [3]. PARTRAC calculations have been benchmarked against experimental data on various biological endpoints induced by photon and ion irradiation. The calculated DNA fragment distributions after photon and ion irradiation reproduce corresponding experimental data and their dose- and LET-dependence. However, in particular for high-LET radiation many short DNA fragments are predicted below the detection limits of the measurements, so that the experiments significantly underestimate DSB yields by high-LET radiation [4]. The DNA repair module correctly describes the LET-dependent repair kinetics after 60Co gamma-rays and different N-ion radiation qualities [2]. First calculations on the induction of chromosome aberrations have overestimated the absolute yields of dicentrics, but correctly reproduced their relative dose-dependence and the difference between gamma- and alpha particle irradiation [3]. 
Recent developments of the PARTRAC code include a model of hetero- vs euchromatin structures to enable

  5. Minisequencing mitochondrial DNA pathogenic mutations

    Directory of Open Access Journals (Sweden)

    Carracedo Ángel

    2008-04-01

    Full Text Available Abstract Background There are a number of well-known mutations responsible for common mitochondrial DNA (mtDNA) diseases. In order to overcome technical problems related to the analysis of complete mtDNA genomes, a variety of different techniques have been proposed that allow the screening of coding region pathogenic mutations. Methods We here propose a minisequencing assay for the analysis of mtDNA mutations. In a single reaction, we interrogate a total of 25 pathogenic mutations distributed all around the whole mtDNA genome in a sample of patients suspected for mtDNA disease. Results We have detected 11 causal homoplasmic mutations in patients suspected for Leber disease, which were further confirmed by standard automatic sequencing. Mutations m.11778G>A and m.14484T>C occur at higher frequency than expected by chance in the Galician (northwest Spain) patients carrying haplogroup J lineages (Fisher's Exact test, P-value ... Conclusion We here developed a minisequencing genotyping method for the screening of the most common pathogenic mtDNA mutations which is simple, fast, and low-cost. The technique is robust and reproducible and can easily be implemented in standard clinical laboratories.

  6. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  7. What Is Mitochondrial DNA?

    Science.gov (United States)

    ... DNA What is mitochondrial DNA? What is mitochondrial DNA? Although most DNA is packaged in chromosomes within ... proteins. For more information about mitochondria and mitochondrial DNA: Molecular Expressions, a web site from the Florida ...

  8. SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-01-01

    Motivation Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. Results The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with RBF kernel of SVM. Conclusion We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at . PMID:17217518

  9. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.
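
    The radial basis function kernel behind the best-performing classifier can be illustrated in isolation. The sketch below is not the study's LIBSVM SVM: it uses the RBF kernel in a toy kernel nearest-mean rule on invented two-gene "expression profiles", only to show what the kernel computes and why nearby points score as similar.

```python
import math

def rbf(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_nearest_mean(x, class_a, class_b, gamma=0.5):
    """Assign x to the class whose members have the higher mean RBF
    similarity to x -- a toy stand-in for a trained RBF-kernel SVM."""
    score_a = sum(rbf(x, p, gamma) for p in class_a) / len(class_a)
    score_b = sum(rbf(x, p, gamma) for p in class_b) / len(class_b)
    return "A" if score_a >= score_b else "B"

# Invented two-gene expression profiles for two sample groups.
group_a = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9)]
group_b = [(3.0, 3.1), (2.8, 3.3), (3.2, 2.9)]
print(kernel_nearest_mean((1.0, 1.1), group_a, group_b))  # close to group A
print(kernel_nearest_mean((3.0, 3.0), group_a, group_b))  # close to group B
```

    The kernel equals 1 for identical points and decays toward 0 with squared distance, with gamma controlling how quickly; that locality is what lets an RBF-kernel machine draw non-linear boundaries.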

  10. Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS

    OpenAIRE

    Sumintadireja, Prihadi; Irawan, Dasapta Erwin; Rezky, Yuanno; Gio, Prana Ugiana; Agustin, Anggita

    2016-01-01

    This file is the dataset for the following paper "Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS". Authors: Prihadi Sumintadireja, Dasapta Erwin Irawan, Yuanno Rezky, Prana Ugiana Gio, Anggita Agustin

  11. Robust Combining of Disparate Classifiers Through Order Statistics

    Science.gov (United States)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the i-th order statistic, are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real-world data and standard public domain data sets corroborate these findings.
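
    The combiners described above are simple to state concretely. The sketch below (with invented scores and an illustrative trimming depth) shows the median, maximum, and general order statistic combiners plus a trimmed mean, and why the median survives one badly miscalibrated classifier.

```python
import statistics

def order_statistic_combiner(outputs, k=None):
    """Combine per-classifier scores for one class via the k-th order
    statistic of the sorted outputs; k=None selects the median."""
    s = sorted(outputs)
    return statistics.median(s) if k is None else s[k]

def trimmed_mean_combiner(outputs, trim=1):
    """'Trim' combiner: average after dropping the `trim` smallest and
    largest outputs, suppressing wildly wrong classifiers."""
    s = sorted(outputs)
    core = s[trim:len(s) - trim]
    return sum(core) / len(core)

# Five classifiers score the same class; one is badly miscalibrated.
scores = [0.82, 0.78, 0.85, 0.80, 0.05]
print(order_statistic_combiner(scores))        # median ignores the outlier
print(order_statistic_combiner(scores, k=-1))  # maximum order statistic
print(trimmed_mean_combiner(scores, trim=1))
```

    A plain average of these scores would be dragged down to about 0.66 by the single outlier, while the median and trimmed mean both stay near the consensus value of 0.8, which is the robustness property the article analyzes.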

  12. Using Statistical Process Control Methods to Classify Pilot Mental Workloads

    National Research Council Canada - National Science Library

    Kudo, Terence

    2001-01-01

    .... These include cardiac, ocular, respiratory, and brain activity measures. The focus of this effort is to apply statistical process control methodology on different psychophysiological features in an attempt to classify pilot mental workload...

  13. An ensemble classifier to predict track geometry degradation

    International Nuclear Information System (INIS)

    Cárdenas-Gallo, Iván; Sarmiento, Carlos A.; Morales, Gilberto A.; Bolivar, Manuel A.; Akhavan-Tabatabaei, Raha

    2017-01-01

    Railway operations are inherently complex and a source of several problems. In particular, track geometry defects are one of the leading causes of train accidents in the United States. This paper presents a solution approach which entails the construction of an ensemble classifier to forecast the degradation of track geometry. Our classifier is constructed by solving the problem from three different perspectives: deterioration, regression and classification. We considered a different model from each perspective and our results show that using an ensemble method improves the predictive performance. - Highlights: • We present an ensemble classifier to forecast the degradation of track geometry. • Our classifier considers three perspectives: deterioration, regression and classification. • We construct and test three models and our results show that using an ensemble method improves the predictive performance.

  14. A novel statistical method for classifying habitat generalists and specialists

    DEFF Research Database (Denmark)

    Chazdon, Robin L; Chao, Anne; Colwell, Robert K

    2011-01-01

    ...: (1) generalist; (2) habitat A specialist; (3) habitat B specialist; and (4) too rare to classify with confidence. We illustrate our multinomial classification method using two contrasting data sets: (1) bird abundance in woodland and heath habitats in southeastern Australia and (2) tree abundance in second-growth (SG) and old-growth (OG) rain forests in the Caribbean lowlands of northeastern Costa Rica. We evaluate the multinomial model in detail for the tree data set. Our results for birds were highly concordant with a previous nonstatistical classification, but our method classified a higher fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest...

  15. 6 CFR 7.23 - Emergency release of classified information.

    Science.gov (United States)

    2010-01-01

    ... Classified Information Non-disclosure Form. In emergency situations requiring immediate verbal release of... information through approved communication channels by the most secure and expeditious method possible, or by...

  16. Toric Varieties and Codes, Error-correcting Codes, Quantum Codes, Secret Sharing and Decoding

    DEFF Research Database (Denmark)

    Hansen, Johan Peder

    We present toric varieties and associated toric codes and their decoding. Toric codes are applied to construct Linear Secret Sharing Schemes (LSSS) with strong multiplication by the Massey construction. Asymmetric Quantum Codes are obtained from toric codes by the A.R. Calderbank, P.W. Shor and A.M. Steane construction of stabilizer codes (CSS) from linear codes containing their dual codes.

  17. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    International Nuclear Information System (INIS)

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (∼2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.
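
    The two figures of merit used above to rank the decision tree classifiers are easy to compute directly: completeness is the fraction of true members of a class that the classifier recovers, and contamination is the fraction of objects assigned to that class which truly belong elsewhere. A minimal sketch with invented labels:

```python
def completeness(true_labels, pred_labels, cls="galaxy"):
    """Fraction of true members of `cls` that the classifier recovers."""
    members = [p for t, p in zip(true_labels, pred_labels) if t == cls]
    return sum(p == cls for p in members) / len(members)

def contamination(true_labels, pred_labels, cls="galaxy"):
    """Fraction of objects labeled `cls` that truly belong elsewhere."""
    labeled = [t for t, p in zip(true_labels, pred_labels) if p == cls]
    return sum(t != cls for t in labeled) / len(labeled)

truth = ["galaxy", "galaxy", "star", "galaxy", "star"]
pred  = ["galaxy", "star",   "star", "galaxy", "galaxy"]
print(completeness(truth, pred))   # 2 of 3 true galaxies recovered
print(contamination(truth, pred))  # 1 of 3 "galaxy" labels is a star
```

    The two measures trade off against each other: a classifier can trivially reach 100% completeness by labeling everything a galaxy, at the cost of maximal contamination, which is why the study reports both.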

  18. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology

    Directory of Open Access Journals (Sweden)

    Ashley I. Heinson

    2017-02-01

    Full Text Available Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  19. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology

    KAUST Repository

    Heinson, Ashley

    2017-02-01

    Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  20. A comprehensive statistical classifier of foci in the cell transformation assay for carcinogenicity testing.

    Science.gov (United States)

    Callegaro, Giulia; Malkoc, Kasja; Corvi, Raffaella; Urani, Chiara; Stefanini, Federico M

    2017-12-01

    The identification of the carcinogenic risk of chemicals is currently mainly based on animal studies. The in vitro Cell Transformation Assays (CTAs) are a promising alternative to be considered in an integrated approach. CTAs measure the induction of foci of transformed cells. CTAs model key stages of the in vivo neoplastic process and are able to detect both genotoxic and some non-genotoxic compounds, being the only in vitro method able to deal with the latter. Despite their favorable features, CTAs can be further improved, especially reducing the possible subjectivity arising from the last phase of the protocol, namely visual scoring of foci using coded morphological features. By taking advantage of digital image analysis, the aim of our work is to translate morphological features into statistical descriptors of foci images, and to use them to mimic the classification performances of the visual scorer to discriminate between transformed and non-transformed foci. Here we present a classifier based on five descriptors trained on a dataset of 1364 foci, obtained with different compounds and concentrations. Our classifier showed accuracy, sensitivity and specificity equal to 0.77 and an area under the curve (AUC) of 0.84. The presented classifier outperforms a previously published model.
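
    The evaluation measures reported above (accuracy, sensitivity, specificity, AUC) can be computed from binary labels and continuous scores in a few lines. The toy labels and scores below are invented; the AUC uses the standard rank-sum (Mann-Whitney) identity, i.e. the probability that a randomly chosen positive outscores a randomly chosen negative.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels
    (1 = transformed focus, 0 = non-transformed)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Area under the ROC curve via the rank-sum identity: the chance
    that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
acc, sens, spec = confusion_metrics(y, [int(v >= 0.5) for v in s])
print(acc, sens, spec, auc(y, s))
```

    Note that accuracy, sensitivity, and specificity depend on the 0.5 decision threshold, while AUC summarizes ranking quality over all thresholds, which is why both kinds of figures are usually reported together.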

  1. Parents' Experiences and Perceptions when Classifying their Children with Cerebral Palsy: Recommendations for Service Providers.

    Science.gov (United States)

    Scime, Natalie V; Bartlett, Doreen J; Brunton, Laura K; Palisano, Robert J

    2017-08-01

    This study investigated the experiences and perceptions of parents of children with cerebral palsy (CP) when classifying their children using the Gross Motor Function Classification System (GMFCS), the Manual Ability Classification System (MACS), and the Communication Function Classification System (CFCS). The second aim was to collate parents' recommendations for service providers on how to interact and communicate with families. A purposive sample of seven parents participating in the On Track study was recruited. Semi-structured interviews were conducted orally and were audiotaped, transcribed, and coded openly. A descriptive interpretive approach within a pragmatic perspective was used during analysis. Seven themes encompassing parents' experiences and perspectives reflect a process of increased understanding when classifying their children, with perceptions of utility evident throughout this process. Six recommendations for service providers emerged, including making the child a priority and being a dependable resource. Knowledge of parents' experiences when using the GMFCS, MACS, and CFCS can provide useful insight for service providers collaborating with parents to classify function in children with CP. Using the recommendations from these parents can facilitate family-provider collaboration for goal setting and intervention planning.

  2. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology

    KAUST Repository

    Heinson, Ashley; Gunawardana, Yawwani; Moesker, Bastiaan; Hume, Carmen; Vataga, Elena; Hall, Yper; Stylianou, Elena; McShane, Helen; Williams, Ann; Niranjan, Mahesan; Woelk, Christopher

    2017-01-01

    Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  3. Local-global classifier fusion for screening chest radiographs

    Science.gov (United States)

    Ding, Meng; Antani, Sameer; Jaeger, Stefan; Xue, Zhiyun; Candemir, Sema; Kohli, Marc; Thoma, George

    2017-03-01

    Tuberculosis (TB) is a severe comorbidity of HIV and chest x-ray (CXR) analysis is a necessary step in screening for the infective disease. Automatic analysis of digital CXR images for detecting pulmonary abnormalities is critical for population screening, especially in medical resource constrained developing regions. In this article, we describe steps that improve previously reported performance of NLM's CXR screening algorithms and help advance the state of the art in the field. We propose a local-global classifier fusion method where two complementary classification systems are combined. The local classifier focuses on subtle and partial presentation of the disease leveraging information in radiology reports that roughly indicates locations of the abnormalities. In addition, the global classifier models the dominant spatial structure in the gestalt image using GIST descriptor for the semantic differentiation. Finally, the two complementary classifiers are combined using linear fusion, where the weight of each decision is calculated by the confidence probabilities from the two classifiers. We evaluated our method on three datasets in terms of the area under the Receiver Operating Characteristic (ROC) curve, sensitivity, specificity and accuracy. The evaluation demonstrates the superiority of our proposed local-global fusion method over any single classifier.
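
    The final linear fusion step can be sketched as a convex combination of the two classifiers' abnormality probabilities. The confidence weighting used here (distance of each probability from the uninformative value 0.5) is an illustrative assumption; the abstract states that weights come from the classifiers' confidence probabilities but not the exact formula.

```python
def linear_fusion(p_local, p_global, w_local=None):
    """Fuse a local and a global classifier's abnormality probabilities.
    By default each decision is weighted by how far its probability lies
    from 0.5 -- a hypothetical stand-in for the paper's confidence weights."""
    if w_local is None:
        c_local, c_global = abs(p_local - 0.5), abs(p_global - 0.5)
        total = c_local + c_global or 1.0  # avoid division by zero
        w_local = c_local / total
    return w_local * p_local + (1 - w_local) * p_global

# Local classifier is confident the CXR is abnormal; global one is unsure,
# so the fused score stays close to the local decision.
print(round(linear_fusion(0.95, 0.55), 3))
```

    With fixed equal weights (`w_local=0.5`) this reduces to simple averaging; letting the more confident classifier dominate is what makes the fusion adaptive per image.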

  4. Verification of classified fissile material using unclassified attributes

    International Nuclear Information System (INIS)

    Nicholas, N.J.; Fearey, B.L.; Puckett, J.M.; Tape, J.W.

    1998-01-01

    This paper reports on the most recent efforts of US technical experts to explore verification by IAEA of unclassified attributes of classified excess fissile material. Two propositions are discussed: (1) that multiple unclassified attributes could be declared by the host nation and then verified (and reverified) by the IAEA in order to provide confidence in that declaration of a classified (or unclassified) inventory while protecting classified or sensitive information; and (2) that attributes could be measured, remeasured, or monitored to provide continuity of knowledge in a nonintrusive and unclassified manner. They believe attributes should relate to characteristics of excess weapons materials and should be verifiable and authenticatable with methods usable by IAEA inspectors. Further, attributes (along with the methods to measure them) must not reveal any classified information. The approach that the authors have taken is as follows: (1) assume certain attributes of classified excess material, (2) identify passive signatures, (3) determine range of applicable measurement physics, (4) develop a set of criteria to assess and select measurement technologies, (5) select existing instrumentation for proof-of-principle measurements and demonstration, and (6) develop and design information barriers to protect classified information. While the attribute verification concepts and measurements discussed in this paper appear promising, neither the attribute verification approach nor the measurement technologies have been fully developed, tested, and evaluated.

  5. A cardiorespiratory classifier of voluntary and involuntary electrodermal activity

    Directory of Open Access Journals (Sweden)

    Sejdic Ervin

    2010-02-01

    Full Text Available Abstract Background Electrodermal reactions (EDRs) can be attributed to many origins, including spontaneous fluctuations of electrodermal activity (EDA) and stimuli such as deep inspirations, voluntary mental activity and startling events. In fields that use EDA as a measure of psychophysiological state, the fact that EDRs may be elicited from many different stimuli is often ignored. This study attempts to classify observed EDRs as voluntary (i.e., generated from intentional respiratory or mental activity) or involuntary (i.e., generated from startling events or spontaneous electrodermal fluctuations). Methods Eight able-bodied participants were subjected to conditions that would cause a change in EDA: music imagery, startling noises, and deep inspirations. A user-centered cardiorespiratory classifier consisting of (1) an EDR detector, (2) a respiratory filter and (3) a cardiorespiratory filter was developed to automatically detect a participant's EDRs and to classify the origin of their stimulation as voluntary or involuntary. Results Detected EDRs were classified with a positive predictive value of 78%, a negative predictive value of 81% and an overall accuracy of 78%. Without the classifier, EDRs could only be correctly attributed as voluntary or involuntary with an accuracy of 50%. Conclusions The proposed classifier may enable investigators to form more accurate interpretations of electrodermal activity as a measure of an individual's psychophysiological state.

  6. An Optimal Linear Coding for Index Coding Problem

    OpenAIRE

    Pezeshkpour, Pouya

    2015-01-01

    An optimal linear coding solution for the index coding problem is established. Instead of the network coding approach, which focuses on graph-theoretic and algebraic methods, a linear coding program for solving both the unicast and groupcast index coding problems is presented. The coding is proved to be the optimal solution from the linear perspective and can easily be utilized for any number of messages. The importance of this work lies mostly in the usage of the presented coding in the groupcast index coding ...

  7. Word2Vec inversion and traditional text classifiers for phenotyping lupus.

    Science.gov (United States)

    Turner, Clayton A; Jacobs, Alexander D; Marques, Cassios K; Oates, James C; Kamen, Diane L; Anderson, Paul E; Obeid, Jihad S

    2017-08-22

    Identifying patients with certain clinical criteria based on manual chart review of doctors' notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. The aim of this research is to evaluate the performance of traditional classifiers for identifying patients with Systemic Lupus Erythematosus (SLE) in comparison with a newer Bayesian word vector method. We obtained clinical notes for patients with SLE diagnosis along with controls from the Rheumatology Clinic (662 total patients). Sparse bag-of-words (BOWs) and Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) matrices were produced using NLP pipelines. These matrices were subjected to several different NLP classifiers: neural networks, random forests, naïve Bayes, support vector machines, and Word2Vec inversion, a Bayesian inversion method. Performance was measured by calculating accuracy and area under the Receiver Operating Characteristic (ROC) curve (AUC) of a cross-validated (CV) set and a separate testing set. We calculated the accuracy of the ICD-9 billing codes as a baseline to be 90.00% with an AUC of 0.900, the shallow neural network with CUIs to be 92.10% with an AUC of 0.970, the random forest with BOWs to be 95.25% with an AUC of 0.994, the random forest with CUIs to be 95.00% with an AUC of 0.979, and the Word2Vec inversion to be 90.03% with an AUC of 0.905. Our results suggest that a shallow neural network with CUIs and random forests with both CUIs and BOWs are the best classifiers for this lupus phenotyping task. The Word2Vec inversion method failed to significantly beat the ICD-9 code classification, but yielded promising results. This method does not require explicit features and is more adaptable to non-binary classification tasks. The Word2Vec inversion is
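
    The sparse bag-of-words (BOW) representation fed to the classifiers above can be sketched minimally. Tokenization here is naive whitespace splitting and the example notes are invented; real pipelines use NLP preprocessing (negation handling, UMLS concept mapping, etc.) before matrix construction.

```python
from collections import Counter

def bag_of_words(notes):
    """Build a sparse bag-of-words representation: one term-count
    Counter per clinical note, plus the shared sorted vocabulary."""
    docs = [Counter(note.lower().split()) for note in notes]
    vocab = sorted(set().union(*docs))
    return docs, vocab

notes = ["Patient denies malar rash", "Malar rash and joint pain noted"]
docs, vocab = bag_of_words(notes)
print(vocab)
print([doc["rash"] for doc in docs])  # term count of "rash" per note
```

    Each Counter is effectively one sparse row of the document-term matrix; only nonzero counts are stored, which is what makes BOW tractable for large note corpora. The sketch also hints at why BOW alone can mislead: "denies malar rash" and "malar rash noted" share the same salient terms despite opposite meanings.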

  8. The Aesthetics of Coding

    DEFF Research Database (Denmark)

    Andersen, Christian Ulrik

    2007-01-01

    Computer art is often associated with computer-generated expressions (digitally manipulated audio/images in music, video, stage design, media facades, etc.). In recent computer art, however, the code-text itself – not the generated output – has become the artwork (Perl Poetry, ASCII Art, obfuscated...... code, etc.). The presentation relates this artistic fascination of code to a media critique expressed by Florian Cramer, claiming that the graphical interface represents a media separation (of text/code and image) causing alienation to the computer’s materiality. Cramer is thus the voice of a new ‘code...... avant-garde’. In line with Cramer, the artists Alex McLean and Adrian Ward (aka Slub) declare: “art-oriented programming needs to acknowledge the conditions of its own making – its poesis.” By analysing the Live Coding performances of Slub (where they program computer music live), the presentation...

  9. Majorana fermion codes

    International Nuclear Information System (INIS)

    Bravyi, Sergey; Terhal, Barbara M; Leemhuis, Bernhard

    2010-01-01

    We initiate the study of Majorana fermion codes (MFCs). These codes can be viewed as extensions of Kitaev's one-dimensional (1D) model of unpaired Majorana fermions in quantum wires to higher spatial dimensions and interacting fermions. The purpose of MFCs is to protect quantum information against low-weight fermionic errors, that is, operators acting on sufficiently small subsets of fermionic modes. We examine to what extent MFCs can surpass qubit stabilizer codes in terms of their stability properties. A general construction of 2D MFCs is proposed that combines topological protection based on a macroscopic code distance with protection based on fermionic parity conservation. Finally, we use MFCs to show how to transform any qubit stabilizer code to a weakly self-dual CSS code.

  10. Theory of epigenetic coding.

    Science.gov (United States)

    Elder, D

    1984-06-07

The logic of genetic control of development may be based on a binary epigenetic code. This paper revises the author's previous scheme dealing with the numerology of annelid metamerism in these terms. Certain features of the code had been deduced to be combinatorial, others not. This paradoxical contrast is resolved here by the interpretation that these features relate to different operations of the code; the combinatorial to coding identity of units, the non-combinatorial to coding production of units. Consideration of a second paradox in the theory of epigenetic coding leads to a new solution which further provides a basis for epimorphic regeneration, and may in particular throw light on the "regeneration-duplication" phenomenon. A possible test of the model is also put forward.

  11. DISP1 code

    International Nuclear Information System (INIS)

    Vokac, P.

    1999-12-01

    DISP1 code is a simple tool for assessment of the dispersion of the fission product cloud escaping from a nuclear power plant after an accident. The code makes it possible to tentatively check the feasibility of calculations by more complex PSA3 codes and/or codes for real-time dispersion calculations. The number of input parameters is reasonably low and the user interface is simple enough to allow a rapid processing of sensitivity analyses. All input data entered through the user interface are stored in the text format. Implementation of dispersion model corrections taken from the ARCON96 code enables the DISP1 code to be employed for assessment of the radiation hazard within the NPP area, in the control room for instance. (P.A.)
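
    DISP1's dispersion model is not described in detail here; as a rough illustration of what such fission-product dispersion codes compute, the following is a textbook Gaussian plume estimate. All parameter values and the linear sigma parametrisation are hypothetical and are not DISP1's actual model:

```python
import math

def plume_concentration(Q, u, x, y, z, H, a=0.08, b=0.06):
    """Gaussian plume concentration (g/m^3) at a receptor.

    Q: release rate (g/s), u: wind speed (m/s), x: downwind distance (m),
    y: crosswind offset (m), z: receptor height (m), H: release height (m).
    sigma_y/sigma_z grow linearly with distance using crude, hypothetical
    stability coefficients a and b.
    """
    sy = a * x
    sz = b * x
    lateral = math.exp(-y**2 / (2 * sy**2))
    # Reflection term: the ground acts as a mirror for the plume.
    vertical = (math.exp(-(z - H)**2 / (2 * sz**2)) +
                math.exp(-(z + H)**2 / (2 * sz**2)))
    return Q / (2 * math.pi * u * sy * sz) * lateral * vertical

c_near = plume_concentration(Q=1.0, u=5.0, x=500.0, y=0.0, z=0.0, H=50.0)
c_far = plume_concentration(Q=1.0, u=5.0, x=5000.0, y=0.0, z=0.0, H=50.0)
print(c_near, c_far)  # concentration falls off with downwind distance
```

    The small input count of such a model is what makes rapid sensitivity analyses of the kind mentioned above practical.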

  12. Phonological coding during reading.

    Science.gov (United States)

    Leinenger, Mallorie

    2014-11-01

    The exact role that phonological coding (the recoding of written, orthographic information into a sound based code) plays during silent reading has been extensively studied for more than a century. Despite the large body of research surrounding the topic, varying theories as to the time course and function of this recoding still exist. The present review synthesizes this body of research, addressing the topics of time course and function in tandem. The varying theories surrounding the function of phonological coding (e.g., that phonological codes aid lexical access, that phonological codes aid comprehension and bolster short-term memory, or that phonological codes are largely epiphenomenal in skilled readers) are first outlined, and the time courses that each maps onto (e.g., that phonological codes come online early [prelexical] or that phonological codes come online late [postlexical]) are discussed. Next the research relevant to each of these proposed functions is reviewed, discussing the varying methodologies that have been used to investigate phonological coding (e.g., response time methods, reading while eye-tracking or recording EEG and MEG, concurrent articulation) and highlighting the advantages and limitations of each with respect to the study of phonological coding. In response to the view that phonological coding is largely epiphenomenal in skilled readers, research on the use of phonological codes in prelingually, profoundly deaf readers is reviewed. Finally, implications for current models of word identification (activation-verification model, Van Orden, 1987; dual-route model, e.g., M. Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; parallel distributed processing model, Seidenberg & McClelland, 1989) are discussed. (PsycINFO Database Record (c) 2014 APA, all rights reserved).

  13. The aeroelastic code FLEXLAST

    Energy Technology Data Exchange (ETDEWEB)

    Visser, B. [Stork Product Eng., Amsterdam (Netherlands)

    1996-09-01

    To support the discussion on aeroelastic codes, a description of the code FLEXLAST was given and experiences within benchmarks and measurement programmes were summarized. The code FLEXLAST has been developed since 1982 at Stork Product Engineering (SPE). Since 1992 FLEXLAST has been used by Dutch industries for wind turbine and rotor design. Based on the comparison with measurements, it can be concluded that the main shortcomings of wind turbine modelling lie in the field of aerodynamics, wind field and wake modelling. (au)

  14. Altruistic functions for selfish DNA.

    Science.gov (United States)

    Faulkner, Geoffrey J; Carninci, Piero

    2009-09-15

    Mammalian genomes are comprised of 30-50% transposed elements (TEs). The vast majority of these TEs are truncated and mutated fragments of retrotransposons that are no longer capable of transposition. Although initially regarded as important factors in the evolution of gene regulatory networks, TEs are now commonly perceived as neutrally evolving and non-functional genomic elements. In a major development, recent works have strongly contradicted this "selfish DNA" or "junk DNA" dogma by demonstrating that TEs use a host of novel promoters to generate RNA on a massive scale across most eukaryotic cells. This transcription frequently functions to control the expression of protein-coding genes via alternative promoters, cis regulatory non protein-coding RNAs and the formation of double stranded short RNAs. If considered in sum, these findings challenge the designation of TEs as selfish and neutrally evolving genomic elements. Here, we will expand upon these themes and discuss challenges in establishing novel TE functions in vivo.

  15. MORSE Monte Carlo code

    International Nuclear Information System (INIS)

    Cramer, S.N.

    1984-01-01

    The MORSE code is a large general-use multigroup Monte Carlo code system. Although no claims can be made regarding its superiority in either theoretical details or Monte Carlo techniques, MORSE has been, since its inception at ORNL in the late 1960s, the most widely used Monte Carlo radiation transport code. The principal reason for this popularity is that MORSE is relatively easy to use, independent of any installation or distribution center, and it can be easily customized to fit almost any specific need. Features of the MORSE code are described

  16. QR codes for dummies

    CERN Document Server

    Waters, Joe

    2012-01-01

    Find out how to effectively create, use, and track QR codes QR (Quick Response) codes are popping up everywhere, and businesses are reaping the rewards. Get in on the action with the no-nonsense advice in this streamlined, portable guide. You'll find out how to get started, plan your strategy, and actually create the codes. Then you'll learn to link codes to mobile-friendly content, track your results, and develop ways to give your customers value that will keep them coming back. It's all presented in the straightforward style you've come to know and love, with a dash of humor thrown

  17. Tokamak Systems Code

    International Nuclear Information System (INIS)

    Reid, R.L.; Barrett, R.J.; Brown, T.G.

    1985-03-01

The FEDC Tokamak Systems Code calculates tokamak performance, cost, and configuration as a function of plasma engineering parameters. This version of the code models experimental tokamaks. It does not currently consider tokamak configurations that generate electrical power or incorporate breeding blankets. The code has a modular (or subroutine) structure to allow independent modeling for each major tokamak component or system. A primary benefit of modularization is that a component module may be updated without disturbing the remainder of the systems code as long as the input to or output from the module remains unchanged

  18. Efficient Coding of Information: Huffman Coding

    Indian Academy of Sciences (India)

    to a stream of equally-likely symbols so as to recover the original stream in the event of errors. The for- ... The source-coding problem is one of finding a mapping from U to a ... probability that the random variable X takes the value x written as ...
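
    The source-coding problem sketched in this record is solved optimally, for symbol-by-symbol codes, by Huffman's algorithm: repeatedly merge the two least probable symbols into a subtree until one tree remains, then read codewords off the root-to-leaf paths. A compact sketch:

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix-free code: frequent symbols get shorter codewords."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, symbol-or-subtree).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least probable nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tick, (left, right)))
        tick += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_code("abracadabra")
encoded = "".join(codes[c] for c in "abracadabra")
print(encoded)  # 23 bits, versus 88 for fixed 8-bit symbols
```

    Because no codeword is a prefix of another, the bit stream decodes unambiguously without separators.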

  19. NR-code: Nonlinear reconstruction code

    Science.gov (United States)

    Yu, Yu; Pen, Ue-Li; Zhu, Hong-Ming

    2018-04-01

    NR-code applies nonlinear reconstruction to the dark matter density field in redshift space and solves for the nonlinear mapping from the initial Lagrangian positions to the final redshift space positions; this reverses the large-scale bulk flows and improves the precision measurement of the baryon acoustic oscillations (BAO) scale.

  20. Balanced sensitivity functions for tuning multi-dimensional Bayesian network classifiers

    NARCIS (Netherlands)

    Bolt, J.H.; van der Gaag, L.C.

    Multi-dimensional Bayesian network classifiers are Bayesian networks of restricted topological structure, which are tailored to classifying data instances into multiple dimensions. Like more traditional classifiers, multi-dimensional classifiers are typically learned from data and may include

  1. Nonparametric Coupled Bayesian Dictionary and Classifier Learning for Hyperspectral Classification.

    Science.gov (United States)

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  2. Classifying a smoker scale in adult daily and nondaily smokers.

    Science.gov (United States)

    Pulvers, Kim; Scheuermann, Taneisha S; Romero, Devan R; Basora, Brittany; Luo, Xianghua; Ahluwalia, Jasjit S

    2014-05-01

Smoker identity, or the strength of beliefs about oneself as a smoker, is a robust marker of smoking behavior. However, many nondaily smokers do not identify as smokers, underestimating their risk for tobacco-related disease and resulting in missed intervention opportunities. Assessing underlying beliefs about characteristics used to classify smokers may help explain the discrepancy between smoking behavior and smoker identity. This study examines the factor structure, reliability, and validity of the Classifying a Smoker scale among a racially diverse sample of adult smokers. A cross-sectional survey was administered through an online panel survey service to 2,376 current smokers who were at least 25 years of age. The sample was stratified to obtain equal numbers of 3 racial/ethnic groups (African American, Latino, and White) across smoking level (nondaily and daily smoking). The Classifying a Smoker scale displayed a single factor structure and excellent internal consistency (α = .91). Classifying a Smoker scores significantly increased at each level of smoking, F(3,2375) = 23.68, p < .001. Higher scores were associated with stronger smoker identity, stronger dependence on cigarettes, greater health risk perceptions, and more smoking friends, and such smokers were more likely to carry cigarettes. Classifying a Smoker scores explained unique variance in smoking variables above and beyond that explained by smoker identity. The present study supports the use of the Classifying a Smoker scale among diverse, experienced smokers. Stronger endorsement of characteristics used to classify a smoker (i.e., stricter criteria) was positively associated with heavier smoking and related characteristics. Prospective studies are needed to inform prevention and treatment efforts.
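
    The internal consistency figure quoted above (α = .91) is Cronbach's alpha, computed from the individual item variances and the variance of the total score. A sketch with invented Likert responses, not the study's data:

```python
def cronbach_alpha(items):
    """items: one score list per scale item, all over the same respondents.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = len(items)
    n = len(items[0])
    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical 1-5 Likert responses: 3 items x 5 respondents.
items = [[4, 5, 3, 2, 4],
         [4, 4, 3, 2, 5],
         [5, 5, 2, 1, 4]]
print(round(cronbach_alpha(items), 3))  # 0.922
```

    Alpha approaches 1 when items covary strongly, i.e. when the scale measures a single underlying construct.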

  3. Representative Vector Machines: A Unified Framework for Classical Classifiers.

    Science.gov (United States)

    Gui, Jie; Liu, Tongliang; Tao, Dacheng; Sun, Zhenan; Tan, Tieniu

    2016-08-01

    Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theory or application motivations and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as the special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k -NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.
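
    The RVM idea of labelling a test example by its nearest representative vector can be illustrated directly: with every training point as its own representative the rule reduces to 1-NN, and with one class mean per class it reduces to a nearest-centroid rule. A sketch on toy data, not drawn from the paper:

```python
import math

def nearest_representative(x, reps):
    """reps: {label: list of representative vectors}. Return the label of
    the representative closest to x in Euclidean distance."""
    best_label, best_d = None, math.inf
    for label, vecs in reps.items():
        for v in vecs:
            d = math.dist(x, v)
            if d < best_d:
                best_label, best_d = label, d
    return best_label

train = {"A": [(0.0, 0.0), (1.0, 0.2)], "B": [(5.0, 5.0), (6.0, 4.8)]}

# 1-NN: every training point is its own representative.
print(nearest_representative((0.8, 0.1), train))      # "A"

# Nearest-centroid: one mean vector per class.
centroids = {c: [tuple(sum(xs) / len(xs) for xs in zip(*vs))]
             for c, vs in train.items()}
print(nearest_representative((4.0, 4.0), centroids))  # "B"
```

    SVM and SRC fit the same template with more elaborate definitions of the representative vectors, which is the unification the paper develops.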

  4. DNA Vaccines

    Indian Academy of Sciences (India)

    diseases. Keywords. DNA vaccine, immune response, antibodies, infectious diseases. GENERAL .... tein vaccines require expensive virus/protein purification tech- niques as ... sphere continue to remain major health hazards in developing nations. ... significance since it can be produced at a very low cost and can be stored ...

  5. DNA Investigations.

    Science.gov (United States)

    Mayo, Ellen S.; Bertino, Anthony J.

    1991-01-01

    Presents a simulation activity that allow students to work through the exercise of DNA profiling and to grapple with some analytical and ethical questions involving a couple arranging with a surrogate mother to have a baby. Can be used to teach the principles of restriction enzyme digestion, gel electrophoresis, and probe hybridization. (MDH)

  6. Current Directional Protection of Series Compensated Line Using Intelligent Classifier

    Directory of Open Access Journals (Sweden)

    M. Mollanezhad Heydarabadi

    2016-12-01

Full Text Available Current inversion conditions lead to incorrect operation of current-based directional relays in power systems with series compensation devices. Application of an intelligent system for fault direction classification is suggested in this paper. A new current directional protection scheme based on an intelligent classifier is proposed for the series compensated line. The proposed classifier uses only a half cycle of pre-fault and post-fault current samples at the relay location as its input. A large number of forward and backward fault simulations under different system conditions, on a transmission line with a fixed series capacitor, are carried out using PSCAD/EMTDC software. The applicability of the decision tree (DT), probabilistic neural network (PNN) and support vector machine (SVM) is investigated using simulated data under different system conditions. The performance comparison of the classifiers indicates that the SVM is the most suitable classifier for fault direction discrimination. Backward faults can be accurately distinguished from forward faults even under current inversion, without requiring detection of the current inversion condition.

  7. Neural network classifier of attacks in IP telephony

    Science.gov (United States)

    Safarik, Jakub; Voznak, Miroslav; Mehic, Miralem; Partila, Pavol; Mikulec, Martin

    2014-05-01

Various types of monitoring mechanisms allow us to detect and monitor the behavior of attackers in VoIP networks. Analysis of detected malicious traffic is crucial for further investigation and for hardening the network. This analysis is typically based on statistical methods, and this article presents a solution based on a neural network. The proposed algorithm is used as a classifier of attacks in a distributed monitoring network of independent honeypot probes. Information about attacks on these honeypots is collected on a centralized server and then classified. This classification is based on different mechanisms, one of which is the multilayer perceptron neural network. The article describes the inner structure of the neural network used and information about its implementation. The learning set for this neural network is based on real attack data collected from an IP telephony honeypot called Dionaea. We prepare the learning set from real attack data after collecting, cleaning, and aggregating this information. After proper learning, the neural network is capable of classifying 6 types of the most commonly used VoIP attacks. Using a neural network classifier brings more accurate attack classification in a distributed system of honeypots. With this approach it is possible to detect malicious behavior in different parts of networks which are logically or geographically divided, and to use the information from one network to harden security in other networks. The centralized server for the distributed set of nodes serves not only as a collector and classifier of attack data, but also as a mechanism for generating precaution steps against attacks.

  8. Maximum margin classifier working in a set of strings.

    Science.gov (United States)

    Koyano, Hitoshi; Hayashida, Morihiro; Akutsu, Tatsuya

    2016-03-01

    Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.
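
    The string-kernel conversion the authors contrast with (mapping strings to k-mer count vectors before applying an SVM) can be sketched with the spectrum kernel. Note that this is precisely the lossy, non-one-to-one step the paper's string-space classifier avoids:

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: inner product of the k-mer count vectors of the
    two strings. Distinct strings can share a count vector, which is the
    information loss discussed above."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs)

# Hypothetical nucleotide fragments sharing the 3-mers ATT, TTA, TAC.
print(spectrum_kernel("GATTACA", "ATTACCA", k=3))  # 3
```

    An SVM then works entirely through such kernel values; the paper's contribution is a margin classifier that skips the vector embedding and operates on the strings themselves.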

  9. Use of information barriers to protect classified information

    International Nuclear Information System (INIS)

    MacArthur, D.; Johnson, M.W.; Nicholas, N.J.; Whiteson, R.

    1998-01-01

    This paper discusses the detailed requirements for an information barrier (IB) for use with verification systems that employ intrusive measurement technologies. The IB would protect classified information in a bilateral or multilateral inspection of classified fissile material. Such a barrier must strike a balance between providing the inspecting party the confidence necessary to accept the measurement while protecting the inspected party's classified information. The authors discuss the structure required of an IB as well as the implications of the IB on detector system maintenance. A defense-in-depth approach is proposed which would provide assurance to the inspected party that all sensitive information is protected and to the inspecting party that the measurements are being performed as expected. The barrier could include elements of physical protection (such as locks, surveillance systems, and tamper indicators), hardening of key hardware components, assurance of capabilities and limitations of hardware and software systems, administrative controls, validation and verification of the systems, and error detection and resolution. Finally, an unclassified interface could be used to display and, possibly, record measurement results. The introduction of an IB into an analysis system may result in many otherwise innocuous components (detectors, analyzers, etc.) becoming classified and unavailable for routine maintenance by uncleared personnel. System maintenance and updating will be significantly simplified if the classification status of as many components as possible can be made reversible (i.e. the component can become unclassified following the removal of classified objects)

  10. Detection of microaneurysms in retinal images using an ensemble classifier

    Directory of Open Access Journals (Sweden)

    M.M. Habib

    2017-01-01

Full Text Available This paper introduces, and reports on the performance of, a novel combination of algorithms for automated microaneurysm (MA) detection in retinal images. The presence of MAs in retinal images is a pathognomonic sign of Diabetic Retinopathy (DR), which is one of the leading causes of blindness amongst the working age population. An extensive survey of the literature is presented and current techniques in the field are summarised. The proposed technique first detects an initial set of candidates using a Gaussian Matched Filter and then classifies this set to reduce the number of false positives. A Tree Ensemble classifier is used with a set of 70 features (the most common features in the literature). A new set of 32 MA ground-truth images (with a total of 256 labelled MAs), based on images from the MESSIDOR dataset, is introduced as a public dataset for benchmarking MA detection algorithms. We evaluate our algorithm on this dataset as well as another public dataset (DIARETDB1 v2.1) and compare it against the best available alternative. Results show that the proposed classifier is superior in terms of eliminating false positive MA detection from the initial set of candidates. The proposed method achieves an ROC score of 0.415 compared to 0.2636 achieved by the best available technique. Furthermore, results show that the classifier model maintains consistent performance across datasets, illustrating the generalisability of the classifier and that overfitting does not occur.
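
    A Gaussian matched filter responds most strongly where the signal locally resembles a Gaussian profile, which is why it suits blob-like candidates such as microaneurysms. A 1-D sketch of the idea; the paper applies the 2-D analogue to retinal images, and the intensity profile below is invented:

```python
import math

def gaussian_kernel(sigma, radius):
    k = [math.exp(-i * i / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]  # zero-mean, so flat background scores 0

def matched_filter(signal, kernel):
    """Cross-correlate; the response peaks where the signal matches the kernel."""
    r = len(kernel) // 2
    return [sum(signal[i + j - r] * kernel[j] for j in range(len(kernel)))
            for i in range(r, len(signal) - r)]

# Hypothetical profile: a dark Gaussian blob (an MA candidate) centred at
# index 13 on a flat bright background.
signal = ([1.0] * 10 +
          [1.0 - math.exp(-(i - 3) ** 2 / 2.0) for i in range(7)] +
          [1.0] * 10)
kernel = gaussian_kernel(sigma=1.0, radius=3)
resp = matched_filter([-s for s in signal], kernel)  # invert: MAs are dark
peak = resp.index(max(resp))  # 10, i.e. signal index 13, the blob centre
```

    Thresholding such responses yields the initial candidate set, which the tree ensemble then prunes.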

  11. Generalization in the XCSF classifier system: analysis, improvement, and extension.

    Science.gov (United States)

    Lanzi, Pier Luca; Loiacono, Daniele; Wilson, Stewart W; Goldberg, David E

    2007-01-01

    We analyze generalization in XCSF and introduce three improvements. We begin by showing that the types of generalizations evolved by XCSF can be influenced by the input range. To explain these results we present a theoretical analysis of the convergence of classifier weights in XCSF which highlights a broader issue. In XCSF, because of the mathematical properties of the Widrow-Hoff update, the convergence of classifier weights in a given subspace can be slow when the spread of the eigenvalues of the autocorrelation matrix associated with each classifier is large. As a major consequence, the system's accuracy pressure may act before classifier weights are adequately updated, so that XCSF may evolve piecewise constant approximations, instead of the intended, and more efficient, piecewise linear ones. We propose three different ways to update classifier weights in XCSF so as to increase the generalization capabilities of XCSF: one based on a condition-based normalization of the inputs, one based on linear least squares, and one based on the recursive version of linear least squares. Through a series of experiments we show that while all three approaches significantly improve XCSF, least squares approaches appear to be best performing and most robust. Finally we show how XCSF can be extended to include polynomial approximations.
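
    The Widrow-Hoff rule mentioned above adjusts a classifier's weights by a step proportional to the prediction error times the input, and its convergence indeed slows as the eigenvalue spread of the input autocorrelation matrix grows. A minimal sketch of the update itself, on invented linear data:

```python
def widrow_hoff(samples, eta=0.05, epochs=200):
    """LMS update: w <- w + eta * (y - w.x) * x for each sample (x, y).
    The second input component is a constant 1, so w[1] learns an offset."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in samples:
            err = y - (w[0] * x[0] + w[1] * x[1])
            w[0] += eta * err * x[0]
            w[1] += eta * err * x[1]
    return w

# Noise-free target y = 2*u + 1 on inputs (u, 1).
data = [((u / 10.0, 1.0), 2 * u / 10.0 + 1.0) for u in range(11)]
w = widrow_hoff(data)
print(w)  # approaches [2.0, 1.0]
```

    Rescaling or normalizing the inputs shrinks the eigenvalue spread, which is essentially what the condition-based normalization proposed in the paper exploits.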

  12. Dynamic cluster generation for a fuzzy classifier with ellipsoidal regions.

    Science.gov (United States)

    Abe, S

    1998-01-01

In this paper, we discuss a fuzzy classifier with ellipsoidal regions that dynamically generates clusters. First, for the data belonging to a class we define a fuzzy rule with an ellipsoidal region. Namely, using the training data for each class, we calculate the center and the covariance matrix of the ellipsoidal region for the class. Then we tune the fuzzy rules, i.e., the slopes of the membership functions, successively until there is no improvement in the recognition rate of the training data. Then, if the number of data belonging to a class that are misclassified into another class exceeds a prescribed number, we define a new cluster to which those data belong, along with the associated fuzzy rule. We then tune the newly defined fuzzy rules in a similar way, fixing the already obtained fuzzy rules. We iterate generation of clusters and tuning of the newly generated fuzzy rules until the number of data belonging to a class that are misclassified into another class does not exceed the prescribed number. We evaluate our method using thyroid data, Japanese Hiragana data from vehicle license plates, and blood cell data. By dynamic cluster generation, the generalization ability of the classifier is improved, and the recognition rate of the fuzzy classifier on the test data is the best among the neural network classifiers and other fuzzy classifiers when there are no discrete input variables.
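
    The per-class ellipsoidal region can be sketched as a center plus a covariance matrix, with classification by smallest squared Mahalanobis distance. The membership-slope tuning and dynamic cluster generation described above are omitted, and the 2-D data are invented:

```python
def fit_ellipsoid(points):
    """Center and 2x2 covariance of one class's training points."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    return (cx, cy), (sxx, sxy, syy)

def mahalanobis2(p, center, cov):
    """Squared Mahalanobis distance via the explicit 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

classes = {  # hypothetical 2-D training data
    "A": [(0, 0), (1, 0), (0, 1), (1, 1)],
    "B": [(4, 4), (5, 4), (4, 5), (5, 5)],
}
models = {c: fit_ellipsoid(pts) for c, pts in classes.items()}
label = min(models, key=lambda c: mahalanobis2((4.2, 4.4), *models[c]))
print(label)  # "B"
```

    A fuzzy membership would then be a decreasing function of this distance, with its slope being the quantity the paper tunes.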

  13. enDNA-Prot: Identification of DNA-Binding Proteins by Applying Ensemble Learning

    Directory of Open Access Journals (Sweden)

    Ruifeng Xu

    2014-01-01

Full Text Available DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve predicting performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot with performance improvement in the range of 3.97–9.52% in ACC and 0.08–0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, the performance of enDNA-Prot outperformed the three existing methods by 2.83–16.63% in terms of ACC and 0.02–0.16 in terms of MCC. This indicates that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web-server for enDNA-Prot which is freely accessible to the public.
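
    The ACC and MCC figures quoted above come straight from the binary confusion matrix; MCC in particular stays informative when negative samples heavily outnumber positives, which is why it suits this expanded-dataset setting. A sketch with invented counts:

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient for a binary confusion matrix.
    Ranges from -1 to +1; 0 is chance level."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def acc(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical counts for a DNA-binding-protein predictor.
print(acc(80, 10, 90, 20), mcc(80, 10, 90, 20))
```

    Unlike accuracy, MCC collapses toward 0 for a classifier that simply predicts the majority class, so the reported MCC gains are the more telling ones.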

  14. Synthesizing Certified Code

    Science.gov (United States)

    Whalen, Michael; Schumann, Johann; Fischer, Bernd

    2002-01-01

Code certification is a lightweight approach to demonstrate software quality on a formal level. Its basic idea is to require producers to provide formal proofs that their code satisfies certain quality properties. These proofs serve as certificates which can be checked independently. Since code certification uses the same underlying technology as program verification, it also requires many detailed annotations (e.g., loop invariants) to make the proofs possible. However, manually adding these annotations to the code is time-consuming and error-prone. We address this problem by combining code certification with automatic program synthesis. We propose an approach to generate simultaneously, from a high-level specification, both code and all annotations required to certify the generated code. Here, we describe a certification extension of AUTOBAYES, a synthesis tool which automatically generates complex data analysis programs from compact specifications. AUTOBAYES contains sufficient high-level domain knowledge to generate detailed annotations. This allows us to use a general-purpose verification condition generator to produce a set of proof obligations in first-order logic. The obligations are then discharged using the automated theorem prover E-SETHEO. We demonstrate our approach by certifying operator safety for a generated iterative data classification program without manual annotation of the code.
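
    A toy illustration of the kind of annotation (a loop invariant) that a verification condition generator turns into proof obligations; this is a hypothetical example for intuition, not AUTOBAYES output:

```python
def sum_of_squares(n):
    """The invariant below is the annotation a certifier needs: it holds on
    entry, is preserved by each iteration, and with the exit condition it
    implies the postcondition total == sum of squares below n."""
    total, i = 0, 0
    while i < n:
        # Invariant: total == 0^2 + 1^2 + ... + (i-1)^2 and 0 <= i <= n
        assert total == sum(k * k for k in range(i)) and 0 <= i <= n
        total += i * i
        i += 1
    return total

print(sum_of_squares(5))  # 30
```

    The runtime assert merely checks the invariant on test inputs; a certifier instead proves it once, for all inputs, from the generated first-order obligations.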

  15. Code of Ethics

    Science.gov (United States)

    Division for Early Childhood, Council for Exceptional Children, 2009

    2009-01-01

    The Code of Ethics of the Division for Early Childhood (DEC) of the Council for Exceptional Children is a public statement of principles and practice guidelines supported by the mission of DEC. The foundation of this Code is based on sound ethical reasoning related to professional practice with young children with disabilities and their families…

  16. Interleaved Product LDPC Codes

    OpenAIRE

    Baldi, Marco; Cancellieri, Giovanni; Chiaraluce, Franco

    2011-01-01

    Product LDPC codes take advantage of LDPC decoding algorithms and the high minimum distance of product codes. We propose to add suitable interleavers to improve the waterfall performance of LDPC decoding. Interleaving also reduces the number of low-weight codewords, which gives a further advantage in the error floor region.

  17. Insurance billing and coding.

    Science.gov (United States)

    Napier, Rebecca H; Bruelheide, Lori S; Demann, Eric T K; Haug, Richard H

    2008-07-01

    The purpose of this article is to highlight the importance of understanding various numeric and alpha-numeric codes for accurately billing dental and medically related services to private pay or third-party insurance carriers. In the United States, common dental terminology (CDT) codes are most commonly used by dentists to submit claims, whereas current procedural terminology (CPT) and International Classification of Diseases, Ninth Revision, Clinical Modification (ICD.9.CM) codes are more commonly used by physicians to bill for their services. The CPT and ICD.9.CM coding systems complement each other in that CPT codes provide the procedure and service information and ICD.9.CM codes provide the reason or rationale for a particular procedure or service. These codes are more commonly used for "medical necessity" determinations, and general dentists and specialists who routinely perform care, including trauma-related care, biopsies, and dental treatment as a result of or in anticipation of a cancer-related treatment, are likely to use these codes. Claim submissions for care provided can be completed electronically or by means of paper forms.

  18. Error Correcting Codes

    Indian Academy of Sciences (India)

    Science and Automation at ... the Reed-Solomon code contained 223 bytes of data, (a byte ... then you have a data storage system with error correction, that ..... practical codes, storing such a table is infeasible, as it is generally too large.

  19. Scrum Code Camps

    DEFF Research Database (Denmark)

    Pries-Heje, Lene; Pries-Heje, Jan; Dalgaard, Bente

    2013-01-01

    is required. In this paper we present the design of such a new approach, the Scrum Code Camp, which can be used to assess agile team capability in a transparent and consistent way. A design science research approach is used to analyze properties of two instances of the Scrum Code Camp where seven agile teams...

  20. RFQ simulation code

    International Nuclear Information System (INIS)

    Lysenko, W.P.

    1984-04-01

    We have developed the RFQLIB simulation system to provide a means to systematically generate the new versions of radio-frequency quadrupole (RFQ) linac simulation codes that are required by the constantly changing needs of a research environment. This integrated system simplifies keeping track of the various versions of the simulation code and makes it practical to maintain complete and up-to-date documentation. In this scheme, there is a certain standard version of the simulation code that forms a library upon which new versions are built. To generate a new version of the simulation code, the routines to be modified or added are appended to a standard command file, which contains the commands to compile the new routines and link them to the routines in the library. The library itself is rarely changed. Whenever the library is modified, however, this modification is seen by all versions of the simulation code, which actually exist as different versions of the command file. All code is written according to the rules of structured programming. Modularity is enforced by not using COMMON statements, simplifying the relation of the data flow to a hierarchy diagram. Simulation results are similar to those of the PARMTEQ code, as expected, because of the similar physical model. Different capabilities, such as those for generating beams matched in detail to the structure, are available in the new code for help in testing new ideas in designing RFQ linacs

  1. Error Correcting Codes

    Indian Academy of Sciences (India)

    Home; Journals; Resonance – Journal of Science Education; Volume 2; Issue 3. Error Correcting Codes - Reed Solomon Codes. Priti Shankar. Series Article Volume 2 Issue 3 March ... Author Affiliations. Priti Shankar1. Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India ...

  2. 78 FR 18321 - International Code Council: The Update Process for the International Codes and Standards

    Science.gov (United States)

    2013-03-26

    ... Energy Conservation Code. International Existing Building Code. International Fire Code. International... Code. International Property Maintenance Code. International Residential Code. International Swimming Pool and Spa Code International Wildland-Urban Interface Code. International Zoning Code. ICC Standards...

  3. Validation of thermalhydraulic codes

    International Nuclear Information System (INIS)

    Wilkie, D.

    1992-01-01

    Thermalhydraulic codes need to be validated against experimental data collected over a wide range of situations if they are to be relied upon. A good example is provided by the nuclear industry, where codes are used for safety studies and for determining operating conditions. Errors in the codes could lead to financial penalties, to the incorrect estimation of the consequences of accidents and even to the accidents themselves. Comparison between prediction and experiment is often described qualitatively or in approximate terms, e.g. ''agreement is within 10%''. A quantitative method is preferable, especially when several competing codes are available. The codes can then be ranked in order of merit. Such a method is described. (Author)

  4. Fracture flow code

    International Nuclear Information System (INIS)

    Dershowitz, W; Herbert, A.; Long, J.

    1989-03-01

    The hydrology of the SCV site will be modelled utilizing discrete fracture flow models. These models are complex and cannot be fully certified by comparison to analytical solutions. The best approach for verification of these codes is therefore cross-verification between different codes. This is complicated by the variation in assumptions and solution techniques utilized in different codes. Cross-verification procedures are defined which allow comparison of the codes developed by Harwell Laboratory, Lawrence Berkeley Laboratory, and Golder Associates Inc. Six cross-verification datasets are defined for deterministic and stochastic verification of geometric and flow features of the codes. Additional datasets for verification of transport features will be documented in a future report. (13 figs., 7 tabs., 10 refs.) (authors)

  5. SpectraClassifier 1.0: a user friendly, automated MRS-based classifier-development system

    Directory of Open Access Journals (Sweden)

    Julià-Sapé Margarida

    2010-02-01

    Full Text Available Abstract Background SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimum background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, either forward or backward) and feature extraction (PCA). Fisher Linear Discriminant Analysis is the method of choice for classification. Classifier evaluation is performed through various methods: display of the confusion matrix of the training and testing datasets; K-fold cross-validation, leave-one-out and bootstrapping, as well as Receiver Operating Characteristic (ROC) curves. Results SC is composed of the following modules: Classifier design, Data exploration, Data visualisation, Classifier evaluation, Reports, and Classifier history. It is able to read low resolution in-vivo MRS (single-voxel and multi-voxel) and high resolution tissue MRS (HRMAS), processed with existing tools (jMRUI, INTERPRET, 3DiCSI or TopSpin). In addition, to facilitate exchanging data between applications, a standard format capable of storing all the information needed for a dataset was developed. Each functionality of SC has been specifically validated with real data with the purpose of bug-testing and methods validation. Data from the INTERPRET project was used. Conclusions SC is a user-friendly software designed to fulfil the needs of potential users in the MRS community. It accepts all kinds of pre-processed MRS data types and classifies them semi-automatically, allowing spectroscopists to concentrate on interpretation of results with the use of its visualisation tools.
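    As a minimal numerical illustration of the classification method SC relies on (Fisher Linear Discriminant Analysis), here is a two-class sketch in Python/NumPy. This is not SC's Java implementation, and the toy feature vectors stand in for real MRS-derived features.

```python
# Two-class Fisher Linear Discriminant Analysis: project onto the direction
# w = Sw^{-1}(m1 - m0) and threshold at the midpoint of the projected class
# means. Illustrative re-implementation only, not SpectraClassifier's code.
import numpy as np

def fisher_lda_fit(X0, X1):
    """Return (w, threshold) for classes 0 and 1."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = 0.5 * ((X0 @ w).mean() + (X1 @ w).mean())
    return w, threshold

def fisher_lda_predict(w, threshold, X):
    """Label 1 where the projection exceeds the threshold."""
    return (X @ w > threshold).astype(int)

# Toy "spectra" with two features per sample (invented data).
X0 = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.2]])
X1 = np.array([[3.0, 4.0], [3.1, 3.8], [2.9, 4.2]])
w, t = fisher_lda_fit(X0, X1)
```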

  6. Characterization of mitochondrial DNA polymorphisms in the Han population in Liaoning Province, Northeast China.

    Science.gov (United States)

    Xu, Feng-Ling; Yao, Jun; Ding, Mei; Shi, Zhang-Sen; Wu, Xue; Zhang, Jing-Jing; Wang, Bao-Jie

    2018-03-01

    This study characterized the genetic variations of mitochondrial DNA (mtDNA) to elucidate the maternal genetic structure of Liaoning Han Chinese. A total of 317 blood samples of unrelated individuals were collected for analysis in Liaoning Province. The mtDNA samples were analyzed using two distinct methods: sequencing of the hypervariable sequences I and II (HVSI and HVSII), and polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis of the coding region. The results indicated a high gene diversity value (0.9997 ± 0.0003), a high polymorphism information content (0.99668) and a low random match probability (0.00332). These samples were classified into 305 haplotypes, with 9 shared haplotypes. The most common haplogroup was D4 (12.93%). The principal component analysis map, the phylogenetic tree map, and the genetic distance matrix all indicated that the Liaoning Han population was genetically distant from the Tibetan group, whereas it was relatively close to the Miao group.
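    The summary statistics quoted above follow standard population-genetics formulas. As a hedged illustration (with invented haplotype counts, not the study's 317 samples), Nei's unbiased gene diversity and the random match probability can be computed from haplotype frequencies as:

```python
# Standard summary statistics for haplotype data. Toy inputs only; this does
# not reproduce the study's dataset or its exact software pipeline.
from collections import Counter

def gene_diversity(haplotypes):
    """Nei's unbiased gene diversity: H = n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)

def random_match_probability(haplotypes):
    """Probability that two randomly drawn samples share a haplotype."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())
```

For example, five samples carrying haplotypes A, B, C, C, D give a match probability of 0.28 and a gene diversity of 0.9.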

  7. A History of Classified Activities at Oak Ridge National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Quist, A.S.

    2001-01-30

    The facilities that became Oak Ridge National Laboratory (ORNL) were created in 1943 during the United States' super-secret World War II project to construct an atomic bomb (the Manhattan Project). During World War II and for several years thereafter, essentially all ORNL activities were classified. Now, in 2000, essentially all ORNL activities are unclassified. The major purpose of this report is to provide a brief history of ORNL's major classified activities from 1943 until the present (September 2000). This report is expected to be useful to the ORNL Classification Officer and to ORNL's Authorized Derivative Classifiers and Authorized Derivative Declassifiers in their classification review of ORNL documents, especially those documents that date from the 1940s and 1950s.

  8. COMPARISON OF SVM AND FUZZY CLASSIFIER FOR AN INDIAN SCRIPT

    Directory of Open Access Journals (Sweden)

    M. J. Baheti

    2012-01-01

    Full Text Available With the advent of the technological era, conversion of scanned documents (handwritten or printed) into machine-editable format has attracted many researchers. This paper deals with the problem of recognition of Gujarati handwritten numerals. Gujarati numeral recognition requires some specific preprocessing steps: digitization, segmentation, normalization and thinning are performed, assuming that the images contain almost no noise. An affine-invariant-moments-based model is then used for feature extraction, and finally Support Vector Machine (SVM) and Fuzzy classifiers are used for numeral classification. A comparison of the SVM and Fuzzy classifiers shows that SVM procured better results than the Fuzzy classifier.

  9. Optimal threshold estimation for binary classifiers using game theory.

    Science.gov (United States)

    Sanchez, Ignacio Enrique

    2016-01-01

    Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
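    The proposed operating point is straightforward to compute from classifier scores: choose the threshold at which sensitivity is closest to specificity, i.e. where the ROC curve crosses the descending diagonal (TPR = 1 - FPR). A sketch with invented scores, not the paper's own code:

```python
# Sketch of the minimax operating point described above: scan candidate
# thresholds and pick the one where TPR is closest to 1 - FPR
# (sensitivity = specificity). Toy scores are invented for illustration.

def minimax_threshold(pos_scores, neg_scores):
    """Return the candidate threshold minimising |TPR - (1 - FPR)|."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best_t, best_gap = None, float("inf")
    for t in candidates:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)  # sensitivity
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        gap = abs(tpr - (1 - fpr))
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```

At the chosen threshold, accuracy equals 1 - FPR regardless of class abundance, which is the robustness property the paper emphasises.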

  10. Statistical text classifier to detect specific type of medical incidents.

    Science.gov (United States)

    Wong, Zoie Shui-Yee; Akiyama, Masanori

    2013-01-01

    WHO Patient Safety has focused on increasing the coherence and expressiveness of patient safety classification with the foundation of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong, etc. This research demonstrates the potential to construct statistical text classifiers to detect specific types of medical incidents using incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA) is shown in this poster presentation. The classifier was built using a logistic regression model. The ROC curve and the AUC value indicated that this is a satisfactory model.
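    The approach described (bag-of-words text features fed to logistic regression) can be sketched in a few lines. This is a generic illustration, not the authors' model; the incident reports and labels below are invented, and a real classifier would be trained on the 227 GPSA advisories.

```python
# Bag-of-words + logistic regression text classifier, trained with plain
# gradient descent on the log-loss. Illustrative sketch only.
import math
from collections import Counter

def tokens(text):
    return text.lower().split()

def build_vocab(docs):
    return sorted({w for d in docs for w in tokens(d)})

def vectorize(doc, vocab):
    counts = Counter(tokens(doc))
    return [counts.get(w, 0) for w in vocab]

def train_logreg(X, y, lr=0.5, epochs=200):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log-loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0
```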

  11. A Topic Model Approach to Representing and Classifying Football Plays

    KAUST Repository

    Varadarajan, Jagannadan

    2013-09-09

    We address the problem of modeling and classifying American Football offense teams’ plays in video, a challenging example of group activity analysis. Automatic play classification will allow coaches to infer patterns and tendencies of opponents more efficiently, resulting in better strategy planning in a game. We define a football play as a unique combination of player trajectories. To this end, we develop a framework that uses player trajectories as inputs to MedLDA, a supervised topic model. The joint maximization of both likelihood and inter-class margins of MedLDA in learning the topics allows us to learn semantically meaningful play type templates, as well as classify different play types with 70% average accuracy. Furthermore, this method is extended to analyze individual player roles in classifying each play type. We validate our method on a large dataset comprising 271 play clips from real-world football games, which will be made publicly available for future comparisons.

  12. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  13. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    Science.gov (United States)

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using

  14. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

    Science.gov (United States)

    2017-01-01

    Background Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Conclusions Machine learning algorithms can classify open-text feedback

  15. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

    Science.gov (United States)

    Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

    2017-03-15

    Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high

  16. A systems biology-based classifier for hepatocellular carcinoma diagnosis.

    Directory of Open Access Journals (Sweden)

    Yanqiong Zhang

    Full Text Available AIM: The diagnosis of hepatocellular carcinoma (HCC) in the early stage is crucial to the application of curative treatments which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem through analysis of gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes in common and were poorly validated using independent data. Therefore, we developed a systems biology based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC diagnosis. METHODS AND RESULTS: In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes that are common to different microarray datasets were chosen as the candidate markers. Then, their networks were analyzed by GeneGO Meta-Core software and the hub genes were chosen. After that, an HCC diagnostic classifier was constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validations of diagnostic performance showed that this classifier had high predictive accuracy (85.88∼92.71%) and area under ROC curve (approximating 1.0), and that the network topological features integrated into this classifier contribute greatly to improving the predictive performance. Furthermore, it has been demonstrated that this modeling strategy is not only applicable to HCC, but also to other cancers. CONCLUSION: Our analysis suggests that the systems biology-based classifier that combines the differential gene expression and topological features of human protein interaction network may enhance the diagnostic performance of HCC classifier.
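    Partial Least Squares modeling, the technique named above, can be sketched as a minimal PLS1 (NIPALS) fit used for two-class discrimination by thresholding the fitted response at 0.5 (PLS-DA style). This is a generic illustration with an invented toy expression matrix, not the authors' classifier or their hub-gene features.

```python
# Minimal PLS1 (NIPALS) regression plus a 0.5 threshold for class prediction.
# Generic PLS-DA sketch; not the HCC classifier itself.
import numpy as np

def pls1_fit(X, y, n_components=1):
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)           # weight vector
        t = Xk @ w                          # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                   # X loadings
        q = (yk @ t) / tt                   # y loading
        Xk = Xk - np.outer(t, p)            # deflate X
        yk = yk - q * t                     # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # coefficients in original X space
    return coef, x_mean, y_mean

def pls_da_predict(model, X):
    coef, x_mean, y_mean = model
    scores = (np.asarray(X, float) - x_mean) @ coef + y_mean
    return (scores > 0.5).astype(int)
```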

  17. On the consistency of Monte Carlo track structure DNA damage simulations

    Energy Technology Data Exchange (ETDEWEB)

    Pater, Piotr, E-mail: piotr.pater@mail.mcgill.ca; Seuntjens, Jan; El Naqa, Issam [McGill University, Montreal, Quebec H3G 1A4 (Canada); Bernal, Mario A. [Instituto de Fisica Gleb Wataghin, Universidade Estadual de Campinas, Campinas 13083-859 (Brazil)

    2014-12-15

    Purpose: Monte Carlo track structure (MCTS) simulations have been recognized as useful tools for radiobiological modeling. However, the authors noticed several issues regarding the consistency of reported data. Therefore, in this work, they analyze the impact of various user defined parameters on simulated direct DNA damage yields. In addition, they draw attention to discrepancies in published literature in DNA strand break (SB) yields and selected methodologies. Methods: The MCTS code Geant4-DNA was used to compare radial dose profiles in a nanometer-scale region of interest (ROI) for photon sources of varying sizes and energies. Then, electron tracks of 0.28 keV–220 keV were superimposed on a geometric DNA model composed of 2.7 × 10⁶ nucleosomes, and SBs were simulated according to four definitions based on energy deposits or energy transfers in DNA strand targets compared to a threshold energy E_TH. The SB frequencies and complexities in nucleosomes as a function of incident electron energies were obtained. SBs were classified into higher order clusters such as single and double strand breaks (SSBs and DSBs) based on inter-SB distances and on the number of affected strands. Results: Comparisons of different nonuniform dose distributions lacking charged particle equilibrium may lead to erroneous conclusions regarding the effect of energy on relative biological effectiveness. The energy transfer-based SB definitions give similar SB yields as the one based on energy deposit when E_TH ≈ 10.79 eV, but deviate significantly for higher E_TH values. Between 30 and 40 nucleosomes/Gy show at least one SB in the ROI. The number of nucleosomes that present a complex damage pattern of more than 2 SBs and the degree of complexity of the damage in these nucleosomes diminish as the incident electron energy increases. DNA damage classification into SSB and DSB is highly dependent on the definitions of these higher order structures and their
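    The abstract's point that SSB/DSB classification depends on the chosen definitions can be made concrete with a small sketch: here two breaks on opposite strands within a configurable separation (10 bp is a common convention, not necessarily the paper's choice) are paired into a DSB, and every unpaired break counts as an SSB. Changing `max_sep` changes the SSB/DSB tallies, which is exactly the definitional sensitivity the authors highlight.

```python
# Illustrative SSB/DSB classification of simulated strand breaks. Each break
# is (position_bp, strand) with strand in {0, 1}; breaks on opposite strands
# within max_sep base pairs are paired into a DSB.
def classify_breaks(breaks, max_sep=10):
    """Return (n_ssb, n_dsb); each break joins at most one DSB."""
    breaks = sorted(breaks)
    used = [False] * len(breaks)
    n_dsb = 0
    for i, (pos_i, strand_i) in enumerate(breaks):
        if used[i]:
            continue
        for j in range(i + 1, len(breaks)):
            pos_j, strand_j = breaks[j]
            if pos_j - pos_i > max_sep:
                break  # sorted order: no later break can be close enough
            if not used[j] and strand_j != strand_i:
                used[i] = used[j] = True
                n_dsb += 1
                break
    n_ssb = used.count(False)
    return n_ssb, n_dsb
```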

  18. On the consistency of Monte Carlo track structure DNA damage simulations

    International Nuclear Information System (INIS)

    Pater, Piotr; Seuntjens, Jan; El Naqa, Issam; Bernal, Mario A.

    2014-01-01

    Purpose: Monte Carlo track structure (MCTS) simulations have been recognized as useful tools for radiobiological modeling. However, the authors noticed several issues regarding the consistency of reported data. Therefore, in this work, they analyze the impact of various user defined parameters on simulated direct DNA damage yields. In addition, they draw attention to discrepancies in published literature in DNA strand break (SB) yields and selected methodologies. Methods: The MCTS code Geant4-DNA was used to compare radial dose profiles in a nanometer-scale region of interest (ROI) for photon sources of varying sizes and energies. Then, electron tracks of 0.28 keV–220 keV were superimposed on a geometric DNA model composed of 2.7 × 10⁶ nucleosomes, and SBs were simulated according to four definitions based on energy deposits or energy transfers in DNA strand targets compared to a threshold energy E_TH. The SB frequencies and complexities in nucleosomes as a function of incident electron energies were obtained. SBs were classified into higher order clusters such as single and double strand breaks (SSBs and DSBs) based on inter-SB distances and on the number of affected strands. Results: Comparisons of different nonuniform dose distributions lacking charged particle equilibrium may lead to erroneous conclusions regarding the effect of energy on relative biological effectiveness. The energy transfer-based SB definitions give similar SB yields as the one based on energy deposit when E_TH ≈ 10.79 eV, but deviate significantly for higher E_TH values. Between 30 and 40 nucleosomes/Gy show at least one SB in the ROI. The number of nucleosomes that present a complex damage pattern of more than 2 SBs and the degree of complexity of the damage in these nucleosomes diminish as the incident electron energy increases. DNA damage classification into SSB and DSB is highly dependent on the definitions of these higher order structures and their implementations. The authors

  19. Huffman coding in advanced audio coding standard

    Science.gov (United States)

    Brzuchalski, Grzegorz

    2012-05-01

    This article presents several hardware architectures of the Advanced Audio Coding (AAC) Huffman noiseless encoder, its optimisations and a working implementation. Much attention has been paid to optimising the demand on hardware resources, especially memory size. The aim of the design was to produce as short a binary stream as possible in this standard. The Huffman encoder, together with the whole audio-video system, has been implemented in FPGA devices.
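    AAC's noiseless coding uses fixed, standard-defined Huffman codebooks, so the sketch below illustrates only the underlying principle (building a prefix-free code from symbol frequencies and encoding with it); it is not the AAC codebook set or the article's hardware design.

```python
# Generic Huffman coding sketch: frequency-sorted merging via a min-heap,
# yielding a prefix-free variable-length code. Illustration only; AAC itself
# ships pre-defined codebooks rather than building trees from the data.
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bitstring} for the given iterable of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreak id, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

def encode(symbols, codebook):
    return "".join(codebook[s] for s in symbols)
```

Frequent symbols receive short codewords, which is what shortens the binary stream.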

  20. Feature Selection and Classifier Development for Radio Frequency Device Identification

    Science.gov (United States)

    2015-12-01

    Excerpts from the list of tables: Table C-5: p-values vs Test Statistic for Wine Quality; Table C-6: p-values vs Test Statistic for Wisconsin Breast Cancer; Table C-7: p-values vs Test Statistic for Wine. ... electronic devices, such as RF-DNA. A brief review of the various approaches is considered to illustrate the benefits of the RF-DNA approach.

  1. Implications of physical symmetries in adaptive image classifiers

    DEFF Research Database (Denmark)

    Sams, Thomas; Hansen, Jonas Lundbek

    2000-01-01

    It is demonstrated that rotational invariance and reflection symmetry of image classifiers lead to a reduction in the number of free parameters in the classifier. When used in adaptive detectors, e.g. neural networks, this may be used to decrease the number of training samples necessary to learn...... a given classification task, or to improve generalization of the neural network. Notably, the symmetrization of the detector does not compromise the ability to distinguish objects that break the symmetry. (C) 2000 Elsevier Science Ltd. All rights reserved....

  2. Silicon nanowire arrays as learning chemical vapour classifiers

    International Nuclear Information System (INIS)

    Niskanen, A O; Colli, A; White, R; Li, H W; Spigone, E; Kivioja, J M

    2011-01-01

    Nanowire field-effect transistors are a promising class of devices for various sensing applications. Apart from detecting individual chemical or biological analytes, it is especially interesting to use multiple selective sensors to look at their collective response in order to perform classification into predetermined categories. We show that non-functionalised silicon nanowire arrays can be used to robustly classify different chemical vapours using simple statistical machine learning methods. We were able to distinguish between acetone, ethanol and water with 100% accuracy while methanol, ethanol and 2-propanol were classified with 96% accuracy in ambient conditions.

  3. DNA repair

    International Nuclear Information System (INIS)

    Setlow, R.

    1978-01-01

    Some topics discussed are as follows: difficulty in extrapolating data from E. coli to mammalian systems; mutations caused by UV-induced changes in DNA; mutants deficient in excision repair; other postreplication mechanisms; kinds of excision repair systems; detection of repair by biochemical or biophysical means; human mutants deficient in repair; mutagenic effects of UV on XP cells; and detection of UV-repair defects among XP individuals

  4. Report number codes

    Energy Technology Data Exchange (ETDEWEB)

    Nelson, R.N. (ed.)

    1985-05-01

    This publication lists all report number codes processed by the Office of Scientific and Technical Information. The report codes are substantially based on the American National Standards Institute, Standard Technical Report Number (STRN)-Format and Creation Z39.23-1983. The Standard Technical Report Number (STRN) provides one of the primary methods of identifying a specific technical report. The STRN consists of two parts: The report code and the sequential number. The report code identifies the issuing organization, a specific program, or a type of document. The sequential number, which is assigned in sequence by each report issuing entity, is not included in this publication. Part I of this compilation is alphabetized by report codes followed by issuing installations. Part II lists the issuing organization followed by the assigned report code(s). In both Parts I and II, the names of issuing organizations appear for the most part in the form used at the time the reports were issued. However, for some of the more prolific installations which have had name changes, all entries have been merged under the current name.

  6. A Consistent System for Coding Laboratory Samples

    Science.gov (United States)

    Sih, John C.

    1996-07-01

    A formal laboratory coding system is presented to keep track of laboratory samples. Preliminary useful information regarding the sample (origin and history) is gained without consulting a research notebook. Since this system uses and retains the same research notebook page number for each new experiment (reaction), finding and distinguishing products (samples) of the same or different reactions becomes an easy task. Using this system multiple products generated from a single reaction can be identified and classified in a uniform fashion. Samples can be stored and filed according to stage and degree of purification, e.g. crude reaction mixtures, recrystallized samples, chromatographed or distilled products.

  7. Is a genome a codeword of an error-correcting code?

    Directory of Open Access Journals (Sweden)

    Luzinete C B Faria

    Full Text Available Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction.
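    The record does not reproduce the authors' construction, but the membership test it relies on — a word is a codeword if and only if its syndrome is zero — can be sketched for the binary (7,4) Hamming code. The nucleotide-to-bit question is left aside here; this is an illustrative toy, not the paper's actual identification of DNA sequences with codewords:

    ```python
    # Parity-check matrix H of the binary (7,4) Hamming code.
    H = [(1, 0, 1, 0, 1, 0, 1),
         (0, 1, 1, 0, 0, 1, 1),
         (0, 0, 0, 1, 1, 1, 1)]

    def syndrome(word):
        """Syndrome H·w mod 2 of a length-7 binary word."""
        return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

    def is_codeword(word):
        """A word belongs to the code iff every parity check is satisfied."""
        return syndrome(word) == (0, 0, 0)
    ```

    A nonzero syndrome also points at the single flipped position, which is what makes Hamming codes error-correcting rather than merely error-detecting.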

  8. A mitochondrial DNA SNP multiplex assigning Caucasians into 36 haplo- and subhaplogroups

    DEFF Research Database (Denmark)

    Mikkelsen, Martin; Rockenbauer, Eszter; Sørensen, Erik

    2008-01-01

    Mitochondrial DNA (mtDNA) is maternally inherited without recombination events and has a high copy number, which makes mtDNA analysis feasible even when genomic DNA is sparse or degraded. Here, we present a SNP typing assay with 33 previously described mtDNA coding region SNPs for haplogroup...... previously typed by sequencing of the mitochondrial HV1 and HV2 regions. Haplogroup assignments based on mtDNA coding region SNPs and sequencing of HV1 and HV2 regions gave identical results for 27% of the samples, and except for one sample, differences in haplogroup assignments were at the subhaplogroup...

  9. Cryptography cracking codes

    CERN Document Server

    2014-01-01

    While cracking a code might seem like something few of us would encounter in our daily lives, it is actually far more prevalent than we may realize. Anyone who has had personal information taken because of a hacked email account can understand the need for cryptography and the importance of encryption-essentially the need to code information to keep it safe. This detailed volume examines the logic and science behind various ciphers, their real world uses, how codes can be broken, and the use of technology in this oft-overlooked field.

  10. Coded Splitting Tree Protocols

    DEFF Research Database (Denmark)

    Sørensen, Jesper Hemming; Stefanovic, Cedomir; Popovski, Petar

    2013-01-01

    This paper presents a novel approach to multiple access control called coded splitting tree protocol. The approach builds on the known tree splitting protocols, code structure and successive interference cancellation (SIC). Several instances of the tree splitting protocol are initiated, each...... instance is terminated prematurely and subsequently iterated. The combined set of leaves from all the tree instances can then be viewed as a graph code, which is decodable using belief propagation. The main design problem is determining the order of splitting, which enables successful decoding as early...

  11. Transport theory and codes

    International Nuclear Information System (INIS)

    Clancy, B.E.

    1986-01-01

    This chapter begins with a neutron transport equation which includes the one dimensional plane geometry problems, the one dimensional spherical geometry problems, and numerical solutions. The section on the ANISN code and its look-alikes covers problems which can be solved; eigenvalue problems; outer iteration loop; inner iteration loop; and finite difference solution procedures. The input and output data for ANISN is also discussed. Two dimensional problems such as the DOT code are given. Finally, an overview of the Monte-Carlo methods and codes is elaborated on

  12. Gravity inversion code

    International Nuclear Information System (INIS)

    Burkhard, N.R.

    1979-01-01

    The gravity inversion code applies stabilized linear inverse theory to determine the topography of a subsurface density anomaly from Bouguer gravity data. The gravity inversion program consists of four source codes: SEARCH, TREND, INVERT, and AVERAGE. TREND and INVERT are used iteratively to converge on a solution. SEARCH forms the input gravity data files for Nevada Test Site data. AVERAGE performs a covariance analysis on the solution. This document describes the necessary input files and the proper operation of the code. 2 figures, 2 tables

  13. DNA cards: determinants of DNA yield and quality in collecting genetic samples for pharmacogenetic studies.

    Science.gov (United States)

    Mas, Sergi; Crescenti, Anna; Gassó, Patricia; Vidal-Taboada, Jose M; Lafuente, Amalia

    2007-08-01

    As pharmacogenetic studies frequently require establishment of DNA banks containing large cohorts with multi-centric designs, inexpensive methods for collecting and storing high-quality DNA are needed. The aims of this study were two-fold: to compare the amount and quality of DNA obtained from two different DNA cards (IsoCode Cards or FTA Classic Cards, Whatman plc, Brentford, Middlesex, UK); and to evaluate the effects of time and storage temperature, as well as the influence of anticoagulant ethylenediaminetetraacetic acid on the DNA elution procedure. The samples were genotyped by several methods typically used in pharmacogenetic studies: multiplex PCR, PCR-restriction fragment length polymorphism, single nucleotide primer extension, and allelic discrimination assay. In addition, they were amplified by whole genome amplification to increase genomic DNA mass. Time, storage temperature and ethylenediaminetetraacetic acid had no significant effects on either DNA card. This study reveals the importance of drying blood spots prior to isolation to avoid haemoglobin interference. Moreover, our results demonstrate that re-isolation protocols could be applied to increase the amount of DNA recovered. The samples analysed were accurately genotyped with all the methods examined herein. In conclusion, our study shows that both DNA cards, IsoCode Cards and FTA Classic Cards, facilitate genetic and pharmacogenetic testing for routine clinical practice.

  14. 18 CFR 367.18 - Criteria for classifying leases.

    Science.gov (United States)

    2010-04-01

    ... the lessee) must not give rise to a new classification of a lease for accounting purposes. ... classifying the lease. (4) The present value at the beginning of the lease term of the minimum lease payments... taxes to be paid by the lessor, including any related profit, equals or exceeds 90 percent of the excess...

  15. Discrimination-Aware Classifiers for Student Performance Prediction

    Science.gov (United States)

    Luo, Ling; Koprinska, Irena; Liu, Wei

    2015-01-01

    In this paper we consider discrimination-aware classification of educational data. Mining and using rules that distinguish groups of students based on sensitive attributes such as gender and nationality may lead to discrimination. It is desirable to keep the sensitive attributes during the training of a classifier to avoid information loss but…

  16. 29 CFR 1910.307 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ... equipment at the location. (c) Electrical installations. Equipment, wiring methods, and installations of... covers the requirements for electric equipment and wiring in locations that are classified depending on... provisions of this section. (4) Division and zone classification. In Class I locations, an installation must...

  17. 29 CFR 1926.407 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ...) locations, unless modified by provisions of this section. (b) Electrical installations. Equipment, wiring..., DEPARTMENT OF LABOR (CONTINUED) SAFETY AND HEALTH REGULATIONS FOR CONSTRUCTION Electrical Installation Safety... electric equipment and wiring in locations which are classified depending on the properties of the...

  18. 18 CFR 3a.71 - Accountability for classified material.

    Science.gov (United States)

    2010-04-01

    ... numbers assigned to top secret material will be separate from the sequence for other classified material... central control registry in calendar year 1969. TS 1006—Sixth Top Secret document controlled by the... control registry when the document is transferred. (e) For Top Secret documents only, an access register...

  19. Classifier fusion for VoIP attacks classification

    Science.gov (United States)

    Safarik, Jakub; Rezac, Filip

    2017-05-01

    SIP is one of the most successful protocols in the field of IP telephony; it establishes and manages VoIP calls. As the number of SIP implementations rises, we can expect a growing number of attacks on these communication systems in the near future. This work aims at malicious SIP traffic classification. A number of machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of a classifier fusion method leading to a potential decrease in classification error rate. Combining classifiers yields a more robust solution, avoiding weaknesses that may affect individual algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. Traffic monitoring relies on a network of honeypot nodes running in several networks spread across different locations. Separating the honeypots allows us to gain independent and trustworthy attack information.
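    Two standard combination rules of the kind this record discusses — a soft-output sum rule and hard majority voting — can be sketched as follows (the class names and posterior values are invented for illustration, not taken from the study):

    ```python
    def sum_rule(posteriors):
        """Soft-output fusion: sum each class's posterior across classifiers
        and return the arg-max class (the classic 'sum rule')."""
        classes = posteriors[0].keys()
        return max(classes, key=lambda c: sum(p[c] for p in posteriors))

    def majority_vote(labels):
        """Hard-output fusion: the most frequent predicted label wins."""
        return max(set(labels), key=labels.count)

    # Three hypothetical classifiers score a SIP message as 'attack' vs 'benign'.
    votes = [{"attack": 0.6, "benign": 0.4},
             {"attack": 0.2, "benign": 0.8},
             {"attack": 0.9, "benign": 0.1}]
    ```

    The sum rule uses the full posterior, so a single very confident classifier can outweigh two lukewarm ones — one reason soft fusion often beats plain voting.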

  20. Bayesian Classifier for Medical Data from Doppler Unit

    Directory of Open Access Journals (Sweden)

    J. Málek

    2006-01-01

    Full Text Available Nowadays, hand-held ultrasonic Doppler units (probes) are often used for noninvasive screening of atherosclerosis in the arteries of the lower limbs. The mean velocity of blood flow over time and blood pressures are measured at several positions on each lower limb. By listening to the acoustic signal generated by the device or by reading the signal displayed on screen, a specialist can detect peripheral arterial disease (PAD). This project aims to design software able to analyze data from such a device and classify it into several diagnostic classes. At the Department of Functional Diagnostics at the Regional Hospital in Liberec, a database of several hundred signals was collected. In cooperation with the specialist, the signals were manually classified into four classes. For each class, selected signal features were extracted and then used to train a Bayesian classifier. Another set of signals was used for evaluating and optimizing the parameters of the classifier. Slightly above 84% of diagnostic states were successfully recognized on the test data.

  1. An Investigation to Improve Classifier Accuracy for Myo Collected Data

    Science.gov (United States)

    2017-02-01

    [Table-of-contents and figure-list excerpts: bad-samples effect on classification accuracy for the Naïve Bayes (NB), Logistic Model Tree (LMT), and K-Nearest Neighbor classifiers; figures show reversed movement in pitch-feature samples of the Come gesture for users 06 and 14.]

  2. Diagnosis of Broiler Livers by Classifying Image Patches

    DEFF Research Database (Denmark)

    Jørgensen, Anders; Fagertun, Jens; Moeslund, Thomas B.

    2017-01-01

    Manual health inspection is becoming the bottleneck at poultry processing plants. We present a computer vision method for automatic diagnosis of broiler livers. The non-rigid livers, of varying shapes and sizes, are classified in patches by a convolutional neural network, outputting maps...

  3. Support vector machines classifiers of physical activities in preschoolers

    Science.gov (United States)

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children's physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...

  4. A Linguistic Image of Nature: The Burmese Numerative Classifier System

    Science.gov (United States)

    Becker, Alton L.

    1975-01-01

    The Burmese classifier system is coherent because it is based upon a single elementary semantic dimension: deixis. On that dimension, four distances are distinguished, distances which metaphorically substitute for other conceptual relations between people and other living beings, people and things, and people and concepts. (Author/RM)

  5. Data Stream Classification Based on the Gamma Classifier

    Directory of Open Access Journals (Sweden)

    Abril Valeria Uriarte-Arcia

    2015-01-01

    Full Text Available The ever-increasing data generation confronts us with the problem of handling online massive amounts of information. One of the biggest challenges is how to extract valuable information from these massive continuous data streams during a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent in data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.
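    The sliding-window forgetting mechanism this record describes can be illustrated with a much simpler streaming learner — a windowed 1-nearest-neighbour, not the Gamma classifier's associative-memory internals; the window size here is arbitrary:

    ```python
    from collections import deque

    class SlidingWindowNN:
        """Streaming 1-nearest-neighbour classifier over a fixed-size window.

        The bounded deque is the forgetting mechanism: as labeled examples
        stream in, the oldest fall out, so predictions track concept drift.
        (Illustrates only the windowing idea, not the Gamma classifier.)
        """

        def __init__(self, window=100):
            self.window = deque(maxlen=window)

        def update(self, x, y):
            """Append a labeled example; the deque drops the oldest if full."""
            self.window.append((x, y))

        def predict(self, x):
            """Return the label of the closest stored example (None if empty)."""
            if not self.window:
                return None
            _, label = min(self.window,
                           key=lambda item: sum((a - b) ** 2
                                                for a, b in zip(item[0], x)))
            return label
    ```

    With a window of two, an old example is forgotten as soon as two newer ones arrive, so the same query point can change its predicted class after drift.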

  6. Building an automated SOAP classifier for emergency department reports.

    Science.gov (United States)

    Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W

    2012-02-01

    Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found classifiers for each SOAP class can be created with moderate to outstanding performance with F1 scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks. Copyright © 2011. Published by Elsevier Inc.
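    The F1 scores quoted in this record are the harmonic mean of precision and recall; for reference, a minimal computation from raw counts (the counts in the test below are made up, not from the study):

    ```python
    def f1_score(tp, fp, fn):
        """F1 from true-positive, false-positive, and false-negative counts:
        the harmonic mean of precision and recall."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)
    ```

    Because it is a harmonic mean, F1 is dragged down by whichever of precision or recall is worse, which is why it is preferred over accuracy for imbalanced classes such as the rarer SOAP categories.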

  7. Learning to classify wakes from local sensory information

    Science.gov (United States)

    Alsalman, Mohamad; Colvert, Brendan; Kanso, Eva; Kanso Team

    2017-11-01

    Aquatic organisms exhibit remarkable abilities to sense local flow signals contained in their fluid environment and to surmise the origins of these flows. For example, fish can discern the information contained in various flow structures and utilize this information for obstacle avoidance and prey tracking. Flow structures created by flapping and swimming bodies are well characterized in the fluid dynamics literature; however, such characterization relies on classical methods that use an external observer to reconstruct global flow fields. The reconstructed flows, or wakes, are then classified according to the unsteady vortex patterns. Here, we propose a new approach for wake identification: we classify the wakes resulting from a flapping airfoil by applying machine learning algorithms to local flow information. In particular, we simulate the wakes of an oscillating airfoil in an incoming flow, extract the downstream vorticity information, and train a classifier to learn the different flow structures and classify new ones. This data-driven approach provides a promising framework for underwater navigation and detection in application to autonomous bio-inspired vehicles.

  8. The Closing of the Classified Catalog at Boston University

    Science.gov (United States)

    Hazen, Margaret Hindle

    1974-01-01

    Although the classified catalog at Boston University libraries has been a useful research tool, it has proven too expensive to keep current. The library has converted to a traditional alphabetic subject catalog and will receive catalog cards from the Ohio College Library Center through the New England Library Network. (Author/LS)

  9. Recognition of Arabic Sign Language Alphabet Using Polynomial Classifiers

    Directory of Open Access Journals (Sweden)

    M. Al-Rousan

    2005-08-01

    Full Text Available Building an accurate automatic sign language recognition system is of great importance in facilitating efficient communication with deaf people. In this paper, we propose the use of polynomial classifiers as a classification engine for the recognition of the Arabic sign language (ArSL) alphabet. Polynomial classifiers have several advantages over other classifiers in that they do not require iterative training, and that they are highly computationally scalable with the number of classes. Based on polynomial classifiers, we have built an ArSL system and measured its performance using real ArSL data collected from deaf people. We show that the proposed system provides superior recognition results when compared with previously published results using ANFIS-based classification on the same dataset and feature extraction methodology. The comparison is shown in terms of the number of misclassified test patterns. The reduction in the rate of misclassified patterns was very significant. In particular, we have achieved a 36% reduction of misclassifications on the training data and 57% on the test data.
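    The non-iterative training this record highlights can be sketched: expand the inputs into monomials and fit the discriminant in one shot by least squares. This is a generic sketch on a toy XOR problem, not the ArSL system's features; the small ridge term is an added assumption for numerical stability:

    ```python
    from itertools import combinations_with_replacement

    def poly_features(x, degree=2):
        """All monomials of x up to the given degree, with a bias term."""
        feats = [1.0]
        for d in range(1, degree + 1):
            for idx in combinations_with_replacement(range(len(x)), d):
                term = 1.0
                for i in idx:
                    term *= x[i]
                feats.append(term)
        return feats

    def solve(A, b):
        """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
        n = len(A)
        M = [row[:] + [bi] for row, bi in zip(A, b)]
        for c in range(n):
            p = max(range(c, n), key=lambda r: abs(M[r][c]))
            M[c], M[p] = M[p], M[c]
            for r in range(n):
                if r != c:
                    f = M[r][c] / M[c][c]
                    M[r] = [a - f * bc for a, bc in zip(M[r], M[c])]
        return [M[i][n] / M[i][i] for i in range(n)]

    def train_poly_classifier(X, y, degree=2, ridge=1e-6):
        """One-shot (non-iterative) least-squares fit of a polynomial
        discriminant via the ridge-regularized normal equations."""
        Phi = [poly_features(x, degree) for x in X]
        k = len(Phi[0])
        A = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
              for j in range(k)] for i in range(k)]
        b = [sum(row[i] * yi for row, yi in zip(Phi, y)) for i in range(k)]
        return solve(A, b)

    def predict(w, x, degree=2):
        return sum(wi * fi for wi, fi in zip(w, poly_features(x, degree))) > 0.5

    # XOR is not linearly separable, but the x1*x2 monomial handles it.
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = [0, 1, 1, 0]
    w = train_poly_classifier(X, y)
    ```

    Training reduces to one linear solve, which is the scalability property the abstract contrasts with iteratively trained classifiers.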

  10. Reconfigurable support vector machine classifier with approximate computing

    NARCIS (Netherlands)

    van Leussen, M.J.; Huisken, J.; Wang, L.; Jiao, H.; De Gyvez, J.P.

    2017-01-01

    Support Vector Machine (SVM) is one of the most popular machine learning algorithms. An energy-efficient SVM classifier is proposed in this paper, where approximate computing is utilized to reduce energy consumption and silicon area. A hardware architecture with reconfigurable kernels and

  11. Classifying regularized sensor covariance matrices: An alternative to CSP

    NARCIS (Netherlands)

    Roijendijk, L.M.M.; Gielen, C.C.A.M.; Farquhar, J.D.R.

    2016-01-01

    Common spatial patterns ( CSP) is a commonly used technique for classifying imagined movement type brain-computer interface ( BCI) datasets. It has been very successful with many extensions and improvements on the basic technique. However, a drawback of CSP is that the signal processing pipeline

  13. Two-categorical bundles and their classifying spaces

    DEFF Research Database (Denmark)

    Baas, Nils A.; Bökstedt, M.; Kro, T.A.

    2012-01-01

    -category is a classifying space for the associated principal 2-bundles. In the process of proving this we develop a lot of powerful machinery which may be useful in further studies of 2-categorical topology. As a corollary we get a new proof of the classification of principal bundles. A calculation based...

  14. 3 CFR - Classified Information and Controlled Unclassified Information

    Science.gov (United States)

    2010-01-01

    ... on Transparency and Open Government and on the Freedom of Information Act, my Administration is... memoranda of January 21, 2009, on Transparency and Open Government and on the Freedom of Information Act; (B... 3 The President 1 2010-01-01 2010-01-01 false Classified Information and Controlled Unclassified...

  15. Comparison of Classifier Architectures for Online Neural Spike Sorting.

    Science.gov (United States)

    Saeed, Maryam; Khan, Amir Ali; Kamboh, Awais Mehmood

    2017-04-01

    High-density, intracranial recordings from micro-electrode arrays need to undergo spike sorting in order to associate the recorded neuronal spikes with particular neurons. This involves spike detection, feature extraction, and classification. To reduce the data transmission and power requirements, on-chip real-time processing is becoming very popular. However, high computational resources are required for classifiers in on-chip spike sorters, making scalability a great challenge. In this review paper, we analyze several popular classifiers to propose five new hardware architectures using the off-chip training with on-chip classification approach. These include support vector classification, fuzzy C-means classification, self-organizing maps classification, moving-centroid K-means classification, and cosine distance classification. The performance of these architectures is analyzed in terms of accuracy and resource requirements. We establish that the neural-network-based self-organizing maps classifier offers the most viable solution. A spike sorter based on the self-organizing maps classifier requires only 7.83% of the computational resources of the best-reported spike sorter, hierarchical adaptive means, while offering a 3% better accuracy at 7 dB SNR.
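    Of the five architectures this record compares, the cosine distance classifier is the simplest to sketch in software (the centroids and feature vectors below are invented; the hardware mapping is not shown):

    ```python
    import math

    def cosine_distance(a, b):
        """1 - cosine similarity between two feature vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / (na * nb)

    def classify_spike(features, centroids):
        """Assign a spike's feature vector to the class (putative neuron)
        whose centroid is nearest in cosine distance."""
        return min(centroids,
                   key=lambda label: cosine_distance(features, centroids[label]))
    ```

    Because cosine distance ignores vector magnitude, the classification depends only on spike shape, not amplitude — one reason it is attractive for low-resource on-chip sorting.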

  16. A Gene Expression Classifier of Node-Positive Colorectal Cancer

    Directory of Open Access Journals (Sweden)

    Paul F. Meeh

    2009-10-01

    Full Text Available We used digital long serial analysis of gene expression to discover gene expression differences between node-negative and node-positive colorectal tumors and developed a multigene classifier able to discriminate between these two tumor types. We prepared and sequenced long serial analysis of gene expression libraries from one node-negative and one node-positive colorectal tumor, sequenced to a depth of 26,060 unique tags, and identified 262 tags significantly differentially expressed between these two tumors (P < 2 × 10^-6). We confirmed the tag-to-gene assignments and differential expression of 31 genes by quantitative real-time polymerase chain reaction, 12 of which were elevated in the node-positive tumor. We analyzed the expression levels of these 12 upregulated genes in a validation panel of 23 additional tumors and developed an optimized seven-gene logistic regression classifier. The classifier discriminated between node-negative and node-positive tumors with 86% sensitivity and 80% specificity. Receiver operating characteristic analysis of the classifier revealed an area under the curve of 0.86. Experimental manipulation of the function of one classification gene, Fibronectin, caused profound effects on invasion and migration of colorectal cancer cells in vitro. These results suggest that the development of node-positive colorectal cancer occurs in part through elevated epithelial FN1 expression and suggest novel strategies for the diagnosis and treatment of advanced disease.

  17. Cascaded lexicalised classifiers for second-person reference resolution

    NARCIS (Netherlands)

    Purver, M.; Fernández, R.; Frampton, M.; Peters, S.; Healey, P.; Pieraccini, R.; Byron, D.; Young, S.; Purver, M.

    2009-01-01

    This paper examines the resolution of the second person English pronoun you in multi-party dialogue. Following previous work, we attempt to classify instances as generic or referential, and in the latter case identify the singular or plural addressee. We show that accuracy and robustness can be

  18. Human Activity Recognition by Combining a Small Number of Classifiers.

    Science.gov (United States)

    Nazabal, Alfredo; Garcia-Moreno, Pablo; Artes-Rodriguez, Antonio; Ghahramani, Zoubin

    2016-09-01

    We consider the problem of daily human activity recognition (HAR) using multiple wireless inertial sensors, and specifically, HAR systems with a very low number of sensors, each one providing an estimation of the performed activities. We propose new Bayesian models to combine the output of the sensors. The models are based on a soft-output combination of individual classifiers to deal with the small number of sensors. We also incorporate the dynamic nature of human activities as a first-order homogeneous Markov chain. We develop both inductive and transductive inference methods for each model to be employed in supervised and semisupervised situations, respectively. Using different real HAR databases, we compare our classifier combination models against a single classifier that employs all the signals from the sensors. Our models consistently exhibit a reduction of the error rate and an increase of robustness against sensor failures. Our models also outperform other classifier combination models that do not consider soft outputs and a Markovian structure of the human activities.

  19. Evaluation of three classifiers in mapping forest stand types using ...

    African Journals Online (AJOL)

    EJIRO

    applied for classification of the image. Supervised classification technique using maximum likelihood algorithm is the most commonly and widely used method for land cover classification (Jia and Richards, 2006). In Australia, the maximum likelihood classifier was effectively used to map different forest stand types with high.

  20. Classifying patients' complaints for regulatory purposes : A Pilot Study

    NARCIS (Netherlands)

    Bouwman, R.J.R.; Bomhoff, Manja; Robben, Paul; Friele, R.D.

    2018-01-01

    Objectives: It is assumed that classifying and aggregated reporting of patients' complaints by regulators helps to identify problem areas, to respond better to patients and increase public accountability. This pilot study addresses what a classification of complaints in a regulatory setting

  1. Localizing genes to cerebellar layers by classifying ISH images.

    Directory of Open Access Journals (Sweden)

    Lior Kirsch

    Full Text Available Gene expression controls how the brain develops and functions. Understanding control processes in the brain is particularly hard since they involve numerous types of neurons and glia, and very little is known about which genes are expressed in which cells and brain layers. Here we describe an approach to detect genes whose expression is primarily localized to a specific brain layer and apply it to the mouse cerebellum. We learn typical spatial patterns of expression from a few markers that are known to be localized to specific layers, and use these patterns to predict localization for new genes. We analyze images of in-situ hybridization (ISH) experiments, which we represent using histograms of local binary patterns (LBP), and train image classifiers and gene classifiers for four layers of the cerebellum: the Purkinje, granular, molecular and white matter layer. On held-out data, the layer classifiers achieve accuracy above 94% (AUC) by representing each image at multiple scales and by combining multiple image scores into a single gene-level decision. When applied to the full mouse genome, the classifiers predict specific layer localization for hundreds of new genes in the Purkinje and granular layers. Many genes localized to the Purkinje layer are likely to be expressed in astrocytes, and many others are involved in lipid metabolism, possibly due to the unusual size of Purkinje cells.
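
    The LBP representation mentioned above can be sketched in a few lines: each pixel is coded by thresholding its eight 3x3 neighbours against it, and an image is summarized by the histogram of the resulting 8-bit codes. This is the basic textbook LBP, not the paper's multi-scale pipeline, and the tiny input array is purely illustrative.

```python
# Minimal 3x3 local binary pattern (LBP) histogram, as used to represent
# ISH images: threshold the 8 neighbours of each interior pixel against
# the centre, pack the results into an 8-bit code, histogram the codes.

def lbp_histogram(img):
    """img: 2D list of grey values. Returns a 256-bin histogram of LBP
    codes over all interior pixels."""
    # neighbour offsets, clockwise starting at the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offs):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    return hist

img = [[10, 10, 10],
       [10,  5, 10],
       [10, 10, 10]]
hist = lbp_histogram(img)   # one interior pixel, all neighbours brighter
```

    Because every neighbour of the single interior pixel is at least as bright as the centre, all eight bits are set and the histogram has a single count in bin 255.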

  2. An ensemble self-training protein interaction article classifier.

    Science.gov (United States)

    Chen, Yifei; Hou, Ping; Manderick, Bernard

    2014-01-01

    Protein-protein interaction (PPI) is essential to understand the fundamental processes governing cell biology. The mining and curation of PPI knowledge are critical for analyzing proteomics data. Hence it is desired to classify articles PPI-related or not automatically. In order to build interaction article classification systems, an annotated corpus is needed. However, it is usually the case that only a small number of labeled articles can be obtained manually. Meanwhile, a large number of unlabeled articles are available. By combining ensemble learning and semi-supervised self-training, an ensemble self-training interaction classifier called EST_IACer is designed to classify PPI-related articles based on a small number of labeled articles and a large number of unlabeled articles. A biological background based feature weighting strategy is extended using the category information from both labeled and unlabeled data. Moreover, a heuristic constraint is put forward to select optimal instances from unlabeled data to improve the performance further. Experiment results show that the EST_IACer can classify the PPI related articles effectively and efficiently.
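
    The self-training idea behind EST_IACer can be illustrated with a deliberately tiny stand-in: a nearest-centroid classifier that repeatedly pseudo-labels the unlabeled points it is most confident about and retrains. The classifier, the confidence margin, and the 1D data are all hypothetical simplifications of the paper's ensemble system.

```python
# Hedged sketch of semi-supervised self-training: train, pseudo-label the
# most confidently classified unlabeled points, add them to the labeled
# pool, and repeat.

def centroids(points, labels):
    cents = {}
    for lab in set(labels):
        xs = [p for p, l in zip(points, labels) if l == lab]
        cents[lab] = sum(xs) / len(xs)
    return cents

def self_train(labeled, labels, unlabeled, rounds=3, per_round=1):
    labeled, labels, unlabeled = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        cents = centroids(labeled, labels)
        # confidence = margin between the two nearest class centroids
        def margin(x):
            d = sorted(abs(x - c) for c in cents.values())
            return d[1] - d[0] if len(d) > 1 else d[0]
        unlabeled.sort(key=margin, reverse=True)
        for x in unlabeled[:per_round]:
            lab = min(cents, key=lambda l: abs(x - cents[l]))
            labeled.append(x)
            labels.append(lab)
        unlabeled = unlabeled[per_round:]
    return centroids(labeled, labels)

cents = self_train([0.0, 10.0], ["neg", "pos"], [1.0, 9.0, 8.5])
```

    Each round absorbs the highest-margin unlabeled point, so the centroids drift toward the pseudo-labeled data; the real system additionally combines an ensemble of such classifiers and selects instances with a heuristic constraint.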

  3. Classifying Your Food as Acid, Low-Acid, or Acidified

    OpenAIRE

    Bacon, Karleigh

    2012-01-01

    As a food entrepreneur, you should be aware of how ingredients in your product make the food look, feel, and taste, as well as how the ingredients create environments for microorganisms like bacteria, yeasts, and molds to survive and grow. This guide will help you classify your food as acid, low-acid, or acidified.

  4. Gene-expression Classifier in Papillary Thyroid Carcinoma

    DEFF Research Database (Denmark)

    Londero, Stefano Christian; Jespersen, Marie Louise; Krogdahl, Annelise

    2016-01-01

    BACKGROUND: No reliable biomarker for metastatic potential in the risk stratification of papillary thyroid carcinoma exists. We aimed to develop a gene-expression classifier for metastatic potential. MATERIALS AND METHODS: Genome-wide expression analyses were used. Development cohort: freshly...

  5. Abbreviations: Their Effects on Comprehension of Classified Advertisements.

    Science.gov (United States)

    Sokol, Kirstin R.

    Two experimental designs were used to test the hypothesis that abbreviations in classified advertisements decrease the reader's comprehension of such ads. In the first experimental design, 73 high school students read four ads (for employment, used cars, apartments for rent, and articles for sale) either with abbreviations or with all…

  6. A lncRNA to repair DNA

    DEFF Research Database (Denmark)

    Lukas, Jiri; Altmeyer, Matthias

    2015-01-01

    Long non-coding RNAs (lncRNAs) have emerged as regulators of various biological processes, but to what extent lncRNAs play a role in genome integrity maintenance is not well understood. In this issue of EMBO Reports, Sharma et al [1] identify the DNA damage-induced lncRNA DDSR1 as an integral player of the DNA damage response (DDR). DDSR1 has both an early role by modulating repair pathway choices, and a later function when it regulates gene expression. Sharma et al [1] thus uncover a dual role for a hitherto uncharacterized lncRNA during the cellular response to DNA damage.

  7. Fulcrum Network Codes

    DEFF Research Database (Denmark)

    2015-01-01

    Fulcrum network codes, which are a network coding framework, achieve three objectives: (i) to reduce the overhead per coded packet to almost 1 bit per source packet; (ii) to operate the network using only low field size operations at intermediate nodes, dramatically reducing complexity in the network; and (iii) to deliver an end-to-end performance that is close to that of a high field size network coding system for high-end receivers while simultaneously catering to low-end ones that can only decode in a lower field size. Sources may encode using a high field size expansion to increase the number of dimensions seen by the network using a linear mapping. Receivers can trade off computational effort with network delay, decoding in the high field size, the low field size, or a combination thereof.

  8. Supervised Convolutional Sparse Coding

    KAUST Repository

    Affara, Lama Ahmed; Ghanem, Bernard; Wonka, Peter

    2018-01-01

    coding, which aims at learning discriminative dictionaries instead of purely reconstructive ones. We incorporate a supervised regularization term into the traditional unsupervised CSC objective to encourage the final dictionary elements

  9. SASSYS LMFBR systems code

    International Nuclear Information System (INIS)

    Dunn, F.E.; Prohammer, F.G.; Weber, D.P.

    1983-01-01

    The SASSYS LMFBR systems analysis code is being developed mainly to analyze the behavior of the shut-down heat-removal system and the consequences of failures in the system, although it is also capable of analyzing a wide range of transients, from mild operational transients through more severe transients leading to sodium boiling in the core and possible melting of clad and fuel. The code includes a detailed SAS4A multi-channel core treatment plus a general thermal-hydraulic treatment of the primary and intermediate heat-transport loops and the steam generators. The code can handle any LMFBR design, loop or pool, with an arbitrary arrangement of components. The code is fast running: usually faster than real time

  10. OCA Code Enforcement

    Data.gov (United States)

    Montgomery County of Maryland — The Office of the County Attorney (OCA) processes Code Violation Citations issued by County agencies. The citations can be viewed by issued department, issued date...

  11. The fast code

    Energy Technology Data Exchange (ETDEWEB)

    Freeman, L.N.; Wilson, R.E. [Oregon State Univ., Dept. of Mechanical Engineering, Corvallis, OR (United States)

    1996-09-01

    The FAST Code which is capable of determining structural loads on a flexible, teetering, horizontal axis wind turbine is described and comparisons of calculated loads with test data are given at two wind speeds for the ESI-80. The FAST Code models a two-bladed HAWT with degrees of freedom for blade bending, teeter, drive train flexibility, yaw, and windwise and crosswind tower motion. The code allows blade dimensions, stiffnesses, and weights to differ and models tower shadow, wind shear, and turbulence. Additionally, dynamic stall is included as are delta-3 and an underslung rotor. Load comparisons are made with ESI-80 test data in the form of power spectral density, rainflow counting, occurrence histograms, and azimuth averaged bin plots. It is concluded that agreement between the FAST Code and test results is good. (au)

  12. Code Disentanglement: Initial Plan

    Energy Technology Data Exchange (ETDEWEB)

    Wohlbier, John Greaton [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Kelley, Timothy M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Rockefeller, Gabriel M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Calef, Matthew Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-01-27

    The first step to making more ambitious changes in the EAP code base is to disentangle the code into a set of independent, levelized packages. We define a package as a collection of code, most often across a set of files, that provides a defined set of functionality; a package a) can be built and tested as an entity and b) fits within an overall levelization design. Each package contributes one or more libraries, or an application that uses the other libraries. A package set is levelized if the relationships between packages form a directed, acyclic graph and each package uses only packages at lower levels of the diagram (in Fortran this relationship is often describable by the use relationship between modules). Independent packages permit independent, and therefore parallel, development. The packages form separable units for the purposes of development and testing. This is a proven path for enabling finer-grained changes to a complex code.
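
    The levelization property described above, that package dependencies form a directed acyclic graph, is exactly what a topological sort checks. The sketch below is illustrative only; the package names are hypothetical and have nothing to do with the actual EAP packages.

```python
# Check levelization with a depth-first topological sort: packages come
# out in level order (dependencies first), and any cycle is reported.

def levelize(deps):
    """deps: {package: set of packages it uses}. Returns packages in
    level order (dependencies first) or raises ValueError on a cycle."""
    order, seen, on_path = [], set(), set()

    def visit(p):
        if p in on_path:
            raise ValueError(f"cycle through {p}")
        if p in seen:
            return
        on_path.add(p)
        for q in deps.get(p, ()):
            visit(q)
        on_path.discard(p)
        seen.add(p)
        order.append(p)

    for p in deps:
        visit(p)
    return order

# hypothetical package set: the app uses physics and io, physics uses io
order = levelize({"app": {"physics", "io"}, "physics": {"io"}, "io": set()})
```

    A package set that fails this check (the sort raises) cannot be levelized and would need its mutual dependencies broken before the disentanglement can proceed.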

  13. Induction technology optimization code

    International Nuclear Information System (INIS)

    Caporaso, G.J.; Brooks, A.L.; Kirbie, H.C.

    1992-01-01

    A code has been developed to evaluate relative costs of induction accelerator driver systems for relativistic klystrons. The code incorporates beam generation, transport and pulsed power system constraints to provide an integrated design tool. The code generates an injector/accelerator combination which satisfies the top level requirements and all system constraints once a small number of design choices have been specified (rise time of the injector voltage and aspect ratio of the ferrite induction cores, for example). The code calculates dimensions of accelerator mechanical assemblies and values of all electrical components. Cost factors for machined parts, raw materials and components are applied to yield a total system cost. These costs are then plotted as a function of the two design choices to enable selection of an optimum design based on various criteria. (Author) 11 refs., 3 figs

  14. VT ZIP Code Areas

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) A ZIP Code Tabulation Area (ZCTA) is a statistical geographic entity that approximates the delivery area for a U.S. Postal Service five-digit...

  15. Bandwidth efficient coding

    CERN Document Server

    Anderson, John B

    2017-01-01

    Bandwidth Efficient Coding addresses the major challenge in communication engineering today: how to communicate more bits of information in the same radio spectrum. Energy and bandwidth are needed to transmit bits, and bandwidth affects capacity the most. Methods have been developed that are ten times as energy efficient at a given bandwidth consumption as simple methods. These employ signals with very complex patterns and are called "coding" solutions. The book begins with classical theory before introducing new techniques that combine older methods of error correction coding and radio transmission in order to create narrowband methods that are as efficient in both spectrum and energy as nature allows. Other topics covered include modulation techniques such as CPM, coded QAM and pulse design.

  16. Reactor lattice codes

    International Nuclear Information System (INIS)

    Kulikowska, T.

    2001-01-01

    The description of reactor lattice codes is carried out on the example of the WIMSD-5B code. The WIMS code, in its various versions, is the most widely recognised lattice code. It is used in all parts of the world for calculations of research and power reactors. The version WIMSD-5B is distributed free of charge by the NEA Data Bank. The description of its main features given in the present lecture follows the aspects defined previously for lattice calculations in the lecture on Reactor Lattice Transport Calculations. The spatial models are described, and the approach to the energy treatment is given. Finally, the specific algorithm applied in fuel depletion calculations is outlined. (author)

  17. Critical Care Coding for Neurologists.

    Science.gov (United States)

    Nuwer, Marc R; Vespa, Paul M

    2015-10-01

    Accurate coding is an important function of neurologic practice. This contribution to Continuum is part of an ongoing series that presents helpful coding information along with examples related to the issue topic. Tips for diagnosis coding, Evaluation and Management coding, procedure coding, or a combination are presented, depending on which is most applicable to the subject area of the issue.

  18. Lattice Index Coding

    OpenAIRE

    Natarajan, Lakshmi; Hong, Yi; Viterbo, Emanuele

    2014-01-01

    The index coding problem involves a sender with K messages to be transmitted across a broadcast channel, and a set of receivers each of which demands a subset of the K messages while having prior knowledge of a different subset as side information. We consider the specific case of noisy index coding where the broadcast channel is Gaussian and every receiver demands all the messages from the source. Instances of this communication problem arise in wireless relay networks, sensor networks, and ...

  19. Towards advanced code simulators

    International Nuclear Information System (INIS)

    Scriven, A.H.

    1990-01-01

    The Central Electricity Generating Board (CEGB) uses advanced thermohydraulic codes extensively to support PWR safety analyses. A system has been developed to allow fully interactive execution of any code with graphical simulation of the operator desk and mimic display. The system operates in a virtual machine environment, with the thermohydraulic code executing in one virtual machine, communicating via interrupts with any number of other virtual machines each running other programs and graphics drivers. The driver code itself does not have to be modified from its normal batch form. Shortly following the release of RELAP5 MOD1 in IBM compatible form in 1983, this code was used as the driver for this system. When RELAP5 MOD2 became available, it was adopted with no changes needed in the basic system. Overall the system has been used for some 5 years for the analysis of LOBI tests, full scale plant studies and for simple what-if studies. For gaining rapid understanding of system dependencies it has proved invaluable. The graphical mimic system, being independent of the driver code, has also been used with other codes to study core rewetting, to replay results obtained from batch jobs on a CRAY2 computer system and to display suitably processed experimental results from the LOBI facility to aid interpretation. For the above work real-time execution was not necessary. Current work now centers on implementing the RELAP 5 code on a true parallel architecture machine. Marconi Simulation have been contracted to investigate the feasibility of using upwards of 100 processors, each capable of a peak of 30 MIPS to run a highly detailed RELAP5 model in real time, complete with specially written 3D core neutronics and balance of plant models. This paper describes the experience of using RELAP5 as an analyzer/simulator, and outlines the proposed methods and problems associated with parallel execution of RELAP5

  20. Cracking the Gender Codes

    DEFF Research Database (Denmark)

    Rennison, Betina Wolfgang

    2016-01-01

    extensive work to raise the proportion of women. This has helped slightly, but women remain underrepresented at the corporate top. Why is this so? What can be done to solve it? This article presents five different types of answers relating to five discursive codes: nature, talent, business, exclusion ... in leadership management, we must become more aware and take advantage of this complexity. We must crack the codes in order to crack the curve ...

  1. PEAR code review

    International Nuclear Information System (INIS)

    De Wit, R.; Jamieson, T.; Lord, M.; Lafortune, J.F.

    1997-07-01

    As a necessary component in the continuous improvement and refinement of methodologies employed in the nuclear industry, regulatory agencies need to periodically evaluate these processes to improve confidence in results and ensure appropriate levels of safety are being achieved. The independent and objective review of industry-standard computer codes forms an essential part of this program. To this end, this work undertakes an in-depth review of the computer code PEAR (Public Exposures from Accidental Releases), developed by Atomic Energy of Canada Limited (AECL) to assess accidental releases from CANDU reactors. PEAR is based largely on the models contained in the Canadian Standards Association (CSA) N288.2-M91. This report presents the results of a detailed technical review of the PEAR code to identify any variations from the CSA standard and other supporting documentation, verify the source code, assess the quality of numerical models and results, and identify general strengths and weaknesses of the code. The version of the code employed in this review is the one which AECL intends to use for CANDU 9 safety analyses. (author)

  2. KENO-V code

    International Nuclear Information System (INIS)

    Cramer, S.N.

    1984-01-01

    The KENO-V code is the current release of the Oak Ridge multigroup Monte Carlo criticality code development. The original KENO, with 16-group Hansen-Roach cross sections and P_1 scattering, was one of the first multigroup Monte Carlo codes, and it and its successors have always been a much-used research tool for criticality studies. KENO-V is able to accept large neutron cross section libraries (a 218-group set is distributed with the code) and has a general P_N scattering capability. A supergroup feature allows execution of large problems on small computers, but at the expense of increased calculation time and system input/output operations. This supergroup feature is activated automatically by the code in a manner which utilizes as much computer memory as is available. The primary purpose of KENO-V is to calculate the system k_eff, from small bare critical assemblies to large reflected arrays of differing fissile and moderator elements. In this respect KENO-V neither has nor requires the many options and sophisticated biasing techniques of general Monte Carlo codes

  3. Code, standard and specifications

    International Nuclear Information System (INIS)

    Abdul Nassir Ibrahim; Azali Muhammad; Ab. Razak Hamzah; Abd. Aziz Mohamed; Mohamad Pauzi Ismail

    2008-01-01

    Radiography, like any other technique, needs standards. These standards are widely used, and the methods for applying them are well established; radiography testing is therefore practised only according to the regulations mentioned and documented. These regulations and guidelines are documented in codes, standards and specifications. In Malaysia, a level-one or basic radiographer may carry out radiography work based on instructions given by a level-two or level-three radiographer. These instructions are produced from the guidelines mentioned in the documents, and the level-two radiographer must follow the specifications given in the standard when writing the instructions. This makes clear that radiography is a type of work in which everything must follow the rules. For codes, radiography follows the code of the American Society of Mechanical Engineers (ASME), and the only code existing in Malaysia at this time is the rule published by the Atomic Energy Licensing Board (AELB), known as the Practical Code for Radiation Protection in Industrial Radiography. With the existence of this code, all radiography work automatically follows the rules and standards.

  4. Fast Coding Unit Encoding Mechanism for Low Complexity Video Coding

    OpenAIRE

    Gao, Yuan; Liu, Pengyu; Wu, Yueying; Jia, Kebin; Gao, Guandong

    2016-01-01

    In high efficiency video coding (HEVC), coding tree contributes to excellent compression performance. However, coding tree brings extremely high computational complexity. Innovative works for improving coding tree to further reduce encoding time are stated in this paper. A novel low complexity coding tree mechanism is proposed for HEVC fast coding unit (CU) encoding. Firstly, this paper makes an in-depth study of the relationship among CU distribution, quantization parameter (QP) and content ...

  5. Storing data encoded DNA in living organisms

    Science.gov (United States)

    Wong, Pak C.; Wong, Kwong K.; Foote, Harlan P. [Richland, WA

    2006-06-06

    Current technologies allow the generation of artificial DNA molecules and/or the ability to alter the DNA sequences of existing DNA molecules. With a careful coding scheme and arrangement, it is possible to encode important information as an artificial DNA strand and store it in a living host safely and permanently. This inventive technology can be used to identify origins and protect R&D investments. It can also be used in environmental research to track generations of organisms and observe the ecological impact of pollutants. Today, there are microorganisms that can survive under extreme conditions, and it is also advantageous to consider multicellular organisms as hosts for stored information. These living organisms can provide memory housing and protection for stored data or information. The present invention provides for data storage in a living organism wherein at least one DNA sequence is encoded to represent data and incorporated into a living organism.
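
    One of the simplest such coding schemes packs two bits into each nucleotide, four bases per byte. The bit-to-base mapping below is an illustrative choice, not the patent's actual scheme, which would also need to avoid biologically problematic sequences.

```python
# Minimal sketch of encoding binary data as a DNA strand: two bits per
# base, most-significant bit pair first, and the inverse decoding.

B2N = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
N2B = {v: k for k, v in B2N.items()}

def encode(data: bytes) -> str:
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):            # MSB pair first
            bases.append(B2N[(byte >> shift) & 0b11])
    return "".join(bases)

def decode(strand: str) -> bytes:
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | N2B[base]
        out.append(byte)
    return bytes(out)

strand = encode(b"OK")   # -> "CATTCAGT"
```

    A real in-vivo scheme would wrap such a payload in marker sequences and add redundancy against mutation; this sketch only shows the core two-bits-per-base packing.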

  6. Generalized DNA Barcode Design Based on Hamming Codes

    NARCIS (Netherlands)

    Bystrykh, Leonid V.

    2012-01-01

    The diversity and scope of multiplex parallel sequencing applications is steadily increasing. Critically, multiplex parallel sequencing methods rely on the use of barcoded primers for sample identification, and the quality of the barcodes directly impacts the quality of the resulting
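
    Barcode designs in this line of work build on classical Hamming codes. As a hedged illustration of the binary building block only (the paper generalizes to the four-letter DNA alphabet), here is Hamming(7,4), which corrects any single-bit error; the mapping of corrected bits onto nucleotides is left out.

```python
# Hamming(7,4): 4 data bits become a 7-bit codeword (positions 1..7,
# parity bits at positions 1, 2 and 4); the syndrome of a received word
# is the 1-based position of a single flipped bit, or 0 if none.

def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(w):
    """Return a corrected copy of the 7-bit word w."""
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    w = list(w)
    if syndrome:
        w[syndrome - 1] ^= 1
    return w

code = hamming74_encode([1, 0, 1, 1])
noisy = list(code)
noisy[2] ^= 1                              # corrupt one position
fixed = hamming74_correct(noisy)           # recovers the codeword
```

    For DNA barcodes the same syndrome idea is applied over the quaternary alphabet, so that a single base substitution in a read can be detected and corrected.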

  7. DNA repair

    International Nuclear Information System (INIS)

    Van Zeeland, A.A.

    1984-01-01

    In this chapter a series of DNA repair pathways is discussed which are available to the cell to cope with DNA damaged by chemical or physical agents. In the case of microorganisms, our knowledge about the precise mechanism of each DNA repair pathway and its regulation improved considerably when mutants deficient in these repair mechanisms became available. In the case of mammalian cells in culture, until recently very few repair-deficient mutants were available, because almost all mammalian cells in culture carry at least the diploid number of chromosomes; the frequency of repair-deficient mutants in such populations is therefore very low. Nevertheless, because replica-plating techniques are improving, some mutants from Chinese hamster ovary cells and L5178Y mouse lymphoma cells are now available. In the case of human cells, cultures obtained from patients with certain genetic diseases are available. A number of these cells appear to be sensitive to certain chemical or physical mutagens, including cells from patients suffering from xeroderma pigmentosum, ataxia telangiectasia, Fanconi's anemia and Cockayne's syndrome. However, only in the case of xeroderma pigmentosum cells has the sensitivity to ultraviolet light been clearly correlated with a deficiency in excision repair of pyrimidine dimers. Furthermore, work with strains obtained from biopsies from man is difficult because these cells generally have low cloning efficiencies and a limited lifespan in vitro. It is therefore very important that more repair-deficient mutants become available from established cell lines of human or animal origin

  8. Intricate and Cell Type-Specific Populations of Endogenous Circular DNA (eccDNA) in Caenorhabditis elegans and Homo sapiens.

    Science.gov (United States)

    Shoura, Massa J; Gabdank, Idan; Hansen, Loren; Merker, Jason; Gotlib, Jason; Levene, Stephen D; Fire, Andrew Z

    2017-10-05

    Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples. Copyright © 2017 Shoura et al.

  9. Block-classified bidirectional motion compensation scheme for wavelet-decomposed digital video

    Energy Technology Data Exchange (ETDEWEB)

    Zafar, S. [Argonne National Lab., IL (United States). Mathematics and Computer Science Div.; Zhang, Y.Q. [David Sarnoff Research Center, Princeton, NJ (United States); Jabbari, B. [George Mason Univ., Fairfax, VA (United States)

    1997-08-01

    In this paper the authors introduce a block-classified bidirectional motion compensation scheme for the previously developed wavelet-based video codec, where multiresolution motion estimation is performed in the wavelet domain. The frame classification structure described in this paper is similar to that used in the MPEG standard. Specifically, the I-frames are intraframe coded, the P-frames are interpolated from a previous I- or a P-frame, and the B-frames are bidirectional interpolated frames. They apply this frame classification structure to the wavelet domain with variable block sizes and multiresolution representation. They use a symmetric bidirectional scheme for the B-frames and classify the motion blocks as intraframe, compensated either from the preceding or the following frame, or bidirectional (i.e., compensated based on which type yields the minimum energy). They also introduce the concept of F-frames, which are analogous to P-frames but are predicted from the following frame only. This improves the overall quality of the reconstruction in a group of pictures (GOP) but at the expense of extra buffering. They also study the effect of quantization of the I-frames on the reconstruction of a GOP, and they provide intuitive explanation for the results. In addition, the authors study a variety of wavelet filter-banks to be used in a multiresolution motion-compensated hierarchical video codec.

  10. A naïve Bayes classifier for planning transfusion requirements in heart surgery.

    Science.gov (United States)

    Cevenini, Gabriele; Barbini, Emanuela; Massai, Maria R; Barbini, Paolo

    2013-02-01

    Transfusion of allogeneic blood products is a key issue in cardiac surgery. Although blood conservation and standard transfusion guidelines have been published by different medical groups, actual transfusion practices after cardiac surgery vary widely among institutions. Models can be a useful support for decision making and may reduce the total cost of care. The objective of this study was to propose and evaluate a procedure to develop a simple locally customized decision-support system. We analysed 3182 consecutive patients undergoing cardiac surgery at the University Hospital of Siena, Italy. Univariate statistical tests were performed to identify a set of preoperative and intraoperative variables as likely independent features for planning transfusion quantities. These features were utilized to design a naïve Bayes classifier. Model performance was evaluated using the leave-one-out cross-validation approach. All computations were done using SPSS and MATLAB code. The overall correct classification percentage was not particularly high if several classes of patients were to be identified. Model performance improved appreciably when the patient sample was divided into two classes (transfused and non-transfused patients). In this case the naïve Bayes model correctly classified about three quarters of patients with 71.2% sensitivity and 78.4% specificity, thus providing useful information for recognizing patients with transfusion requirements in the specific scenario considered. Although the classifier is customized to a particular setting and cannot be generalized to other scenarios, the simplicity of its development and the results obtained make it a promising approach for designing a simple model for different heart surgery centres needing a customized decision-support system for planning transfusion requirements in intensive care unit. © 2011 Blackwell Publishing Ltd.
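
    The two-class naïve Bayes idea the study settles on (transfused vs. non-transfused) is simple enough to sketch with binary features and Laplace smoothing. The toy features and data below are entirely hypothetical; the study's actual predictors come from the hospital's preoperative and intraoperative variables.

```python
# Hedged sketch of a two-class naive Bayes classifier with binary
# features: per-class priors and Laplace-smoothed feature likelihoods,
# prediction by the class with the larger posterior score.

def train_nb(X, y):
    """X: list of binary feature vectors, y: list of 0/1 labels."""
    n = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        prior = len(rows) / len(X)
        # Laplace-smoothed P(feature_j = 1 | class c)
        p1 = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
              for j in range(n)]
        model[c] = (prior, p1)
    return model

def predict_nb(model, x):
    def score(c):
        prior, p1 = model[c]
        s = prior
        for j, v in enumerate(x):
            s *= p1[j] if v else 1 - p1[j]
        return s
    return max(model, key=score)

# hypothetical features: [low preoperative haemoglobin, complex procedure]
X = [[1, 1], [1, 0], [0, 0], [0, 1]]
y = [1, 1, 0, 0]          # 1 = transfused, 0 = not transfused
model = train_nb(X, y)
pred = predict_nb(model, [1, 1])
```

    Because the features are assumed conditionally independent given the class, training reduces to counting, which is what makes such a locally customized model so cheap to build and re-fit per centre.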

  11. The quest for a general and reliable fungal DNA barcode

    NARCIS (Netherlands)

    Robert, V.; Szöke, S.; Eberhardt, U.; Cardinali, G.; Meyer, W.; Seifert, K.A.; Levesques, A.; Lewis, C.T.

    2011-01-01

    DNA sequences are key elements for both identification and classification of living organisms. Mainly for historical reasons, a limited number of genes are currently used for this purpose. From a mathematical point of view, any DNA segment, at any location, even outside of coding regions and even if

  12. SPECTRAL AMPLITUDE CODING OCDMA SYSTEMS USING ENHANCED DOUBLE WEIGHT CODE

    Directory of Open Access Journals (Sweden)

    F.N. HASOON

    2006-12-01

    Full Text Available A new code structure for spectral amplitude coding optical code division multiple access systems based on double weight (DW) code families is proposed. The DW code has a fixed weight of two. The enhanced double-weight (EDW) code is another variation of the DW code family that can have a variable weight greater than one. The EDW code possesses ideal cross-correlation properties and exists for every natural number n. A much better performance can be provided by using the EDW code compared to existing codes such as the Hadamard and Modified Frequency-Hopping (MFH) codes. Both theoretical analysis and simulation show that EDW gives much better performance than the Hadamard and Modified Frequency-Hopping (MFH) codes.

  13. Nuclear code abstracts (1975 edition)

    International Nuclear Information System (INIS)

    Akanuma, Makoto; Hirakawa, Takashi

    1976-02-01

    Nuclear Code Abstracts is compiled by the Nuclear Code Committee to exchange information on nuclear code developments among members of the committee. Enlarging the collection, the present edition includes nuclear code abstracts obtained in 1975 through liaison officers of the organizations in Japan participating in the Nuclear Energy Agency's Computer Program Library at Ispra, Italy. The classification of nuclear codes and the format of code abstracts are the same as those used in the library. (auth.)

  14. Some new ternary linear codes

    Directory of Open Access Journals (Sweden)

    Rumen Daskalov

    2017-07-01

    Full Text Available Let an $[n,k,d]_q$ code be a linear code of length $n$, dimension $k$ and minimum Hamming distance $d$ over $GF(q)$. One of the most important problems in coding theory is to construct codes with optimal minimum distances. In this paper 22 new ternary linear codes are presented. Two of them are optimal. All new codes improve the respective lower bounds in [11].
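    For small parameters, the minimum distance of a linear code can be checked directly by enumerating all $q^k - 1$ nonzero codewords generated by a generator matrix. The sketch below (not from the paper; the example code is the well-known ternary tetracode, an optimal $[4,2,3]_3$ code) illustrates the definition:

```python
from itertools import product
import numpy as np

def min_distance(G, q=3):
    """Minimum Hamming distance of the linear code generated by G over GF(q),
    found by brute-force enumeration of all nonzero codewords."""
    k, n = G.shape
    best = n
    for coeffs in product(range(q), repeat=k):
        if not any(coeffs):
            continue  # skip the zero codeword
        word = np.mod(np.array(coeffs) @ G, q)  # message * G over GF(q)
        best = min(best, int(np.count_nonzero(word)))
    return best

# Generator matrix of the ternary tetracode, an optimal [4,2,3]_3 code.
G = np.array([[1, 0, 1, 1],
              [0, 1, 1, 2]])
print(min_distance(G))  # → 3
```

    Enumeration is only feasible for small $k$; constructing codes that *improve* known lower bounds, as the paper does, requires more structured search methods.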

  15. Deep Feature Learning and Cascaded Classifier for Large Scale Data

    DEFF Research Database (Denmark)

    Prasoon, Adhish

    This thesis focuses on voxel/pixel classification based approaches for image segmentation. The main application is segmentation of articular cartilage in knee MRIs. The first major contribution of the thesis deals with large scale machine learning problems. Many medical imaging problems need a huge amount of training data to cover sufficient biological variability. Learning methods scaling badly with the number of training data points cannot be used in such scenarios. This may restrict the usage of many powerful classifiers having excellent generalization ability. We propose a cascaded classifier which … The second contribution concerns learning features from data rather than having a predefined feature set. We explore the deep learning approach of convolutional neural networks (CNN) for segmenting three dimensional medical images. We propose a novel system integrating three 2D CNNs, which have a one-to-one association with the xy, yz and zx planes of the 3D …

  16. Scoring and Classifying Examinees Using Measurement Decision Theory

    Directory of Open Access Journals (Sweden)

    Lawrence M. Rudner

    2009-04-01

    Full Text Available This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the classification accuracy of tests scored using decision theory; (2) the effectiveness of different sequential testing procedures; and (3) the number of items needed to make a classification. A large percentage of examinees can be classified accurately with very few items using decision theory. A Java Applet for self-instruction and software for generating, calibrating and scoring MDT data are provided.
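    The core of the MDT framework described above is a Bayes update: given the conditional probability of a correct response to each item in each mastery state, a 0/1 response pattern yields a posterior over states, and the examinee is assigned to the most probable one. The sketch below uses hypothetical item probabilities (all numbers are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical P(correct | state) for 4 items and two mastery states
# (column 0 = non-master, column 1 = master); values are made up.
p_correct = np.array([[0.30, 0.80],
                      [0.40, 0.90],
                      [0.20, 0.70],
                      [0.50, 0.85]])
priors = np.array([0.5, 0.5])

def classify(responses):
    """Posterior over mastery states for a 0/1 response pattern."""
    r = np.array(responses)[:, None]
    # Likelihood of each observed response under each state.
    like = np.where(r == 1, p_correct, 1 - p_correct)
    post = priors * like.prod(axis=0)
    return post / post.sum()

post = classify([1, 1, 0, 1])
print(post, "-> state", post.argmax())
```

    Sequential testing procedures, as evaluated in the paper, stop administering items as soon as this posterior is sufficiently concentrated on one state, which is why so few items can suffice.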

  17. MAMMOGRAMS ANALYSIS USING SVM CLASSIFIER IN COMBINED TRANSFORMS DOMAIN

    Directory of Open Access Journals (Sweden)

    B.N. Prathibha

    2011-02-01

    Full Text Available Breast cancer is a primary cause of mortality and morbidity in women. Reports reveal that the earlier abnormalities are detected, the better the chances of survival. Digital mammograms are one of the most effective means for detecting possible breast anomalies at early stages. Digital mammograms supported with Computer Aided Diagnostic (CAD) systems help the radiologists in taking reliable decisions. The proposed CAD system extracts wavelet features and spectral features for the better classification of mammograms. A Support Vector Machines classifier is used to analyze 206 mammogram images from the MIAS database pertaining to the severity of abnormality, i.e., benign and malign. The proposed system gives 93.14% accuracy for discrimination between normal-malign samples, 87.25% accuracy for normal-benign samples and 89.22% accuracy for benign-malign samples. The study reveals that features extracted in a hybrid transform domain with an SVM classifier prove to be a promising tool for analysis of mammograms.
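    The classification stage of such a CAD system — an SVM applied to per-image feature vectors — can be sketched as follows. The feature vectors here are synthetic placeholders, not wavelet/spectral features from MIAS images, and the kernel choice is an assumption:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
# Synthetic stand-in for wavelet/spectral feature vectors; NOT MIAS data.
n = 206
y = rng.integers(0, 2, size=n)          # 0 = benign, 1 = malign
X = rng.normal(loc=y[:, None], scale=1.2, size=(n, 8))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# Scaling before an RBF-kernel SVM is standard practice, since the
# kernel is sensitive to feature magnitudes.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"accuracy: {acc:.3f}")
```

    In the real system the discriminative power comes from the hybrid transform-domain features; the SVM itself is a generic final stage.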

  18. Evaluation of LDA Ensembles Classifiers for Brain Computer Interface

    International Nuclear Information System (INIS)

    Arjona, Cristian; Pentácolo, José; Gareis, Iván; Atum, Yanina; Gentiletti, Gerardo; Acevedo, Rubén; Rufiner, Leonardo

    2011-01-01

    The Brain Computer Interface (BCI) translates brain activity into computer commands. To increase the performance of a BCI in decoding user intentions, it is necessary to improve the feature extraction and classification techniques. In this article the performance of an ensemble of three linear discriminant analysis (LDA) classifiers is studied. A system based on an ensemble can theoretically achieve better classification results than its individual counterpart, depending on the algorithm used to generate the individual classifiers and the procedure used to combine their outputs. Classic ensemble algorithms such as bagging and boosting are discussed here. For the application to BCI, it was concluded that the results generated using ER and AUC as performance indices do not give enough information to establish which configuration is better.
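    A bagging ensemble of LDA classifiers of the kind discussed above can be sketched directly with scikit-learn. The data here are synthetic two-class feature vectors, not real EEG recordings, and the ensemble size is an arbitrary choice:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Synthetic two-class "EEG feature" data; NOT real BCI recordings.
n = 300
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=y[:, None] * 0.8, scale=1.0, size=(n, 10))

single = LinearDiscriminantAnalysis()
# Bagging: each LDA is trained on a bootstrap resample of the
# training data and predictions are combined by voting.
ensemble = BaggingClassifier(LinearDiscriminantAnalysis(),
                             n_estimators=25, random_state=0)

acc_single = cross_val_score(single, X, y, cv=5).mean()
acc_ensemble = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"single LDA: {acc_single:.3f}  bagged LDA: {acc_ensemble:.3f}")
```

    Because LDA is a low-variance classifier, bagging often yields only modest gains over a single LDA, which is consistent with the article's finding that error rate and AUC alone may not separate the configurations.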

  19. Security Enrichment in Intrusion Detection System Using Classifier Ensemble

    Directory of Open Access Journals (Sweden)

    Uma R. Salunkhe

    2017-01-01

    Full Text Available In the era of the Internet, with an increasing number of people as its end users, a large number of attack categories are introduced daily. Hence, effective detection of various attacks with the help of Intrusion Detection Systems is an emerging research trend. Existing studies show the effectiveness of machine learning approaches in handling Intrusion Detection Systems. In this work, we aim to enhance the detection rate of an Intrusion Detection System by using machine learning techniques. We propose a novel classifier ensemble based IDS that is constructed using a hybrid approach combining data-level and feature-level methods. Classifier ensembles combine the opinions of different experts and improve the intrusion detection rate. Experimental results show the improved detection rates of our system compared to the reference technique.

  20. The three-dimensional origin of the classifying algebra

    International Nuclear Information System (INIS)

    Fuchs, Juergen; Schweigert, Christoph; Stigner, Carl

    2010-01-01

    It is known that reflection coefficients for bulk fields of a rational conformal field theory in the presence of an elementary boundary condition can be obtained as representation matrices of irreducible representations of the classifying algebra, a semisimple commutative associative complex algebra. We show how this algebra arises naturally from the three-dimensional geometry of factorization of correlators of bulk fields on the disk. This allows us to derive explicit expressions for the structure constants of the classifying algebra as invariants of ribbon graphs in the three-manifold S²×S¹. Our result unravels a precise relation between intertwiners of the action of the mapping class group on spaces of conformal blocks and boundary conditions in rational conformal field theories.