WorldWideScience

Sample records for gene expression-based classification

  1. Gene expression-based classifications of fibroadenomas and phyllodes tumours of the breast.

    Science.gov (United States)

    Vidal, Maria; Peg, Vicente; Galván, Patricia; Tres, Alejandro; Cortés, Javier; Ramón y Cajal, Santiago; Rubio, Isabel T; Prat, Aleix

    2015-06-01

    Fibroepithelial tumors (FTs) of the breast are a heterogeneous group of lesions ranging from fibroadenomas (FAD) to phyllodes tumors (PT) (benign, borderline, malignant). Further understanding of their molecular features and classification might be of clinical value. In this study, we analysed the expression of 105 breast cancer-related genes, including the 50 genes of the PAM50 intrinsic subtype predictor and 12 genes of the Claudin-low subtype predictor, in a panel of 75 FTs (34 FADs, 5 juvenile FADs, 20 benign PTs, 5 borderline PTs and 11 malignant PTs) with clinical follow-up. In addition, we compared the expression profiles of FTs with those of 14 normal breast tissues and 49 primary invasive ductal carcinomas (IDCs). Our results revealed that the levels of expression of all breast cancer-related genes can discriminate the various groups of FTs, together with normal breast tissues and IDCs (False Discovery Rate ...); expression of proliferation-related genes (e.g. CCNB1 and MKI67) and mesenchymal/epithelial-related genes (e.g. CLDN3 and EPCAM) was found to be most discriminative. As expected, FADs showed the highest and lowest expression of epithelial- and proliferation-related genes, respectively, whereas malignant PTs showed the opposite expression pattern. Interestingly, the overall profile of benign PTs was found more similar to FADs and normal breast tissues than the rest of the tumours, including juvenile FADs. Within the dataset of IDCs and normal breast tissues, the vast majority of FADs, juvenile FADs, benign PTs and borderline PTs were identified as Normal-like by intrinsic breast cancer subtyping, whereas 7 (63.6%) and 3 (27.3%) malignant PTs were identified as Claudin-low and Basal-like, respectively. Finally, we observed that the previously described PAM50 risk of relapse prognostic score better predicted outcome in FTs than the morphological classification, even within PTs-only. Our results suggest that classification of FTs using gene expression-based ...

  2. Integrating Colon Cancer Microarray Data: Associating Locus-Specific Methylation Groups to Gene Expression-Based Classifications

    Directory of Open Access Journals (Sweden)

    Ana Barat

    2015-11-01

    Recently, considerable attention has been paid to gene expression-based classifications of colorectal cancers (CRC) and their association with patient prognosis. In addition to changes in gene expression, abnormal DNA-methylation is known to play an important role in cancer onset and development, and colon cancer is no exception to this rule. Large-scale technologies, such as methylation microarray assays and specific sequencing of methylated DNA, have been used to determine whole genome profiles of CpG island methylation in tissue samples. In this article, publicly available microarray-based gene expression and methylation data sets are used to characterize expression subtypes with respect to locus-specific methylation. A major objective was to determine whether integration of these data types improves previously characterized subtypes, or provides evidence for additional subtypes. We used unsupervised clustering techniques to determine methylation-based subgroups, which are subsequently annotated with three published expression-based classifications, comprising from three to six subtypes. Our results showed that, while methylation profiles provide a further basis for segregation of certain (Inflammatory and Goblet-like) finer-grained expression-based subtypes, they also suggest that other finer-grained subtypes are not distinctive and can be considered as a single subtype.

  3. Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes

    Directory of Open Access Journals (Sweden)

    Eils Roland

    2005-11-01

    Background: The extensive use of DNA microarray technology in the characterization of the cell transcriptome is leading to an ever increasing amount of microarray data from cancer studies. Although similar questions for the same type of cancer are addressed in these different studies, a comparative analysis of their results is hampered by the use of heterogeneous microarray platforms and analysis methods. Results: In contrast to a meta-analysis approach where results of different studies are combined on an interpretative level, we investigate here how to directly integrate raw microarray data from different studies for the purpose of supervised classification analysis. We use median rank scores and quantile discretization to derive numerically comparable measures of gene expression from different platforms. These transformed data are then used for training of classifiers based on support vector machines. We apply this approach to six publicly available cancer microarray gene expression data sets, which consist of three pairs of studies, each examining the same type of cancer, i.e. breast cancer, prostate cancer or acute myeloid leukemia. For each pair, one study was performed by means of cDNA microarrays and the other by means of oligonucleotide microarrays. In each pair, high classification accuracies (> 85%) were achieved with training and testing on data instances randomly chosen from both data sets in a cross-validation analysis. To exemplify the potential of this cross-platform classification analysis, we use two leukemia microarray data sets to show that important genes with regard to the biology of leukemia are selected in an integrated analysis, which are missed in either single-set analysis. Conclusion: Cross-platform classification of multiple cancer microarray data sets yields discriminative gene expression signatures that are found and validated on a large number of microarray samples, generated by different laboratories and ...
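
    The quantile discretization mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact procedure, so the helper name `quantile_discretize`, the choice of four bins, and the toy values are all assumptions made for illustration. The idea shown is only that rank-based binning within each sample removes platform-specific scales.

```python
def quantile_discretize(sample, n_bins=4):
    """Replace each value in one sample by the index of its rank-based bin.

    Illustrative helper (not from the paper): bin 0 holds the lowest-ranked
    values and bin n_bins - 1 the highest. Ties are broken by position.
    """
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    bins = [0] * len(sample)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(sample)
    return bins


# Hypothetical measurements of the same four genes on two platforms.
cdna_log_ratios = [0.2, -1.3, 2.5, 0.9]
oligo_intensities = [820.0, 35.0, 9100.0, 1500.0]

print(quantile_discretize(cdna_log_ratios))    # [1, 0, 3, 2]
print(quantile_discretize(oligo_intensities))  # [1, 0, 3, 2]
```

    Although the raw scales differ by orders of magnitude, both toy samples map to the same discrete profile, which is the property that makes pooled training of a classifier plausible.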

  4. Gene expression-based classification of non-small cell lung carcinomas and survival prediction.

    Directory of Open Access Journals (Sweden)

    Jun Hou

    BACKGROUND: Current clinical therapy of non-small cell lung cancer depends on histo-pathological classification. This approach poorly predicts clinical outcome for individual patients. Gene expression profiling holds promise to improve clinical stratification, thus paving the way for individualized therapy. METHODOLOGY AND PRINCIPAL FINDINGS: A genome-wide gene expression analysis was performed on a cohort of 91 patients. We used 91 tumor- and 65 adjacent normal lung tissue samples. We defined sets of predictor genes (probe sets) with the expression profiles. The power of predictor genes was evaluated using an independent cohort of 96 non-small cell lung cancer- and 6 normal lung samples. We identified a tumor signature of 5 genes that aggregates the 156 tumor and normal samples into the expected groups. We also identified a histology signature of 75 genes, which classifies the samples in the major histological subtypes of non-small cell lung cancer. Correlation analysis identified 17 genes which showed the best association with post-surgery survival time. This signature was used for stratification of all patients in two risk groups. Kaplan-Meier survival curves show that the two groups display a significant difference in post-surgery survival time (p = 5.6E-6). The performance of the signatures was validated using a patient cohort of similar size (Duke University, n = 96). Compared to previously published prognostic signatures for NSCLC, the 17 gene signature performed well on these two cohorts. CONCLUSIONS: The gene signatures identified are promising tools for histo-pathological classification of non-small cell lung cancer, and may improve the prediction of clinical outcome.

  5. Gene expression-based biomarkers for discriminating early and late stage of clear cell renal cancer

    Science.gov (United States)

    Bhalla, Sherry; Chaudhary, Kumardeep; Kumar, Ritesh; Sehgal, Manika; Kaur, Harpreet; Sharma, Suresh; Raghava, Gajendra P. S.

    2017-01-01

    In this study, an attempt has been made to identify expression-based gene biomarkers that can discriminate early and late stage of clear cell renal cell carcinoma (ccRCC) patients. We have analyzed the gene expression of 523 samples to identify genes that are differentially expressed in the early and late stage of ccRCC. First, a threshold-based method has been developed, which attained a maximum accuracy of 71.12% with ROC 0.67 using the single gene NR3C2. To improve the performance of the threshold-based method, we combined two or more genes and achieved a maximum accuracy of 70.19% with ROC of 0.74 using eight genes on the validation dataset. These eight genes include four underexpressed (NR3C2, ENAM, DNASE1L3, FRMPD2) and four overexpressed (PLEKHA9, MAP6D1, SMPD4, C11orf73) genes in the late stage of ccRCC. Second, models were developed using state-of-the-art techniques and achieved a maximum accuracy of 72.64% and 0.81 ROC using 64 genes on the validation dataset. Similar accuracy was obtained on 38 genes selected from a subset of genes involved in cancer hallmark biological processes. Our analysis further implied a need to develop gender-specific models for stage classification. A web server, CancerCSP, has been developed to predict stage of ccRCC using gene expression data derived from RNAseq experiments. PMID:28349958
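
    A single-gene threshold rule of the kind described here can be sketched in a few lines of Python. This is an illustrative assumption about how such a rule might be fitted, not the method used in the study: the function name `best_threshold`, the exhaustive search over observed values, and the toy expression values are all hypothetical. The rule predicts late stage when expression falls below the cutoff, matching a gene that is underexpressed in late stage.

```python
def best_threshold(expr, is_late):
    """Return (threshold, accuracy) for the rule: late stage if expr < threshold.

    Tries every observed expression value as a candidate cutoff and keeps
    the first one with the highest accuracy on the given samples.
    """
    best_t, best_acc = None, 0.0
    for t in sorted(set(expr)):
        preds = [x < t for x in expr]
        acc = sum(p == y for p, y in zip(preds, is_late)) / len(expr)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical expression of one gene in three early and three late samples.
expr = [5.1, 4.8, 4.9, 2.0, 2.2, 5.0]
is_late = [False, False, False, True, True, True]

threshold, accuracy = best_threshold(expr, is_late)
print(threshold, round(accuracy, 3))  # 4.8 0.833
```

    In practice the cutoff would be chosen on training data and the accuracy reported on a separate validation set, as the abstract describes.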

  6. Global Expression-Based Classification of Lymph Node Metastasis and Extracapsular Spread of Oral Tongue Squamous Cell Carcinoma

    Directory of Open Access Journals (Sweden)

    Xiaofeng Zhou

    2006-11-01

    Regional lymph node metastasis is a critical event in oral tongue squamous cell carcinoma (OTSCC) progression. The identification of biomarkers associated with the metastatic process would provide critical prognostic information to facilitate clinical decision making for improved management of OTSCC patients. Global expressional profiles were obtained for 25 primary OTSCCs, where 11 cases showed lymph node metastasis (pN+) histologically and 14 cases were nonmetastatic (pN-). Seven of the pN+ cases also exhibited extracapsular spread (ECS) of metastatic nodes. Multiple expression indices were used to generate signature gene sets for pN+/- and ECS+/- cases. Selected genes from signature gene sets were validated using quantitative reverse transcription-polymerase chain reaction (qRT-PCR). The classification powers of these genes were then evaluated using a logistic model, receiver operating characteristic curve analysis, and leave-one-out cross-validation. qRT-PCR validation data showed that differences at RNA levels are either statistically significant (P < .05) or suggestive (P < .1) for six of eight genes tested (BMP2, CTTN, EEF1A1, GTSE1, MMP9, EGFR) for pN+/- cases, and for five of eight genes tested (BMP2, CTTN, EEF1A1, MMP9, EGFR) for ECS+/- cases. Logistic models with specific combinations of genes (CTTN+MMP9+EGFR for pN and CTTN+EEF1A1+MMP9 for ECS) achieved perfect specificity and sensitivity. Leave-one-out cross-validation showed overall accuracy rates of 85% for both pN and ECS prediction models. Our results demonstrated that the pN and the ECS of OTSCCs can be predicted by gene expression analyses of primary tumors.
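
    Leave-one-out cross-validation, as used in this study, can be sketched as follows. To keep the example short and dependency-free, a nearest-centroid classifier is swapped in for the logistic model of the abstract. The function names and the two-gene toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def predict(train_x, train_y, x):
    """Nearest-centroid prediction (a stand-in for the logistic model)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([p for p, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, score the held-out one."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


# Hypothetical two-gene expression profiles: label 0 = pN-, label 1 = pN+.
xs = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (2.9, 3.1), (3.2, 2.8)]
ys = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(xs, ys))  # 1.0
```

    Because each held-out sample never contributes to the model that scores it, the resulting accuracy is a less optimistic estimate than the fit on all samples, which is consistent with the gap between the perfect in-sample fit and the 85% cross-validated accuracy reported above.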

  7. Accurate Gene Expression-Based Biodosimetry Using a Minimal Set of Human Gene Transcripts

    Energy Technology Data Exchange (ETDEWEB)

    Tucker, James D., E-mail: jtucker@biology.biosci.wayne.edu [Department of Biological Sciences, Wayne State University, Detroit, Michigan (United States)]; Joiner, Michael C. [Department of Radiation Oncology, Wayne State University, Detroit, Michigan (United States)]; Thomas, Robert A.; Grever, William E.; Bakhmutsky, Marina V. [Department of Biological Sciences, Wayne State University, Detroit, Michigan (United States)]; Chinkhota, Chantelle N.; Smolinski, Joseph M. [Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan (United States)]; Divine, George W. [Department of Public Health Sciences, Henry Ford Hospital, Detroit, Michigan (United States)]; Auner, Gregory W. [Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan (United States)]

    2014-03-15

    Purpose: Rapid and reliable methods for conducting biological dosimetry are a necessity in the event of a large-scale nuclear event. Conventional biodosimetry methods lack the speed, portability, ease of use, and low cost required for triaging numerous victims. Here we address this need by showing that polymerase chain reaction (PCR) on a small number of gene transcripts can provide accurate and rapid dosimetry. The low cost and relative ease of PCR compared with existing dosimetry methods suggest that this approach may be useful in mass-casualty triage situations. Methods and Materials: Human peripheral blood from 60 adult donors was acutely exposed to cobalt-60 gamma rays at doses of 0 (control) to 10 Gy. mRNA expression levels of 121 selected genes were obtained 0.5, 1, and 2 days after exposure by reverse-transcriptase real-time PCR. Optimal dosimetry at each time point was obtained by stepwise regression of dose received against individual gene transcript expression levels. Results: Only 3 to 4 different gene transcripts, ASTN2, CDKN1A, GDF15, and ATM, are needed to explain ≥0.87 of the variance (R²). Receiver-operator characteristics, a measure of sensitivity and specificity, of 0.98 for these statistical models were achieved at each time point. Conclusions: The actual and predicted radiation doses agree very closely up to 6 Gy. Dosimetry at 8 and 10 Gy shows some effect of saturation, thereby slightly diminishing the ability to quantify higher exposures. Analyses of these gene transcripts may be advantageous for use in a field-portable device designed to assess exposures in mass casualty situations or in clinical radiation emergencies.
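
    The "fraction of variance explained" (R²) quoted in the results can be made concrete with a small Python sketch. It fits an ordinary least-squares line with a single predictor, which is a simplification of the stepwise multi-gene regression in the abstract. The function name `r_squared` and the toy numbers are assumptions for illustration only.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical transcript levels (x) and delivered doses in Gy (y).
expression = [0.0, 1.0, 2.0, 3.0]
dose = [0.0, 1.0, 1.0, 2.0]
print(round(r_squared(expression, dose), 3))  # 0.9
```

    An R² of 0.9 on this toy data means the fitted line accounts for 90% of the variation in dose; the study reports at least 0.87 using three to four transcripts.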

  8. Applicability of a gene expression based prediction method to SD and Wistar rats: an example of CARCINOscreen®.

    Science.gov (United States)

    Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro

    2015-12-01

    Recently, the development of several gene expression-based prediction methods has been attempted in the field of toxicology. CARCINOscreen® is a gene expression-based screening method that predicts, with high accuracy, the carcinogenicity of chemicals that target the liver. In this study, we investigated the applicability of the gene expression-based screening method to SD and Wistar rats by using CARCINOscreen®, originally developed with F344 rats, with two carcinogens, 2,4-diaminotoluene and thioacetamide, and two non-carcinogens, 2,6-diaminotoluene and sodium benzoate. After the 28-day repeated dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver. The gene expression data obtained were applied to CARCINOscreen®. Predictive scores obtained by CARCINOscreen® for known carcinogens were > 2 in all strains of rats, while non-carcinogens gave prediction scores below 0.5. These results suggest that the gene expression-based screening method, CARCINOscreen®, can be applied to SD and Wistar rats, which are widely used strains in toxicological studies, by setting an appropriate boundary line of prediction score to classify the chemicals into carcinogens and non-carcinogens.

  9. Classification with binary gene expressions

    OpenAIRE

    Tuna, Salih; Niranjan, Mahesan

    2009-01-01

    Microarray gene expression measurements are reported, used and archived usually to high numerical precision. However, properties of mRNA molecules, such as their low stability and availability in small copy numbers, and the fact that measurements correspond to a population of cells, rather than a single cell, make high precision meaningless. Recent work shows that reducing measurement precision leads to very little loss of information, right down to binary levels. In this paper we show how p...

  10. A model of gene expression based on random dynamical systems reveals modularity properties of gene regulatory networks.

    Science.gov (United States)

    Antoneli, Fernando; Ferreira, Renata C; Briones, Marcelo R S

    2016-06-01

    Here we propose a new approach to modeling gene expression based on the theory of random dynamical systems (RDS) that provides a general coupling prescription between the nodes of any given regulatory network given the dynamics of each node is modeled by an RDS. The main virtues of this approach are the following: (i) it provides a natural way to obtain arbitrarily large networks by coupling together simple basic pieces, thus revealing the modularity of regulatory networks; (ii) the assumptions about the stochastic processes used in the modeling are fairly general, in the sense that the only requirement is stationarity; (iii) there is a well developed mathematical theory, which is a blend of smooth dynamical systems theory, ergodic theory and stochastic analysis that allows one to extract relevant dynamical and statistical information without solving the system; (iv) one may obtain the classical rate equations from the corresponding stochastic version by averaging the dynamic random variables (small noise limit). It is important to emphasize that unlike the deterministic case, where coupling two equations is a trivial matter, coupling two RDS is non-trivial, especially in our case, where the coupling is performed between a state variable of one gene and the switching stochastic process of another gene and, hence, it is not a priori true that the resulting coupled system will satisfy the definition of a random dynamical system. We shall provide the necessary arguments that ensure that our coupling prescription does indeed furnish a coupled regulatory network of random dynamical systems. Finally, the fact that classical rate equations are the small noise limit of our stochastic model ensures that any validation or prediction made on the basis of the classical theory is also a validation or prediction of our model. We illustrate our framework with some simple examples of single-gene systems and network motifs.

  11. Design and Implementation of Visual Dynamic Display Software of Gene Expression Based on GTK

    Institute of Scientific and Technical Information of China (English)

    JIANG Wei; MENG Fanjiang; LI Yong; YU Xiao

    2009-01-01

    The paper presents a method for implementing GTK-based software that dynamically displays gene expression. The resulting dynamic presentation system takes gene expression data from gene chip hybridizations at different time points and applies a linear combination model and the Pearson correlation coefficient algorithm. The system shows in graphical form how gene expression changes over time, how the characteristics of gene expression change, and how the expression relationships and regulatory relationships among genes change. The system also provides an integrated platform for analysing gene chip data, especially for research on gene regulatory networks.
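
    The Pearson correlation coefficient named in this abstract measures how closely two expression time courses move together. The following Python sketch shows the standard formula. The function name `pearson` and the toy time series are illustrative assumptions and are not taken from the described software.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression of three genes at four time points.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [8.0, 6.0, 4.0, 2.0]   # falls as gene_a rises

print(round(pearson(gene_a, gene_b), 6))  # 1.0
print(round(pearson(gene_a, gene_c), 6))  # -1.0
```

    Values near +1 or -1 between two genes across time points are the kind of signal that could suggest a co-regulated or inversely regulated pair.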

  12. Random forest for gene selection and microarray data classification.

    Science.gov (United States)

    Moorthy, Kohbalan; Mohamad, Mohd Saberi

    2011-01-01

    A random forest method has been selected to perform both gene selection and classification of the microarray data. In this embedded method, the selection of the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest classification accuracy. Hence, an improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. The option for biggest subset selection is provided to assist researchers who intend to use the informative genes for further research. Enhanced random forest gene selection has performed better in terms of selecting the smallest subset as well as the biggest subset of informative genes with the lowest out-of-bag error rates through gene selection. Furthermore, the classification performed on the selected subset of genes using random forest has led to lower prediction error rates compared to the existing method and other similar available methods.

  13. Weighted gene co-expression based biomarker discovery for psoriasis detection.

    Science.gov (United States)

    Sundarrajan, Sudharsana; Arumugam, Mohanapriya

    2016-11-15

    Psoriasis is a chronic inflammatory disease of the skin with an unknown aetiology. The disease manifests itself as red and silvery scaly plaques distributed over the scalp, lower back and extensor aspects of the limbs. After receiving scant consideration for quite a few years, psoriasis has now become a prominent focus for new drug development. A group of closely connected and differentially co-expressed genes may act in a network and may serve as molecular signatures for an underlying phenotype. Weighted gene coexpression network analysis (WGCNA), a systems biology approach, has been utilized for identification of new molecular targets for psoriasis. Gene coexpression relationships were investigated in 58 psoriatic lesional samples, resulting in five gene modules clustered based on the gene coexpression patterns. The coexpression pattern was validated using three psoriatic datasets. Ten highly connected and informative genes from each module were selected and termed psoriasis-specific hub signatures. A random forest based binary classifier built using the expression profiles of signature genes robustly distinguished psoriatic samples from the normal samples in the validation set with an accuracy of 0.95 to 1. These signature genes may serve as potential candidates for biomarker discovery leading to new therapeutic targets. WGCNA, the network based approach, has provided an alternative path to mine out key controllers and drivers of psoriasis. The study principle from the current work can be extended to other pathological conditions.

  14. MicroRNA and target gene expression based clustering of oral cancer, precancer and normal tissues.

    Science.gov (United States)

    Roy, Roshni; Singh, Richa; Chattopadhyay, Esita; Ray, Anindita; Sarkar, Navonil De; Aich, Ritesh; Paul, Ranjan Rashmi; Pal, Mousumi; Roy, Bidyut

    2016-11-15

    Development of oral cancer is usually preceded by precancerous lesion. Despite histopathological diagnosis, development of disease specific biomarkers continues to be a promising field of study. Expression of miRNAs and their target genes was studied in oral cancer and two types of precancer lesions to look for disease specific gene expression patterns. Expression of miR-26a, miR-29a, miR-34b and miR-423 and their 11 target genes were determined in 20 oral leukoplakia, 20 lichen planus and 20 cancer tissues with respect to 20 normal tissues using qPCR assay. Expression data were, then, used for cluster analysis of normal as well as disease tissues. Expression of miR-26a and miR-29a was significantly down regulated in leukoplakia and cancer tissues but up regulated in lichen planus tissues. Expression of target genes such as, ADAMTS7, ATP1B1, COL4A2, CPEB3, CDK6, DNMT3a and PI3KR1 was significantly down regulated in at least two of three disease types with respect to normal tissues. Negative correlations between expression levels of miRNAs and their targets were observed in normal tissues but not in disease tissues implying altered miRNA-target interaction in disease state. Specific expression profile of miRNAs and target genes formed separate clusters of normal, lichen planus and cancer tissues. Our results suggest that alterations in expression of selected miRNAs and target genes may play important roles in development of precancer to cancer. Expression profiles of miRNA and target genes may be useful to differentiate cancer and lichen planus from normal tissues, thereby bolstering their role in diagnostics. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. A gene-expression-based neural code for food abundance that modulates lifespan.

    Science.gov (United States)

    Entchev, Eugeni V; Patel, Dhaval S; Zhan, Mei; Steele, Andrew J; Lu, Hang; Ch'ng, QueeLim

    2015-05-12

    How the nervous system internally represents environmental food availability is poorly understood. Here, we show that quantitative information about food abundance is encoded by combinatorial neuron-specific gene-expression of conserved TGFβ and serotonin pathway components in Caenorhabditis elegans. Crosstalk and auto-regulation between these pathways alters the shape, dynamic range, and population variance of the gene-expression responses of daf-7 (TGFβ) and tph-1 (tryptophan hydroxylase) to food availability. These intricate regulatory features provide distinct mechanisms for TGFβ and serotonin signaling to tune the accuracy of this multi-neuron code: daf-7 primarily regulates gene-expression variability, while tph-1 primarily regulates the dynamic range of gene-expression responses. This code is functional because daf-7 and tph-1 mutations bidirectionally attenuate food level-dependent changes in lifespan. Our results reveal a neural code for food abundance and demonstrate that gene expression serves as an additional layer of information processing in the nervous system to control long-term physiology.

  16. Gene Expression-Based Survival Prediction in Lung Adenocarcinoma: A Multi-Site, Blinded Validation Study

    Science.gov (United States)

    Shedden, Kerby; Taylor, Jeremy M.G.; Enkemann, Steve A.; Tsao, Ming S.; Yeatman, Timothy J.; Gerald, William L.; Eschrich, Steve; Jurisica, Igor; Venkatraman, Seshan E.; Meyerson, Matthew; Kuick, Rork; Dobbin, Kevin K.; Lively, Tracy; Jacobson, James W.; Beer, David G.; Giordano, Thomas J.; Misek, David E.; Chang, Andrew C.; Zhu, Chang Qi; Strumpf, Dan; Hanash, Samir; Shepherd, Francis A.; Ding, Kuyue; Seymour, Lesley; Naoki, Katsuhiko; Pennell, Nathan; Weir, Barbara; Verhaak, Roel; Ladd-Acosta, Christine; Golub, Todd; Gruidl, Mike; Szoke, Janos; Zakowski, Maureen; Rusch, Valerie; Kris, Mark; Viale, Agnes; Motoi, Noriko; Travis, William; Sharma, Anupama

    2009-01-01

    Although prognostic gene expression signatures for survival in early stage lung cancer have been proposed, for clinical application it is critical to establish their performance across different subject populations and in different laboratories. Here we report a large, training-testing, multi-site blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses proposed examined whether microarray measurements of gene expression either alone or combined with basic clinical covariates (stage, age, sex) can be used to predict overall survival in lung cancer subjects. Several models examined produced risk scores that substantially correlated with actual subject outcome. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early stage lung cancer. This study also provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas. PMID:18641660

  17. Autonomous Bacterial Localization and Gene Expression Based on Nearby Cell Receptor Density

    Science.gov (United States)

    2013-01-22

    Upon detection of B1–5 mM AI-2, these cells express T7 polymerase that amplifies the native lsr operon response by overexpressing DsRed (see...2 for initiating gene expression (lsr operon). (B) Indicated densities of PCI-15B or HEK293 cells were seeded to wells followed by mouse anti-EGFR

  18. Gene expression based evidence of innate immune response activation in the epithelium with oral lichen planus.

    Science.gov (United States)

    Adami, Guy R; Yeung, Alexander C F; Stucki, Grant; Kolokythas, Antonia; Sroussi, Herve Y; Cabay, Robert J; Kuzin, Igor; Schwartz, Joel L

    2014-03-01

    Oral lichen planus (OLP) is a disease of the oral mucosa of unknown cause producing lesions with an intense band-like inflammatory infiltrate of T cells to the subepithelium and keratinocyte cell death. We performed gene expression analysis of the oral epithelium of lesions in subjects with OLP and its sister disease, oral lichenoid reaction (OLR), in order to better understand the role of the keratinocytes in these diseases. Fourteen patients with OLP or OLR were included in the study, along with a control group of 23 subjects with a variety of oral diseases and a normal group of 17 subjects with no clinically visible mucosal abnormalities. Various proteins have been associated with OLP, based on detection of secreted proteins or changes in RNA levels in tissue samples consisting of epithelium, stroma, and immune cells. The mRNA level of twelve of these genes expressed in the epithelium was tested in the three groups. Four genes showed increased expression in the epithelium of OLP patients: CD14, CXCL1, IL8, and TLR1, and at least two of these proteins, TLR1 and CXCL1, were expressed at substantial levels in oral keratinocytes. Because of the large accumulation of T cells in lesions of OLP it has long been thought to be an adaptive immunity malfunction. We provide evidence that there is increased expression of innate immune genes in the epithelium with this illness, suggesting a role for this process in the disease and a possible target for treatment. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Gene expression-based biological test for major depressive disorder: an advanced study

    Directory of Open Access Journals (Sweden)

    Watanabe S

    2017-02-01

    Shin-ya Watanabe,1 Shusuke Numata,1 Jun-ichi Iga,2 Makoto Kinoshita,1 Hidehiro Umehara,1 Kazuo Ishii,3 Tetsuro Ohmori1 1Department of Psychiatry, Institute of Biomedical Sciences, Tokushima University Graduate School, Tokushima, 2Department of Neuropsychiatry, Molecules and Function, Ehime University Graduate School of Medicine, Ehime, 3Department of Applied Biological Science, Faculty of Agriculture, Tokyo University of Agriculture and Technology, Tokyo, Japan. Purpose: Recently, we were able to distinguish patients with major depressive disorder (MDD) from nonpsychiatric controls with high accuracy using a panel of five gene expression markers (ARHGAP24, HDAC5, PDGFC, PRNP, and SLC6A4) in leukocytes. In the present study, we examined whether this biological test is able to discriminate patients with MDD from those without MDD, including those with schizophrenia and bipolar disorder. Patients and methods: We measured messenger ribonucleic acid expression levels of the aforementioned five genes in peripheral leukocytes in 17 patients with schizophrenia and 36 patients with bipolar disorder using quantitative real-time polymerase chain reaction (PCR), and we combined these expression data with our previous expression data of 25 patients with MDD and 25 controls. Subsequently, a linear discriminant function was developed for use in discriminating between patients with MDD and without MDD. Results: This expression panel was able to segregate patients with MDD from those without MDD with a sensitivity and specificity of 64% and 67.9%, respectively. Conclusion: Further research to identify MDD-specific markers is needed to improve the performance of this biological test. Keywords: depressive disorder, biomarker, gene expression, schizophrenia, bipolar disorder

  20. Gene expression-based biological test for major depressive disorder: an advanced study

    Science.gov (United States)

    Watanabe, Shin-ya; Numata, Shusuke; Iga, Jun-ichi; Kinoshita, Makoto; Umehara, Hidehiro; Ishii, Kazuo; Ohmori, Tetsuro

    2017-01-01

    Purpose: Recently, we distinguished patients with major depressive disorder (MDD) from nonpsychiatric controls with high accuracy using a panel of five gene expression markers (ARHGAP24, HDAC5, PDGFC, PRNP, and SLC6A4) in leukocytes. In the present study, we examined whether this biological test is able to discriminate patients with MDD from those without MDD, including those with schizophrenia and bipolar disorder. Patients and methods: We measured messenger ribonucleic acid expression levels of the aforementioned five genes in peripheral leukocytes in 17 patients with schizophrenia and 36 patients with bipolar disorder using quantitative real-time polymerase chain reaction (PCR), and we combined these expression data with our previous expression data of 25 patients with MDD and 25 controls. Subsequently, a linear discriminant function was developed for use in discriminating between patients with MDD and without MDD. Results: This expression panel was able to segregate patients with MDD from those without MDD with a sensitivity and specificity of 64% and 67.9%, respectively. Conclusion: Further research to identify MDD-specific markers is needed to improve the performance of this biological test. PMID:28260899
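As a worked illustration of the reported rates, the sketch below computes sensitivity and specificity from confusion-matrix counts. The cohort sizes (25 MDD; 25 controls, 17 schizophrenia and 36 bipolar patients, so 78 without MDD) come from the abstract. The counts of correctly classified subjects (16 and 53) are assumptions chosen because they reproduce the reported 64% and 67.9%; the abstract does not state them.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of MDD patients flagged as MDD
    specificity = tn / (tn + fp)  # fraction of non-MDD subjects flagged as non-MDD
    return sensitivity, specificity

# Cohort sizes taken from the abstract.
n_mdd = 25
n_non_mdd = 25 + 17 + 36  # controls + schizophrenia + bipolar disorder = 78

# Hypothetical counts (NOT reported in the abstract), chosen to match the reported rates.
tp = 16  # MDD patients classified as MDD
tn = 53  # non-MDD subjects classified as non-MDD

sens, spec = sensitivity_specificity(tp, n_mdd - tp, tn, n_non_mdd - tn)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

Under these assumed counts, 9 of 25 MDD patients and 25 of 78 non-MDD subjects would be misclassified, which is consistent with the authors' conclusion that more specific markers are needed.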

  1. Minimising Immunohistochemical False Negative ER Classification Using a Complementary 23 Gene Expression Signature of ER Status

    DEFF Research Database (Denmark)

    Li, Qiyuan; Eklund, Aron Charles; Birkbak, Nicolai Juul;

    2010-01-01

    BACKGROUND: Expression of the oestrogen receptor (ER) in breast cancer predicts benefit from endocrine therapy. Minimising the frequency of false negative ER status classification is essential to identify all patients with ER positive breast cancers who should be offered endocrine therapies in order to improve clinical outcome. In routine oncological practice ER status is determined by semi-quantitative methods such as immunohistochemistry (IHC) or other immunoassays in which the ER expression level is compared to an empirical threshold. The clinical relevance of gene expression-based ER subtypes as compared to IHC-based determination has not been systematically evaluated. Here we attempt to reduce the frequency of false negative ER status classification using two gene expression approaches and compare these methods to IHC based ER status in terms of predictive and prognostic concordance...

  2. Minimising immunohistochemical false negative ER classification using a complementary 23 gene expression signature of ER status.

    Directory of Open Access Journals (Sweden)

    Qiyuan Li

    Full Text Available BACKGROUND: Expression of the oestrogen receptor (ER) in breast cancer predicts benefit from endocrine therapy. Minimising the frequency of false negative ER status classification is essential to identify all patients with ER positive breast cancers who should be offered endocrine therapies in order to improve clinical outcome. In routine oncological practice ER status is determined by semi-quantitative methods such as immunohistochemistry (IHC) or other immunoassays in which the ER expression level is compared to an empirical threshold. The clinical relevance of gene expression-based ER subtypes as compared to IHC-based determination has not been systematically evaluated. Here we attempt to reduce the frequency of false negative ER status classification using two gene expression approaches and compare these methods to IHC based ER status in terms of predictive and prognostic concordance with clinical outcome. METHODOLOGY/PRINCIPAL FINDINGS: Firstly, ER status was discriminated by fitting the bimodal expression of ESR1 to a mixed Gaussian model. The discriminative power of ESR1 suggested bimodal expression as an efficient way to stratify breast cancer; therefore we identified a set of genes whose expression was both strongly bimodal, mimicking ESR expression status, and highly expressed in breast epithelial cell lines, to derive a 23-gene ER expression signature-based classifier. We assessed our classifiers in seven published breast cancer cohorts by comparing the gene expression-based ER status to IHC-based ER status as a predictor of clinical outcome in both untreated and tamoxifen treated cohorts. In untreated breast cancer cohorts, the 23 gene signature-based ER status provided significantly improved prognostic power compared to IHC-based ER status (P = 0.006). In tamoxifen-treated cohorts, the 23 gene ER expression signature predicted clinical outcome (HR = 2.20, P = 0.00035). These complementary ER signature-based strategies
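The idea of calling ER status from a bimodal expression distribution can be sketched with a two-component one-dimensional Gaussian mixture fitted by expectation-maximisation. This is a minimal illustration on synthetic values, not the authors' implementation: the function names, the initialisation, and the simulated "ESR1-like" data are all assumptions.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_two_gaussians(values, n_iter=200):
    """Fit a 2-component 1-D Gaussian mixture by EM; returns (weights, means, sds)."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise the two means from the lower and upper halves of the sorted data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sds = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
            weights[k] = nk / len(xs)
    return weights, means, sds

def er_status(x, weights, means, sds):
    """Assign a value to the high-expression (ER+) or low-expression (ER-) mode."""
    hi = 0 if means[0] > means[1] else 1
    p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    return "ER+" if p[hi] >= p[1 - hi] else "ER-"

# Synthetic log-expression values (illustrative only): a low mode and a high mode.
random.seed(0)
data = [random.gauss(6.0, 0.8) for _ in range(60)] + [random.gauss(11.0, 1.0) for _ in range(140)]
weights, means, sds = fit_two_gaussians(data)
print(sorted(round(m, 2) for m in means))
print(er_status(11.5, weights, means, sds), er_status(5.5, weights, means, sds))
```

Unlike a fixed empirical threshold, the decision boundary here falls where the two fitted components have equal weighted density, so it adapts to each dataset.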

  3. Identification of gene expression-based prognostic markers in the hematopoietic stem cells of patients with myelodysplastic syndromes.

    Science.gov (United States)

    Pellagatti, Andrea; Benner, Axel; Mills, Ken I; Cazzola, Mario; Giagounidis, Aristoteles; Perry, Janet; Malcovati, Luca; Della Porta, Matteo G; Jädersten, Martin; Verma, Amit; McDonald, Emma-Jane; Killick, Sally; Hellström-Lindberg, Eva; Bullinger, Lars; Wainscoat, James S; Boultwood, Jacqueline

    2013-10-01

    The diagnosis of patients with myelodysplastic syndromes (MDS) is largely dependent on morphologic examination of bone marrow aspirates. Several criteria that form the basis of the classifications and scoring systems most commonly used in clinical practice are affected by operator-dependent variation. To identify standardized molecular markers that would allow prediction of prognosis, we have used gene expression profiling (GEP) data on CD34+ cells from patients with MDS to determine the relationship between gene expression levels and prognosis. GEP data on CD34+ cells from 125 patients with MDS with a minimum 12-month follow-up since date of bone marrow sample collection were included in this study. Supervised principal components and lasso penalized Cox proportional hazards regression (Coxnet) were used for the analysis. We identified several genes, the expression of which was significantly associated with survival of patients with MDS, including LEF1, CDH1, WT1, and MN1. The Coxnet predictor, based on expression data on 20 genes, outperformed other predictors, including one that additionally used clinical information. Our Coxnet gene signature based on CD34+ cells significantly identified a separation of patients with good or bad prognosis in an independent GEP data set based on unsorted bone marrow mononuclear cells, demonstrating that our signature is robust and may be applicable to bone marrow cells without the need to isolate CD34+ cells. We present a new, valuable GEP-based signature for assessing prognosis in MDS. GEP-based signatures correlating with clinical outcome may significantly contribute to a refined risk classification of MDS.

  4. Cancer classification based on gene expression using neural networks.

    Science.gov (United States)

    Hu, H P; Niu, Z J; Bai, Y P; Tan, X H

    2015-12-21

    Based on gene expression, we have classified 53 colon cancer patients with UICC II into two groups: relapse and no relapse. Samples were taken from each patient, and gene information was extracted. Of the 53 samples examined, 500 genes were considered proper through analyses by S-Kohonen, BP, and SVM neural networks. Classification accuracy obtained by S-Kohonen neural network reaches 91%, which was more accurate than classification by BP and SVM neural networks. The results show that S-Kohonen neural network is more plausible for classification and has a certain feasibility and validity as compared with BP and SVM neural networks.

  5. Verdict Accuracy of Quick Reduct Algorithm using Clustering and Classification Techniques for Gene Expression Data

    Directory of Open Access Journals (Sweden)

    T.Chandrasekhar

    2012-01-01

    Full Text Available In most gene expression data, the number of training samples is very small compared to the large number of genes involved in the experiments. However, among the large number of genes, only a small fraction is effective for performing a certain task. Furthermore, a small subset of genes is desirable in developing gene expression based diagnostic tools for delivering reliable and understandable results. With the gene selection results, the cost of biological experiment and decision can be greatly reduced by analyzing only the marker genes. An important application of gene expression data in functional genomics is to classify samples according to their gene expression profiles. Feature selection (FS) is a process which attempts to select more informative features. It is one of the important steps in knowledge discovery. Conventional supervised FS methods evaluate various feature subsets using an evaluation function or metric to select only those features which are related to the decision classes of the data under consideration. This paper studies a feature selection method based on rough set theory. Further, the K-Means and Fuzzy C-Means (FCM) algorithms have been applied to the reduced feature set without considering class labels. The obtained results are then compared with the original class labels. A Back Propagation Network (BPN) has also been used for classification. The performance of K-Means, FCM, and BPN is then analyzed through the confusion matrix. The BPN is found to perform comparatively well.

  6. Multiclass cancer classification based on gene expression comparison

    Science.gov (United States)

    Yang, Sitan; Naiman, Daniel Q.

    2016-01-01

    As the complexity and heterogeneity of cancer is being increasingly appreciated through genomic analyses, microarray-based cancer classification comprising multiple discriminatory molecular markers is an emerging trend. Such multiclass classification problems pose new methodological and computational challenges for developing novel and effective statistical approaches. In this paper, we introduce a new approach for classifying multiple disease states associated with cancer based on gene expression profiles. Our method focuses on detecting small sets of genes in which the relative comparison of their expression values leads to class discrimination. For an m-class problem, the classification rule typically depends on a small number of m-gene sets, which provide transparent decision boundaries and allow for potential biological interpretations. We first test our approach on seven common gene expression datasets and compare it with popular classification methods including support vector machines and random forests. We then consider an extremely large cohort of leukemia cancer to further assess its effectiveness. In both experiments, our method yields results comparable to or even better than those of benchmark classifiers. In addition, we demonstrate that our approach can integrate pathway analysis of gene expression to provide accurate and biologically meaningful classification. PMID:24918456
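To illustrate what a rule based on relative expression comparison looks like, the sketch below handles only the simplest case: two classes and one gene pair, in the spirit of a top-scoring-pair classifier. It is not the authors' m-class method; the function names and the toy expression matrix are assumptions.

```python
from itertools import combinations

def best_gene_pair(samples, labels):
    """Find the gene pair (i, j) whose ordering x[i] < x[j] differs most between class 0 and class 1.

    Returns (pair, score, class_if_less), where class_if_less is the class
    predicted when x[i] < x[j].
    """
    n_genes = len(samples[0])
    best = (None, -1.0, 0)
    for i, j in combinations(range(n_genes), 2):
        freq = []
        for c in (0, 1):
            rows = [s for s, y in zip(samples, labels) if y == c]
            # Fraction of class-c samples in which gene i is below gene j.
            freq.append(sum(s[i] < s[j] for s in rows) / len(rows))
        score = abs(freq[0] - freq[1])
        if score > best[1]:
            best = ((i, j), score, 0 if freq[0] > freq[1] else 1)
    return best

def predict(sample, pair, class_if_less):
    """Classify a sample from the ordering of the two selected genes only."""
    i, j = pair
    return class_if_less if sample[i] < sample[j] else 1 - class_if_less

# Toy data (illustrative): rows are samples, columns are genes 0..2.
samples = [
    [1.0, 3.0, 2.0], [0.5, 2.5, 5.0], [2.0, 2.8, 0.1],  # class 0: gene 0 < gene 1
    [4.0, 1.0, 2.0], [3.5, 0.7, 6.0], [5.0, 2.0, 1.0],  # class 1: gene 0 > gene 1
]
labels = [0, 0, 0, 1, 1, 1]

pair, score, class_if_less = best_gene_pair(samples, labels)
print(pair, score, class_if_less)
print(predict([0.2, 9.0, 3.0], pair, class_if_less), predict([7.0, 1.0, 3.0], pair, class_if_less))
```

Because the rule depends only on which of two genes is higher within the same sample, it is unaffected by any monotone rescaling of that sample, which is one reason such rules are considered transparent.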

  7. Sparse Representation for Classification of Tumors Using Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Xiyi Hang

    2009-01-01

    Full Text Available Personalized drug design requires the classification of cancer patients as accurately as possible. With advances in genome sequencing and microarray technology, a large amount of gene expression data has been and will continuously be produced from various cancerous patients. Such cancer-alerted gene expression data allows us to classify tumors at the genomewide level. However, cancer-alerted gene expression datasets typically have far more genes (features) than samples (patients), which imposes a challenge for classification of tumors. In this paper, a new method is proposed for cancer diagnosis using gene expression data by casting the classification problem as finding sparse representations of test samples with respect to training samples. The sparse representation is computed by the l1-regularized least square method. To investigate its performance, the proposed method is applied to six tumor gene expression datasets and compared with various support vector machine (SVM) methods. The experimental results have shown that the performance of the proposed method is comparable with or better than those of SVMs. In addition, the proposed method is more efficient than SVMs as it has no need of model selection.

  8. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Directory of Open Access Journals (Sweden)

    Tsatsoulis Costas

    2010-05-01

    Full Text Available Abstract Background: There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. Results: We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. Conclusions: We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  9. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning.

    Science.gov (United States)

    Amthauer, Heather A; Tsatsoulis, Costas

    2010-05-28

    There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  10. Chaotic genetic algorithm for gene selection and classification problems.

    Science.gov (United States)

    Chuang, Li-Yeh; Yang, Cheng-San; Li, Jung-Chike; Yang, Cheng-Hong

    2009-10-01

    Pattern recognition techniques suffer from a well-known curse, the dimensionality problem. The microarray data classification problem is a classical complex pattern recognition problem. Selecting relevant genes from microarray data poses a formidable challenge to researchers due to the high-dimensionality of features, multiclass categories being involved, and the usually small sample size. The goal of feature (gene) selection is to select those subsets of differentially expressed genes that are potentially relevant for distinguishing the sample classes. In this paper, information gain and chaotic genetic algorithm are proposed for the selection of relevant genes, and a K-nearest neighbor with the leave-one-out crossvalidation method serves as a classifier. The chaotic genetic algorithm is modified by using the chaotic mutation operator to increase the population diversity. The enhanced population diversity expands the GA's search ability. The proposed approach is tested on 10 microarray data sets from the literature. The experimental results show that the proposed method not only effectively reduced the number of gene expression levels, but also achieved lower classification error rates than other methods.
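The classifier named in the abstract, K-nearest neighbor evaluated by leave-one-out cross-validation, can be sketched as follows. In a gene-selection search, an error rate like this would typically serve as the fitness of a candidate gene subset. The function names, the choice of k = 3, Euclidean distance, and the toy data are assumptions, not details from the paper.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k training samples nearest to query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

def loocv_error(xs, ys, k=3):
    """Leave-one-out cross-validation error rate of the KNN classifier."""
    errors = 0
    for i in range(len(xs)):
        # Hold out sample i and train on all remaining samples.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
            errors += 1
    return errors / len(xs)

# Toy expression values for two selected genes (illustrative only).
xs = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1],
      [3.0, 3.1], [3.2, 2.9], [2.9, 3.0], [3.1, 3.2]]
ys = ["A"] * 4 + ["B"] * 4

print(loocv_error(xs, ys, k=3))
```

Leave-one-out is a common choice for microarray data because the small sample sizes mentioned in the abstract leave little room for a separate held-out test set.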

  11. Optimization based tumor classification from microarray gene expression data.

    Directory of Open Access Journals (Sweden)

    Onur Dagliyan

    Full Text Available BACKGROUND: An important use of data obtained from microarray measurements is the classification of tumor types with respect to genes that are either up or down regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications. These algorithms usually require parameter optimization to obtain accurate results depending on the type of data. Additionally, it is highly critical to find an optimal set of markers among those up or down regulated genes that can be clinically utilized to build assays for the diagnosis or to follow progression of specific cancer types. In this paper, we employ a mixed integer programming based classification algorithm named hyper-box enclosure method (HBE) for the classification of some cancer types with a minimal set of predictor genes. This optimization based method, which is a user friendly and efficient classifier, may allow the clinicians to diagnose and follow progression of certain cancer types. METHODOLOGY/PRINCIPAL FINDINGS: We apply HBE algorithm to some well known data sets such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), small round blue cell tumors (SRBCT) to find some predictor genes that can be utilized for diagnosis and prognosis in a robust manner with a high accuracy. Our approach does not require any modification or parameter optimization for each data set. Additionally, information gain attribute evaluator, relief attribute evaluator and correlation-based feature selection methods are employed for the gene selection. The results are compared with those from other studies and biological roles of selected genes in corresponding cancer type are described. CONCLUSIONS/SIGNIFICANCE: The performance of our algorithm overall was better than the other algorithms reported in the literature and classifiers found in WEKA data-mining package. Since it does not require a parameter optimization and it performs consistently very high prediction rate on

  12. A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

    Science.gov (United States)

    Gillies, Christopher E; Siadat, Mohammad-Reza; Patel, Nilesh V; Wilson, George D

    2013-12-01

    Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary class datasets, where the differentially expressed genes between two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes denoted by δ, and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level BCAL(G), where G is a graph of differentially expressed genes. Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 [...] genes in a biological condition increases beyond 50 and

  13. Classification and expression analyses of homeobox genes from Dictyostelium discoideum

    Indian Academy of Sciences (India)

    Himanshu Mishra; Shweta Saran

    2015-06-01

    Homeobox genes are compared between genomes in an attempt to understand the evolution of animal development. The ability of the protist, Dictyostelium discoideum, to shift between uni- and multicellularity makes this group ideal for studying the genetic changes that may have occurred during this transition. We present here the first genome-wide classification and comparative genomic analysis of the 14 homeobox genes present in D. discoideum. Based on the structural alignment of the homeodomains, they can be broadly divided into TALE and non-TALE classes. When individual homeobox genes were compared with members of known classes or families, we could further classify them into 3 groups, namely, TALE, OTHER and NOVEL classes, but no HOX family was found. The 5 members of the TALE class could be further divided into PBX, PKNOX, IRX and CUP families; 4 homeobox genes classified as NOVEL did not show any similarity to any known homeobox genes; while the remaining 5 were classified as OTHER as they did show a certain degree of similarity to a few known homeobox genes. No unique RNA expression pattern during development of D. discoideum emerged for members of an individual group. Putative promoter analysis revealed binding sites for a few homeobox transcription factors among many probable factors.

  14. Concordance among gene expression-based predictors for ER-positive breast cancer treated with adjuvant tamoxifen.

    Science.gov (United States)

    Prat, A; Parker, J S; Fan, C; Cheang, M C U; Miller, L D; Bergh, J; Chia, S K L; Bernard, P S; Nielsen, T O; Ellis, M J; Carey, L A; Perou, C M

    2012-11-01

    ER-positive (ER+) breast cancer includes all of the intrinsic molecular subtypes, although the luminal A and B subtypes predominate. In this study, we evaluated the ability of six clinically relevant genomic signatures to predict relapse in patients with ER+ tumors treated with adjuvant tamoxifen only. Four microarray datasets were combined and research-based versions of PAM50 intrinsic subtyping and risk of relapse (PAM50-ROR) score, 21-gene recurrence score (OncotypeDX), Mammaprint, Rotterdam 76 gene, index of sensitivity to endocrine therapy (SET) and an estrogen-induced gene set were evaluated. Distant relapse-free survival (DRFS) was estimated by Kaplan-Meier and log-rank tests, and multivariable analyses were done using Cox regression analysis. Harrell's C-index was also used to estimate performance. All signatures were prognostic in patients with ER+ node-negative tumors, whereas most were prognostic in ER+ node-positive disease. Among the signatures evaluated, PAM50-ROR, OncotypeDX, Mammaprint and SET were consistently found to be independent predictors of relapse. A combination of all signatures significantly increased the performance prediction. Importantly, low-risk tumors (>90% DRFS at 8.5 years) were identified by the majority of signatures only within node-negative disease, and these tumors were mostly luminal A (78%-100%). Most established genomic signatures were successful in outcome predictions in ER+ breast cancer and provided statistically independent information. From a clinical perspective, multiple signatures combined together most accurately predicted outcome, but a common finding was that each signature identified a subset of luminal A patients with node-negative disease who might be considered suitable candidates for adjuvant endocrine therapy alone.

  15. Looking in the mouth for noninvasive gene expression-based methods to detect oral, oropharyngeal, and systemic cancer.

    Science.gov (United States)

    Adami, Guy R; Adami, Alexander J

    2012-01-01

    Noninvasive diagnosis, whether by sampling body fluids, body scans, or other techniques, has the potential to simplify early cancer detection. A classic example is Pap smear screening, which has helped to reduce cervical cancer by 75% over the last 50 years. No test is error-free; the real concern is sufficient accuracy combined with ease of use. This paper will discuss methods that measure gene expression or epigenetic markers in oral cells or saliva to diagnose oral and pharyngeal cancers, without requiring surgical biopsy. Evidence for lung and other distal cancer detection is also reviewed.

  16. Classification with reject option in gene expression data

    National Research Council Canada - National Science Library

    Hanczar, Blaise; Dougherty, Edward R

    2008-01-01

    Motivation: The classification methods typically used in bioinformatics classify all examples, even if the classification is ambiguous, for instance, when the example is close to the separating hyperplane in linear classification...
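The ambiguity described here can be made concrete with a linear classifier that abstains when an example lies too close to the separating hyperplane. This is a minimal sketch of one simple rejection rule (a fixed distance threshold), not necessarily the rule studied in the paper; the weights, bias, threshold, and function name are assumptions.

```python
import math

def classify_with_reject(x, w, b, threshold):
    """Linear classification that abstains inside a band around the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    distance = score / norm  # signed Euclidean distance to the hyperplane
    if abs(distance) < threshold:
        return "reject"  # too close to the boundary to call reliably
    return "positive" if distance > 0 else "negative"

# Hypothetical hyperplane and rejection band, for illustration only.
w, b, threshold = [1.0, -1.0], 0.0, 0.5

print(classify_with_reject([3.0, 1.0], w, b, threshold))
print(classify_with_reject([1.0, 3.0], w, b, threshold))
print(classify_with_reject([1.2, 1.0], w, b, threshold))
```

Widening the threshold trades coverage for accuracy: more examples are left unclassified, but those that are classified sit further from the boundary.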

  17. Genome classification by gene distribution: An overlapping subspace clustering approach

    Directory of Open Access Journals (Sweden)

    Halgamuge Saman K

    2008-04-01

    Full Text Available Abstract Background: A large amount of horizontal gene transfer has been observed in the genomes of lower organisms, which causes difficulties in their evolutionary study. Bacteriophage genomes are a typical example. One recent approach that addresses this problem is the unsupervised clustering of genomes based on gene order and genome position, which helps to reveal species relationships that may not be apparent from traditional phylogenetic methods. Results: We propose the use of an overlapping subspace clustering algorithm for such genome classification problems. The advantage of subspace clustering over traditional clustering is that it can associate clusters with gene arrangement patterns, preserving genomic information in the clusters produced. Additionally, overlapping capability is desirable for the discovery of multiple conserved patterns within a single genome, such as those acquired from different species via horizontal gene transfers. The proposed method involves a novel strategy to vectorize genomes based on their gene distribution. A number of existing subspace clustering and biclustering algorithms were evaluated to identify the best framework upon which to develop our algorithm; we extended a generic subspace clustering algorithm called HARP to incorporate overlapping capability. The proposed algorithm was assessed and applied on bacteriophage genomes. The phage grouping results are consistent overall with the Phage Proteomic Tree and showed common genomic characteristics among the TP901-like, Sfi21-like and sk1-like phage groups. Among 441 phage genomes, we identified four significantly conserved distribution patterns structured by the terminase, portal, integrase, holin and lysin genes. We also observed a subgroup of Sfi21-like phages comprising a distinctive divergent genome organization and identified nine new phage members to the Sfi21-like genus: Staphylococcus 71, phiPVL108, Listeria A118, 2389, Lactobacillus phi AT3, A2

  18. A meta-analysis of gene expression-based biomarkers predicting outcome after tamoxifen treatment in breast cancer.

    Science.gov (United States)

    Mihály, Zsuzsanna; Kormos, Máté; Lánczky, András; Dank, Magdolna; Budczies, Jan; Szász, Marcell A; Győrffy, Balázs

    2013-07-01

    To date, three molecular markers (ER, PR, and CYP2D6) have been used in the clinical setting to predict the benefit of the anti-estrogen tamoxifen therapy. Our aim was to validate new biomarker candidates predicting response to tamoxifen treatment in breast cancer by evaluating these in a meta-analysis of available transcriptomic datasets with known treatment and follow-up. Biomarker candidates were identified in Pubmed and in the 2007-2012 ASCO and 2011-2012 SABCS abstracts. Breast cancer microarray datasets of endocrine therapy-treated patients were downloaded from GEO and EGA and RNAseq datasets from TCGA. Of the biomarker candidates, only those identified or already validated in a clinical cohort were included. Relapse-free survival (RFS) up to 5 years was used as endpoint in a ROC analysis in the GEO and RNAseq datasets. In the EGA dataset, Kaplan-Meier analysis was performed for overall survival. Statistical significance was set at p [...] tamoxifen-resistance genes in three independent platforms and identified PGR, MAPT, and SLC7A5 as the most promising prognostic biomarkers in tamoxifen treated patients.

  19. Succinate Dehydrogenase Subunit B (SDHB) Is Expressed in Neurofibromatosis 1-Associated Gastrointestinal Stromal Tumors (GISTs): Implications for the SDHB Expression Based Classification of GISTs

    Directory of Open Access Journals (Sweden)

    Jeanny H. Wang, Jerzy Lasota, Markku Miettinen

    2011-01-01

    Full Text Available Gastrointestinal Stromal Tumor (GIST) is the most common mesenchymal tumor of the digestive tract. GISTs develop with relatively high incidence in patients with Neurofibromatosis-1 syndrome (NF1). Mutational activation of KIT or PDGFRA is believed to be a driving force in the pathogenesis of familial and sporadic GISTs. Unlike those tumors, NF1-associated GISTs do not have KIT or PDGFRA mutations. Similarly, no mutational activation of KIT or PDGFRA has been identified in pediatric GISTs and in GISTs associated with Carney Triad and Carney-Stratakis Syndrome. KIT and PDGFRA-wild type tumors are expected to have lesser response to imatinib treatment. Recently, Carney Triad and Carney-Stratakis Syndrome-associated GISTs and pediatric GISTs have been shown to have a loss of expression of succinate dehydrogenase subunit B (SDHB), a Krebs cycle/electron transport chain interface protein. It was proposed that GISTs can be divided into SDHB-positive (type 1) and SDHB-negative (type 2) tumors because of similarities in clinical features and response to imatinib treatment. In this study, SDHB expression was examined immunohistochemically in 22 well-characterized NF1-associated GISTs. All analyzed tumors expressed SDHB. Based on SDHB-expression status, NF1-associated GISTs belong to the type 1 category; however, similarly to SDHB type 2 tumors, they do not respond well to imatinib treatment. Therefore, a simple categorization of GISTs into SDHB-positive and SDHB-negative seems to be incomplete. A classification based on both SDHB expression status and KIT and PDGFRA mutation status characterizes GISTs more accurately and allows subdivision of SDHB-positive tumors into different clinico-genetic categories.

  20. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  1. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) are used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together

  2. A new gene superfamily of pathogen-response (repat) genes in Lepidoptera: classification and expression analysis.

    Science.gov (United States)

    Navarro-Cerrillo, G; Hernández-Martínez, P; Vogel, H; Ferré, J; Herrero, S

    2013-01-01

    Repat (REsponse to PAThogens) genes were first identified in the midgut of Spodoptera exigua (Lepidoptera: Noctuidae) in response to Bacillus thuringiensis and baculovirus exposure. Since then, additional repat gene homologs have been identified in different studies. In this study the comprehensive larval transcriptome from S. exigua was analyzed for the presence of novel repat-homolog sequences. These analyses revealed the presence of at least 46 repat genes in S. exigua, establishing a new gene superfamily in this species. Phylogenetic analysis and studies of conserved motifs in these hypothetical proteins have allowed their classification in two main classes, αREPAT and βREPAT. Studies on the transcriptional response of repat genes have shown that αREPAT and βREPAT differ in their sequence but also in the pattern of regulation. The αREPAT were mainly regulated in response to the Cry1Ca toxin from B. thuringiensis but not to the increase in the midgut microbiota load. In contrast, βREPAT were neither responding to Cry1Ca toxin nor to midgut microbiota. Differential expression between midgut stem cells and the whole midgut tissue was studied for the different repat genes revealing changes in the gene expression distribution between midgut stem cells and midgut tissue in response to midgut microbiota. This high diversity found in their sequence and in their expression profile suggests that REPAT proteins may be involved in multiple processes that could be of relevance for the understanding of the insect gut physiology.

  3. Classification

    Science.gov (United States)

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  4. Gene expression profiling of gliomas: merging genomic and histopathological classification for personalised therapy

    OpenAIRE

    Vitucci, M; Hayes, D N; Miller, C R

    2010-01-01

    The development of DNA microarray technologies over the past decade has revolutionised translational cancer research. These technologies were originally hailed as more objective, comprehensive replacements for traditional histopathological cancer classification systems, based on microscopic morphology. Although DNA microarray-based gene expression profiling (GEP) remains unlikely in the near term to completely replace morphological classification of primary brain tumours, specifically the dif...

  5. Hybrid SPR algorithm to select predictive genes for effectual cancer classification

    OpenAIRE

    2012-01-01

    Designing an automated system for classifying DNA microarray data is an extremely challenging problem because of its high dimension and low amount of sample data. In this paper, a hybrid statistical pattern recognition algorithm is proposed to reduce the dimensionality and select the predictive genes for the classification of cancer. Colon cancer gene expression profiles having 62 samples of 2000 genes were used for the experiment. A gene subset of 6 highly informative genes was selecte...

  6. Cluster Analysis and Significance of Novel Genes Related to Molecular Classification of Glioma

    Institute of Scientific and Technical Information of China (English)

    Juxiang Chen; Yicheng Lu; Guohan Hu; Kehua Sun; Chun Luo; Meiqing Lou; Kang Ying; Yao Li

    2005-01-01

    OBJECTIVE To screen differentially expressed genes in the development of human glioma and establish a primary molecular classification of glioma based on gene expression using cDNA microarrays. METHODS Brain specimens were obtained from 18 patients with glioma, 10 males and 8 females, ages 14-62 with an average age of 44.4. The total RNAs of these glioma specimens and two specimens of donated brain of normal adults were extracted. BioStarH140S microarrays (including 8,347 old genes and 5,592 novel genes) were adopted and hybridized with probes which were prepared from the total RNAs. Differentially expressed genes between normal tissues and glioma tissues were assayed after scanning cDNA microarrays with ScanArray4000. Northern hybridization and in situ hybridization (ISH) were used to identify functions of novel genes. Those differentially expressed genes were studied with a Hierarchical method and molecular classification of glioma was preliminarily carried out. RESULTS Among the 13,939 target genes, there were 1,200 (8.61%) differentially expressed genes, of which 395 (2.83%) were novel genes. A total of 348 genes were up-regulated and 852 genes were down-regulated in the gliomas. The results of bioinformatical analysis, Northern hybridization and ISH revealed that those novel genes were highly associated with gliomas. There were multiple genes, such as the MAP gene, cytoskeleton & matrix motility genes, etc., which were of relevance to classification by the Hierarchical method. Molecular classification of glioma using a Hierarchical cluster was in accordance with pathology and suggested a molecular process of tumorigenesis and development. CONCLUSION Multiple genes play important roles in development of glioma. cDNA microarray technology is a powerful technique in screening for differentially expressed genes between two different kinds of tissues. Further analysis of gene expression and novel genes would be helpful to understand the molecular mechanism of glioma

  7. Classification

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2017-01-01

    This article presents and discusses definitions of the term “classification” and the related concepts “Concept/conceptualization,” “categorization,” “ordering,” “taxonomy” and “typology.” It further presents and discusses theories of classification including the influences of Aristotle...... and Wittgenstein. It presents different views on forming classes, including logical division, numerical taxonomy, historical classification, hermeneutical and pragmatic/critical views. Finally, issues related to artificial versus natural classification and taxonomic monism versus taxonomic pluralism are briefly...

  8. Multiclass classification of microarray data samples with a reduced number of genes

    Directory of Open Access Journals (Sweden)

    Ornella Leonardo

    2011-02-01

    Full Text Available Abstract Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples.

  9. Classification of Cancer Gene Selection Using Random Forest and Neural Network Based Ensemble Classifier

    Directory of Open Access Journals (Sweden)

    Jogendra Kushwah

    2013-06-01

    Full Text Available The free radical gene classification of cancer diseases is a challenging job in biomedical data engineering. Various classifiers are used to improve the classification of gene selection of cancer diseases, but the classifications of these classifiers are not validated. So an ensemble classifier is used for cancer gene classification, using a neural network classifier with a random forest tree. The random forest tree is an ensembling technique of classifiers; in this technique the number of classifier ensemble of their leaf node of class of classifier. In this paper we combined a neural network with a random forest ensemble classifier for classification of cancer gene selection for diagnostic analysis of cancer diseases. The proposed method is different from most ensemble classifier methods, which follow an input-output paradigm of neural networks, where the members of the ensemble are selected from a set of neural network classifiers. The number of classifiers is determined during the rising procedure of the forest. Furthermore, the proposed method produces an ensemble that is not only correct but also assorted, ensuring the two important properties that should characterize an ensemble classifier. For empirical evaluation of our proposed method we used the UCI cancer diseases data set for classification. Our experimental result shows a better result in comparison with random forest tree classification.

  10. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data gained great importance in recent years due to its role in disease diagnoses and prognoses which help to choose the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to their native nature of “high dimensional low sample size”. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to help correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improve treatment strategies. This thesis aims at a comparative study of state-of-the-art feature selection methods, classification methods, and the combination of them, based on gene expression data. We compared the efficiency of three different classification methods including: support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods, including: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in

  11. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    In this paper, a new supervised clustering and classification method is proposed. First, the application of discriminant partial least squares (DPLS) for the selection of a minimum number of key genes is applied on a gene expression microarray data set. Second, supervised hierarchical clustering ...

  12. Classification between normal and tumor tissues based on the pair-wise gene expression ratio

    Directory of Open Access Journals (Sweden)

    Wong YC

    2004-10-01

    Full Text Available Abstract Background Precise classification of cancer types is critically important for early cancer diagnosis and treatment. Numerous efforts have been made to use gene expression profiles to improve precision of tumor classification. However, reliable cancer-related signals are generally lacking. Method Using recent datasets on colon and prostate cancer, a data transformation procedure from single gene expression to pair-wise gene expression ratio is proposed. Making use of the internal consistency of each expression profiling dataset, this transformation improves the signal to noise ratio of the dataset and uncovers new relevant cancer-related signals (features). The efficiency in using the transformed dataset to perform normal/tumor classification was investigated using feature partitioning with informative features (gene annotation) as discriminating axes (single gene expression or pair-wise gene expression ratio). Classification results were compared to the original datasets for up to 10-feature model classifiers. Results 82 and 262 genes that have high correlation to tissue phenotype were selected from the colon and prostate datasets respectively. Remarkably, data transformation of the highly noisy expression data successfully lowered the coefficient of variation (CV) for the within-class samples and improved the correlation with tissue phenotypes. The transformed dataset exhibited lower CV when compared to that of single gene expression. In the colon cancer set, the minimum CV decreased from 45.3% to 16.5%. In prostate cancer, comparable CV was achieved with and without transformation. This improvement in CV, coupled with the improved correlation between the pair-wise gene expression ratio and tissue phenotypes, yielded higher classification efficiency, especially with the colon dataset, from 87.1% to 93.5%. Over 90% of the top ten discriminating axes in both datasets showed significant improvement after data transformation. The
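The transformation described in the abstract above can be illustrated with a minimal sketch. It assumes the coefficient of variation is the sample standard deviation divided by the mean, expressed as a percentage; the gene names and expression values are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def pairwise_ratios(profile):
    """Map each gene pair (a, b) to expression[a] / expression[b] for one sample."""
    return {(a, b): profile[a] / profile[b]
            for a, b in combinations(sorted(profile), 2)}


# Hypothetical within-class samples: both genes share a per-sample scale factor,
# so each single gene is noisy while their ratio stays close to 2.
samples = [
    {"geneA": 100.0, "geneB": 50.0},
    {"geneA": 200.0, "geneB": 105.0},
    {"geneA": 400.0, "geneB": 190.0},
]

cv_single = coefficient_of_variation([s["geneA"] for s in samples])
cv_ratio = coefficient_of_variation(
    [pairwise_ratios(s)[("geneA", "geneB")] for s in samples]
)
print(f"CV of geneA alone: {cv_single:.1f}%")
print(f"CV of geneA/geneB: {cv_ratio:.1f}%")
```

When a shared per-sample factor dominates the noise, it cancels in the ratio, which is one plausible reason the within-class CV can drop after the transformation.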

  13. Evolution and Functional Classification of Vertebrate Gene Deserts

    Energy Technology Data Exchange (ETDEWEB)

    Ovcharenko, I; Loots, G; Nobrega, M; Hardison, R; Miller, W; Stubbs, L

    2004-07-14

    Gene deserts, long stretches of DNA sequence devoid of protein coding genes, span approximately one quarter of the human genome. Through human-chicken genome comparisons we were able to characterize one third of human gene deserts as evolutionarily stable - they are highly conserved in vertebrates, resist chromosomal rearrangements, and contain multiple conserved non-coding elements physically linked to their neighboring genes. A linear relationship was observed between human and chicken orthologous stable gene deserts, where the human deserts appear to have expanded homogeneously by a uniform accumulation of repetitive elements. Stable gene deserts are associated with key vertebrate genes that construct the framework of vertebrate development, many of which encode transcription factors. We show that the regulatory machinery governing genes associated with stable gene deserts operates differently from other regions in the human genome and relies heavily on distant regulatory elements. The regulation guided by these elements is independent of the distance between the gene and its distant regulatory element, or the distance between two distant regulatory cassettes. The location of gene deserts and their associated genes in the genome is independent of chromosomal length or content, presenting these regions as well-bounded regions evolving separately from the rest of the genome.

  14. SoFoCles: feature filtering for microarray classification based on gene ontology.

    Science.gov (United States)

    Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

    2010-02-01

    Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.

  15. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    Directory of Open Access Journals (Sweden)

    Liu Qingzhong

    2011-12-01

    Full Text Available Abstract Background Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.

  16. Improving accuracy for cancer classification with a new algorithm for genes selection

    Directory of Open Access Journals (Sweden)

    Zhang Hongyan

    2012-11-01

    Full Text Available Abstract Background Even though the classification of cancer tissue samples based on gene expression data has advanced considerably in recent years, it faces great challenges to improve accuracy. One of the challenges is to establish an effective method that can select a parsimonious set of relevant genes. So far, most methods for gene selection in literature focus on screening individual or pairs of genes without considering the possible interactions among genes. Here we introduce a new computational method named the Binary Matrix Shuffling Filter (BMSF). It not only overcomes the difficulty associated with the search schemes of traditional wrapper methods and overfitting problem in large dimensional search space but also takes potential gene interactions into account during gene selection. This method, coupled with Support Vector Machine (SVM) for implementation, often selects a very small number of genes for easy model interpretability. Results We applied our method to 9 two-class gene expression datasets involving human cancers. During the gene selection process, the set of genes to be kept in the model was recursively refined and repeatedly updated according to the effect of a given gene on the contributions of other genes in reference to their usefulness in cancer classification. The small number of informative genes selected from each dataset leads to significantly improved leave-one-out (LOOCV) classification accuracy across all 9 datasets for multiple classifiers. Our method also exhibits broad generalization in the genes selected since multiple commonly used classifiers achieved either equivalent or much higher LOOCV accuracy than those reported in literature. Conclusions Evaluation of a gene’s contribution to binary cancer classification is better considered after adjusting for the joint effect of a large number of other genes. A computationally efficient search scheme was provided to perform effective search in the extensive

  17. TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection

    Directory of Open Access Journals (Sweden)

    Wang Haiyan

    2013-01-01

    Full Text Available Abstract Background One of the challenges in classification of cancer tissue samples based on gene expression data is to establish an effective method that can select a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and prediction analysis of microarrays (PAM) are four popular classifiers that have comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, and TSP and k-TSP always use an even number of genes. In addition, the selection of distinct gene pairs in k-TSP simply combined the pairs of top ranking genes without considering the fact that the gene set with best discrimination power may not be the combined pairs. The k-TSP algorithm also needs the user to specify an upper bound for the number of gene pairs. Here we introduce a computational algorithm to address the problems. The algorithm is named Chisquare-statistic-based Top Scoring Genes (Chi-TSG) classifier, simplified as TSG. Results The TSG classifier starts with the top two genes and sequentially adds additional genes into the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected with cross validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms TSP family classifiers by a big margin in most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers, including easy interpretation, invariance to monotone transformation, selection of a small number of informative genes allowing follow-up studies, and resistance to sampling variations due to within-sample operations. Conclusions Redefining the scores for gene set and the classification rules in TSP family
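For orientation, the baseline Top Scoring Pair (TSP) classifier mentioned in the abstract above can be sketched as follows. This is the commonly described TSP rule (pick the gene pair whose within-sample ordering differs most between two classes), not the Chi-TSG method itself, whose scoring details the abstract does not give. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def ordering_frequencies(samples, labels, i, j):
    """Return [P(x_i < x_j | class 0), P(x_i < x_j | class 1)]."""
    freqs = []
    for cls in (0, 1):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        freqs.append(sum(x[i] < x[j] for x in rows) / len(rows))
    return freqs


def tsp_score(samples, labels, i, j):
    """Absolute difference of the ordering frequency between the two classes."""
    f0, f1 = ordering_frequencies(samples, labels, i, j)
    return abs(f0 - f1)


def fit_top_scoring_pair(samples, labels):
    """Pick the gene pair with the highest score and record its decision rule."""
    n_genes = len(samples[0])
    best = max(combinations(range(n_genes), 2),
               key=lambda pair: tsp_score(samples, labels, *pair))
    f0, f1 = ordering_frequencies(samples, labels, *best)
    class_if_less = 0 if f0 > f1 else 1  # class voted for when x_i < x_j
    return best, class_if_less


def predict(model, x):
    (i, j), class_if_less = model
    return class_if_less if x[i] < x[j] else 1 - class_if_less


# Hypothetical toy data: gene 0 < gene 1 in class 0, reversed in class 1.
samples = [[1, 5, 3], [2, 6, 1], [1, 4, 9], [7, 2, 3], [8, 3, 1], [6, 1, 9]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_top_scoring_pair(samples, labels)
print(model, predict(model, [3, 9, 0]), predict(model, [9, 3, 0]))
```

Because the rule only compares two values within one sample, any strictly increasing transformation of that sample (for example, cubing every value) leaves the prediction unchanged, which is the invariance property the abstract refers to.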

  19. Gene selection in class space for molecular classification of cancer

    Institute of Scientific and Technical Information of China (English)

    ZHANG Junying; Yue Joseph WANG; Javed KHAN; Robert CLARKE

    2004-01-01

    Gene selection (feature selection) is generally performed in gene space (feature space), where a very serious curse of dimensionality problem always exists because the number of genes is much larger than the number of samples in gene space (G-space). This results in difficulty in modeling the data set in this space and the low confidence of the result of gene selection. How to find a gene subset in this case is a challenging subject. In this paper, the above G-space is transformed into its dual space, referred to as class space (C-space) such that the number of dimensions is the very number of classes of the samples in G-space and the number of samples in C-space is the number of genes in G-space. It is obvious that the curse of dimensionality in C-space does not exist. A new gene selection method which is based on the principle of separating different classes as far as possible is presented with the help of Principal Component Analysis (PCA). The experimental results on gene selection for real data set are evaluated with Fisher criterion, weighted Fisher criterion as well as leave-one-out cross validation, showing that the method presented here is effective and efficient.

  20. Comparison of linear discriminant analysis methods for the classification of cancer based on gene expression data

    Directory of Open Access Journals (Sweden)

    He Miao

    2009-12-01

    Full Text Available Abstract Background More studies based on gene expression data have been reported in great detail; however, one major challenge for the methodologists is the choice of classification methods. The main purpose of this research was to compare the performance of linear discriminant analysis (LDA) and its modification methods for the classification of cancer based on gene expression data. Methods The classification performance of linear discriminant analysis (LDA) and its modification methods was evaluated by applying these methods to six public cancer gene expression datasets. These methods included linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), shrinkage centroid regularized discriminant analysis (SCRDA), shrinkage linear discriminant analysis (SLDA) and shrinkage diagonal discriminant analysis (SDDA). The procedures were performed by software R 2.80. Results PAM picked out fewer feature genes than other methods from most datasets except from the Brain dataset. For the two methods of shrinkage discriminant analysis, SLDA selected more genes than SDDA from most datasets except from the 2-class lung cancer dataset. When comparing SLDA with SCRDA, SLDA selected more genes than SCRDA from the 2-class lung cancer, SRBCT and Brain datasets; the result was opposite for the remaining datasets. The average test error of LDA modification methods was lower than that of the LDA method. Conclusions The classification performance of LDA modification methods was superior to that of traditional LDA with respect to the average error, and there was no significant difference between these modification methods.

  1. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-11-01

    Microarray gene expression data have gained great importance in recent years because of their role in disease diagnosis and prognosis, which helps in choosing an appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area because of their inherent “high dimensional low sample size” nature. Such problems pose great challenges to existing classification methods. Effective feature selection techniques are therefore often needed to help classify different tumor types correctly, leading to a better understanding of genetic signatures and to improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and their combinations, based on gene expression data. We compared the efficiency of three classification methods (support vector machines, k-nearest neighbor and random forest) and eight feature selection methods (information gain, twoing rule, sum minority, max minority, Gini index, sum of variances, t-statistics, and one-dimension support vector machine). Five-fold cross-validation was used to evaluate classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. The results revealed the important role of feature selection in classifying gene expression data: with feature selection, classification accuracy can be significantly boosted using a small number of genes. The relationship between the features selected by different feature selection methods is investigated, and the features most frequently selected in each fold among all methods are evaluated for both datasets.
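
    The evaluation protocol named in this abstract, five-fold cross-validation of a classifier, can be sketched as follows. This is a minimal illustration with a k-nearest-neighbor classifier on toy data, not the study's pipeline; the fold assignment (sample i goes to fold i mod 5), the function names and the data are all assumptions made for the example.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def five_fold_accuracy(X, y, k=3, folds=5):
    """Hold out fold f (samples with i % folds == f), train on the rest."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                       for i in test_idx)
    return correct / len(X)

# Toy data (assumed): two well-separated groups of five samples, two "genes".
X = [[0.1 * i, 0.0] for i in range(5)] + [[5.0 + 0.1 * i, 5.0] for i in range(5)]
y = [0] * 5 + [1] * 5
acc = five_fold_accuracy(X, y)
print(acc)
```

    In a real comparison, any feature selection step would have to be repeated inside each training fold; selecting genes on the full data set before splitting would leak information from the held-out samples.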

  2. New feature extraction in gene expression data for tumor classification

    Institute of Scientific and Technical Information of China (English)

    HE Renya; CHENG Qiansheng; WU Lianwen; YUAN Kehong

    2005-01-01

    Using gene expression data to discriminate tumor samples from normal ones is a powerful method. However, it is sometimes difficult because gene expression data are high-dimensional and the number of samples in the data sets is very small. The key technique is to find a new gene expression profile that can provide understanding of and insight into tumor-related cellular processes. In this paper, we propose a new feature extraction method based on the variance to the center of the class and employ a support vector machine to classify the gene data as either normal or tumor. Two tumor data sets are used to demonstrate the effectiveness of our method. The results show that performance is significantly improved.

  3. Regularized logistic regression with adjusted adaptive elastic net for gene selection in high dimensional cancer classification.

    Science.gov (United States)

    Algamal, Zakariya Yahya; Lee, Muhammad Hisyam

    2015-12-01

    Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight, however, using this weight may not be preferable for certain reasons: First, the elastic net estimator is biased in selecting genes. Second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly consistent in selecting genes compared to the other three competitor regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification.
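
    To make the idea of simultaneous coefficient estimation and gene selection concrete, the following sketch fits a logistic regression with a plain elastic net penalty by proximal gradient descent. It is not the AAElastic method and uses no adaptive weights; the penalty form lam * (alpha * L1 + (1 - alpha) / 2 * L2), the step size, the function names and the toy data are all assumptions chosen for illustration.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t * |w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.5, iters=500):
    """Proximal gradient descent on logistic loss plus an elastic net penalty."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        probs = [1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, row)))))
                 for row in X]
        resid = [pi - yi for pi, yi in zip(probs, y)]
        b -= lr * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            # Gradient of the smooth part: logistic loss plus the ridge term.
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1 - alpha) * w[j]
            # The L1 part is handled by soft-thresholding, which yields exact zeros.
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam * alpha)
    return w, b

# Toy data (assumed): gene 0 tracks the label, gene 1 is unrelated to it.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [0, 0, 1, 1]
w, b = fit_elastic_net_logistic(X, y)
print(w)
```

    The coefficient of the unrelated gene stays exactly zero, which is how the penalty performs gene selection, while the ridge term keeps the informative coefficient finite even though the toy classes are perfectly separable.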

  4. Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification.

    Science.gov (United States)

    Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R

    2013-01-01

    Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. The MATLAB source code is available from http://math.arizona.edu/~hzhang/software.html.

  5. Entropy-based gene ranking without selection bias for the predictive classification of microarray data

    Directory of Open Access Journals (Sweden)

    Serafini Maria

    2003-11-01

    Background: We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). Results: With E-RFE, we speed up recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Conclusions: Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
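
    The abstract says that an entropy measure of the SVM weights distribution guides how many genes are dropped at once, but it does not give the formula. The sketch below shows only one plausible ingredient, the Shannon entropy of a histogram of weights; the binning scheme and the function name `weight_entropy` are assumptions and not the published E-RFE rule.

```python
import math

def weight_entropy(weights, bins=10):
    """Shannon entropy (in bits) of a histogram of the weights over equal-width bins."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return 0.0  # all weights identical: a single occupied bin
    counts = [0] * bins
    for w in weights:
        idx = min(int((w - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(weights)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# Toy weights (assumed): evenly spread versus concentrated near zero.
spread = weight_entropy([0.0, 1.0, 2.0, 3.0], bins=4)
peaked = weight_entropy([0.0, 0.0, 0.0, 3.0], bins=4)
print(spread, peaked)
```

    A low entropy indicates that most weights fall into the same bin, for example many genes with near-zero weight, which is the kind of situation in which removing a large chunk of genes in one step is plausible.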

  6. An Improved Parallelized mRMR for Gene Subset Selection in Cancer Classification

    Directory of Open Access Journals (Sweden)

    Rohani Mohammad Kusairi

    2017-09-01

    The DNA microarray technique has become an increasingly attractive tool for cancer classification in scientific and industrial fields. According to previous research, the conventional approach to cancer classification is based primarily on the morphological appearance of the tumor. The limitations of this approach are expert bias in identifying tumors and the difficulty of differentiating cancer subtypes, because most cancers are highly related to specific biological insight. This study therefore proposes an improved parallelized Minimum Redundancy Maximum Relevance (mRMR) method, a particularly fast feature selection method for finding a set of features that are both relevant and complementary. mRMR can identify genes that are more relevant to the biological context, leading to richer biological interpretations. The proposed method is expected to achieve accurate classification performance using a small number of predictive genes when tested on two datasets from the Cancer Genome Project and compared with previous methods.
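
    The greedy principle behind mRMR, picking features that are relevant to the class label but not redundant with features already chosen, can be sketched in a few lines. This is a serial toy version and not the parallelized method of the paper. mRMR is usually formulated with mutual information; absolute Pearson correlation is swapped in here to keep the example short. The function names and data are hypothetical.

```python
from statistics import mean

def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in for mutual information."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def mrmr(features, target, k):
    """Greedily select k feature indices by relevance minus mean redundancy."""
    relevance = [abs_corr(f, target) for f in features]
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def score(j):
            redundancy = mean(abs_corr(features[j], features[s]) for s in selected)
            return relevance[j] - redundancy
        candidates = [j for j in range(len(features)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Toy data (assumed): gene 1 is a rescaled copy of gene 0; gene 2 is less
# relevant on its own but carries different within-class information.
target = [0, 0, 0, 1, 1, 1]
features = [
    [0.1, 0.2, 0.1, 0.9, 1.0, 0.9],
    [0.2, 0.4, 0.2, 1.8, 2.0, 1.8],
    [1.0, 0.0, 1.0, 2.0, 1.0, 2.0],
]
selected = mrmr(features, target, k=2)
print(selected)
```

    On this toy input the second pick is gene 2 rather than the rescaled copy, because the copy's high relevance is cancelled by its redundancy with the first pick.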

  7. CARD15 gene and the classification of Crohn's disease

    NARCIS (Netherlands)

    Murillo, L; Crusius, JBA; van Bodegraven, AA; Alizadeh, BZ; Pena, AS

    2002-01-01

    An insertion mutation at nucleotide 3020 (3020insC) in the CARD15 gene, originally reported as NOD2, has been strongly associated with Crohn's disease. The CARD15 G2722C missense mutation was also shown to be associated with this disease. We studied 130 Dutch Crohn's disease patients, with a median

  8. Classification of Breast Cancer Subtypes by combining Gene Expression and DNA Methylation Data

    DEFF Research Database (Denmark)

    List, Markus; Hauschild, Anne-Christin; Tan, Qihua

    2014-01-01

    on the transcriptomic, but also on an epigenetic level. We compared so-called random forest derived classification models based on gene expression and methylation data alone, to a model based on the combined features and to a model based on the gold standard PAM50. We obtained bootstrap errors of 10...

  9. Interactive Naive Bayesian network: A new approach of constructing gene-gene interaction network for cancer classification.

    Science.gov (United States)

    Tian, Xue W; Lim, Joon S

    2015-01-01

    The naive Bayesian (NB) network classifier is a simple and well-known type of classifier that can be easily induced from a DNA microarray data set. However, the strong conditional independence assumption of the NB network can sometimes lead to weak classification performance. In this paper, we propose a new interactive naive Bayesian (INB) network approach that weakens the conditional independence assumption of the NB network and classifies cancers using DNA microarray data sets. We first select the differentially expressed genes (DEGs) to reduce the dimension of the microarray data set. Then, for each DEG, we search for an interactive parent, namely the one with the biggest influence among all DEGs. Next, we calculate a weight to represent the interactive relationship between a DEG and its parent. Finally, the gene-gene interaction network is constructed. We experimentally test the INB network in terms of classification accuracy using leukemia and colon DNA microarray data sets and compare it with the NB network. The INB network achieves higher classification accuracy than the NB network and can also show the gene-gene interactions visually.

  10. Review on Feature Selection Techniques and the Impact of SVM for Cancer Classification using Gene Expression Profile

    CERN Document Server

    George, G Victo Sudha; 10.5121/ijcses.2011.2302

    2011-01-01

    DNA microarray technology has modernized biological research: scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. However, compared to the number of genes involved, available training data sets generally have a fairly small sample size for classification. These training data limitations constitute a challenge to certain classification methodologies. Feature selection techniques can be used to extract the marker genes that influence classification accuracy by eliminating unwanted noisy and redundant genes. This paper presents a review of feature selection techniques that have been employed in microarray-based cancer classification and also of the predominant role of SVM in cancer classification.

  11. A Region-Based GeneSIS Segmentation Algorithm for the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    Stelios K. Mylonas

    2015-03-01

    This paper proposes an object-based segmentation/classification scheme for remotely sensed images, based on a novel variant of the recently proposed Genetic Sequential Image Segmentation (GeneSIS) algorithm. GeneSIS segments the image in an iterative manner, whereby at each iteration a single object is extracted via a genetic-based object extraction algorithm. In the previous pixel-based GeneSIS, the candidate objects to be extracted were evaluated through the fuzzy content of their included pixels. In the newly developed region-based GeneSIS algorithm, by contrast, a watershed-driven fine segmentation map is first obtained from the original image and serves as the basis for the subsequent GeneSIS segmentation. Furthermore, in order to enhance the spatial search capabilities, we introduce a more descriptive encoding scheme in the object extraction algorithm, where the structural search modules are represented by polygonal shapes. Our objectives in the new framework are to enhance the flexibility of the algorithm in extracting more flexible object shapes, to ensure high classification accuracy, and to reduce the execution time of the segmentation, while preserving all the inherent attributes of the GeneSIS approach. Finally, exploiting the inherent ability of GeneSIS to produce multiple segmentations, we also propose two segmentation fusion schemes that operate on the ensemble of segmentations generated by GeneSIS. Our approaches are tested on one urban and two agricultural images. The results show that region-based GeneSIS has considerably lower computational demands than the pixel-based one. Furthermore, the suggested methods achieve higher classification accuracies and good segmentation maps compared to a series of existing algorithms.

  12. Collagen-rich stroma in aggressive colon tumors induces mesenchymal gene expression and tumor cell invasion

    NARCIS (Netherlands)

    Vellinga, T T; den Uil, S; Rinkes, IHB; Marvin, D; Ponsioen, B; Alvarez-Varela, A; Fatrai, S; Scheele, C; Zwijnenburg, D A; Snippert, H; Vermeulen, L; Medema, J P; Stockmann, H B; Koster, J; Fijneman, R J A; de Rooij, J; Kranenburg, O

    2016-01-01

    Gene expression-based classification systems have identified an aggressive colon cancer subtype with mesenchymal features, possibly reflecting epithelial-to-mesenchymal transition (EMT) of tumor cells. However, stromal fibroblasts contribute extensively to the mesenchymal phenotype of aggressive col

  13. Transcription activator-like effector-mediated regulation of gene expression based on the inducible packaging and delivery via designed extracellular vesicles.

    Science.gov (United States)

    Lainšček, Duško; Lebar, Tina; Jerala, Roman

    2017-02-26

    Transcription activator-like effector (TALE) proteins present a powerful tool for genome editing and engineering, enabling introduction of site-specific mutations, gene knockouts or regulation of the transcription levels of selected genes. TALE nucleases or TALE-based transcription regulators are introduced into mammalian cells mainly via delivery of the coding genes. Here we report an extracellular vesicle-mediated delivery of TALE transcription regulators and their ability to upregulate the reporter gene in target cells. Designed transcriptional activator TALE-VP16 fused to the appropriate dimerization domain was enriched as a cargo protein within extracellular vesicles produced by mammalian HEK293 cells stimulated by Ca-ionophore and using blue light- or rapamycin-inducible dimerization systems. Blue light illumination or rapamycin increased the amount of the TALE-VP16 activator in extracellular vesicles, and addition of these vesicles to the target cells resulted in increased expression of the reporter gene. This technology therefore represents an efficient delivery method for TALE-based transcriptional regulators.

  14. RNAi and Homologous Over-Expression Based Functional Approaches Reveal Triterpenoid Synthase Gene-Cycloartenol Synthase Is Involved in Downstream Withanolide Biosynthesis in Withania somnifera.

    Directory of Open Access Journals (Sweden)

    Smrati Mishra

    Withania somnifera Dunal, is one of the most commonly used medicinal plant in Ayurvedic and indigenous medicine traditionally owing to its therapeutic potential, because of major chemical constituents, withanolides. Withanolide biosynthesis requires the activities of several enzymes in vivo. Cycloartenol synthase (CAS) is an important enzyme in the withanolide biosynthetic pathway, catalyzing cyclization of 2, 3 oxidosqualene into cycloartenol. In the present study, we have cloned full-length WsCAS from Withania somnifera by homology-based PCR method. For gene function investigation, we constructed three RNAi gene-silencing constructs in backbone of RNAi vector pGSA and a full-length over-expression construct. These constructs were transformed in Agrobacterium strain GV3101 for plant transformation in W. somnifera. Molecular and metabolite analysis was performed in putative Withania transformants. The PCR and Southern blot results showed the genomic integration of these RNAi and overexpression construct(s) in Withania genome. The qRT-PCR analysis showed that the expression of WsCAS gene was considerably downregulated in stable transgenic silenced Withania lines compared with the non-transformed control and HPLC analysis showed that withanolide content was greatly reduced in silenced lines. Transgenic plants over expressing CAS gene displayed enhanced level of CAS transcript and withanolide content compared to non-transformed controls. This work is the first full proof report of functional validation of any metabolic pathway gene in W. somnifera at whole plant level as per our knowledge and it will be further useful to understand the regulatory role of different genes involved in the biosynthesis of withanolides.

  15. RNAi and Homologous Over-Expression Based Functional Approaches Reveal Triterpenoid Synthase Gene-Cycloartenol Synthase Is Involved in Downstream Withanolide Biosynthesis in Withania somnifera.

    Science.gov (United States)

    Mishra, Smrati; Bansal, Shilpi; Mishra, Bhawana; Sangwan, Rajender Singh; Asha; Jadaun, Jyoti Singh; Sangwan, Neelam S

    2016-01-01

    Withania somnifera Dunal, is one of the most commonly used medicinal plant in Ayurvedic and indigenous medicine traditionally owing to its therapeutic potential, because of major chemical constituents, withanolides. Withanolide biosynthesis requires the activities of several enzymes in vivo. Cycloartenol synthase (CAS) is an important enzyme in the withanolide biosynthetic pathway, catalyzing cyclization of 2, 3 oxidosqualene into cycloartenol. In the present study, we have cloned full-length WsCAS from Withania somnifera by homology-based PCR method. For gene function investigation, we constructed three RNAi gene-silencing constructs in backbone of RNAi vector pGSA and a full-length over-expression construct. These constructs were transformed in Agrobacterium strain GV3101 for plant transformation in W. somnifera. Molecular and metabolite analysis was performed in putative Withania transformants. The PCR and Southern blot results showed the genomic integration of these RNAi and overexpression construct(s) in Withania genome. The qRT-PCR analysis showed that the expression of WsCAS gene was considerably downregulated in stable transgenic silenced Withania lines compared with the non-transformed control and HPLC analysis showed that withanolide content was greatly reduced in silenced lines. Transgenic plants over expressing CAS gene displayed enhanced level of CAS transcript and withanolide content compared to non-transformed controls. This work is the first full proof report of functional validation of any metabolic pathway gene in W. somnifera at whole plant level as per our knowledge and it will be further useful to understand the regulatory role of different genes involved in the biosynthesis of withanolides.

  16. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    Science.gov (United States)

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. Extensive experiments were conducted to test the accuracy of the proposed algorithm. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This indicates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

  17. Tumor Classification Using High-Order Gene Expression Profiles Based on Multilinear ICA

    Directory of Open Access Journals (Sweden)

    Ming-gang Du

    2009-01-01

    Motivation. Independent Components Analysis (ICA) maximizes the statistical independence of the representational components of a training gene expression profiles (GEP) ensemble, but it cannot distinguish relations between the different factors, or different modes, and it is not applicable to high-order GEP data mining. In order to generalize ICA, we introduce Multilinear-ICA and apply it to tumor classification using high-order GEP. Firstly, we introduce the basic concepts and operations of tensors and present the Support Vector Machine (SVM) classifier and Multilinear-ICA. Secondly, the higher-scoring genes of the original high-order GEP are selected using t-statistics and arranged as tensors. Thirdly, the tensors are processed by Multilinear-ICA. Finally, the SVM is used to classify the tumor subtypes. Results. To show the validity of the proposed method, we apply it to tumor classification using high-order GEP. Though we only use three datasets, the experimental results show that the method is effective and feasible. Through this survey, we hope to gain some insight into the problem of high-order GEP tumor classification, in aid of further developing more effective tumor classification algorithms.

  18. Angiotensinogen gene polymorphism predicts hypertension, and iridological constitutional classification enhances the risk for hypertension in Koreans.

    Science.gov (United States)

    Cho, Joo-Jang; Hwang, Woo-Jun; Hong, Seung-Heon; Jeong, Hyun-Ja; Lee, Hye-Jung; Kim, Hyung-Min; Um, Jae-Young

    2008-05-01

    This study investigated the relationship between iridological constitution and angiotensinogen (AGN) gene polymorphism in hypertensives. In addition to angiotensin converting enzyme gene, AGN genotype is also one of the most well studied genetic markers of hypertension. Furthermore, iridology, one of complementary and alternative medicine, is the diagnosis of the medical conditions through noting irregularities of the pigmentation in the iris. Iridological constitution has a strong familial aggregation and is implicated in heredity. Therefore, the study classified 87 hypertensive patients with familial history of cerebral infarction and controls (n = 88) according to Iris constitution, and determined AGN genotype. As a result, the AGN/TT genotype was associated with hypertension (chi2 = 13.413, p iridological constitutional classification increased the relative risk for hypertension in the subjects with AGN/T allele. These results suggest that AGN polymorphism predicts hypertension, and iridological constitutional classification enhances the risk for hypertension associated with AGN/T in a Korean population.

  19. Classification of genes and putative biomarker identification using distribution metrics on expression profiles.

    Directory of Open Access Journals (Sweden)

    Hung-Chung Huang

    BACKGROUND: Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. METHODOLOGY/PRINCIPAL FINDINGS: In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as 'brain group' and 'non-brain group'; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. CONCLUSIONS/SIGNIFICANCE: The methodology employed here may be used to facilitate disease-specific biomarker discovery.
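
    The four distribution metrics named in this abstract can be computed per gene as in the sketch below. It uses population moments and excess kurtosis, which is one common convention; the abstract does not state which estimators or category thresholds were used, so none are assumed here. The function name and the toy profiles are hypothetical.

```python
def distribution_metrics(values):
    """Return (mean, standard deviation, skewness, excess kurtosis) of one profile."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # population variance
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    if m2 == 0:
        return m, 0.0, 0.0, 0.0  # flat profile: shape metrics are undefined, report 0
    return m, m2 ** 0.5, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Toy expression profiles (assumed): one symmetric, one mostly "off" with a spike.
graded = distribution_metrics([1.0, 2.0, 3.0, 4.0, 5.0])
mostly_off = distribution_metrics([0.0, 0.0, 0.0, 0.0, 10.0])
print(graded)
print(mostly_off)
```

    The symmetric profile has zero skewness, while the mostly-off profile is strongly right-skewed, which is the kind of contrast such metrics are meant to expose.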

  20. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    OpenAIRE

    Tsatsoulis Costas; Amthauer Heather A

    2010-01-01

    Abstract Background There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed if we could classify a gene in Saccharomyces ce...

  1. Genome-Wide Identification and Functional Classification of Tomato (Solanum lycopersicum) Aldehyde Dehydrogenase (ALDH) Gene Superfamily.

    Science.gov (United States)

    Jimenez-Lopez, Jose C; Lopez-Valverde, Francisco J; Robles-Bolivar, Paula; Lima-Cabello, Elena; Gachomo, Emma W; Kotchoni, Simeon O

    2016-01-01

    Aldehyde dehydrogenases (ALDHs) form a protein superfamily that catalyzes the oxidation of aldehyde molecules into their corresponding non-toxic carboxylic acids and responds to different environmental stresses, offering promising genetic approaches for improving plant adaptation. The aim of the current study is the systematic identification and functional analysis of the S. lycopersicum ALDH gene superfamily. We performed genome-based identification and functional classification of ALDH genes, together with analyses of phylogenetic relationships, structure and catalytic domains, and microarray-based gene expression. Twenty-nine unique tomato ALDH sequences encoding 11 ALDH families were identified, including a unique member of the family 19 ALDH. Phylogenetic analysis revealed 13 groups, with a conserved relationship among ALDH families. Functional structure analysis of ALDH2 showed a catalytic mechanism involving a Cys-Glu couple. However, the analysis of ALDH3 showed no functional gene duplication or potential neo-functionalities. Gene expression analysis reveals that particular ALDH genes, such as ALDH2B7, might respond to wounding stress by increasing their expression. Overall, this study reveals the complexity of the S. lycopersicum ALDH gene superfamily and offers new insights into the structure-functional features and evolution of ALDH gene families in vascular plants. The functional characterization of ALDHs is valuable for promoting molecular breeding in tomato for the improvement of stress tolerance and signaling.

  2. Gene classification based on amino acid motifs and residues: the DLX (distal-less) test case.

    Directory of Open Access Journals (Sweden)

    Nuno A Fonseca

    BACKGROUND: Comparative studies using hundreds of sequences can give a detailed picture of the evolution of a given gene family. Nevertheless, retrieving only the sequences of interest from public databases can be difficult, in particular when working with highly divergent sequences. The difficulty increases substantially when one wants to include in the study sequences from many (or less well studied) species whose genomes are non-annotated or incompletely annotated. METHODOLOGY/PRINCIPAL FINDINGS: In this work we evaluate the usefulness of different approaches of gene retrieval and classification, using the distal-less (DLX) gene family as a test case. Furthermore, we evaluate whether the use of a large number of gene sequences from a wide range of animal species, the use of multiple alternative alignments, and the use of amino acids aligned with high confidence only, is enough to recover the accepted DLX evolutionary history. CONCLUSIONS/SIGNIFICANCE: The canonical DLX homeobox gene sequence here derived, together with the characteristic amino acid variants here identified in the DLX homeodomain region, can be used to retrieve and classify DLX genes in a simple and efficient way. A program is made available that allows the easy retrieval of synteny information that can be used to classify gene sequences. Maximum likelihood trees using hundreds of sequences can be used for gene identification. Nevertheless, for the DLX case, the proposed DLX evolutionary history is not recovered even when multiple alignment algorithms are used.

  3. Gene expression profiling for molecular classification of multiple myeloma in newly diagnosed patients.

    Science.gov (United States)

    Broyl, Annemiek; Hose, Dirk; Lokhorst, Henk; de Knegt, Yvonne; Peeters, Justine; Jauch, Anna; Bertsch, Uta; Buijs, Arjan; Stevens-Kroef, Marian; Beverloo, H Berna; Vellenga, Edo; Zweegman, Sonja; Kersten, Marie-Josée; van der Holt, Bronno; el Jarari, Laila; Mulligan, George; Goldschmidt, Hartmut; van Duin, Mark; Sonneveld, Pieter

    2010-10-07

    To identify molecularly defined subgroups in multiple myeloma, gene expression profiling was performed on purified CD138(+) plasma cells of 320 newly diagnosed myeloma patients included in the Dutch-Belgian/German HOVON-65/GMMG-HD4 trial. Hierarchical clustering identified 10 subgroups; 6 corresponded to clusters described in the University of Arkansas for Medical Science (UAMS) classification, CD-1 (n = 13, 4.1%), CD-2 (n = 34, 10.6%), MF (n = 32, 10.0%), MS (n = 33, 10.3%), proliferation-associated genes (n = 15, 4.7%), and hyperdiploid (n = 77, 24.1%). Moreover, the UAMS low percentage of bone disease cluster was identified as a subcluster of the MF cluster (n = 15, 4.7%). One subgroup (n = 39, 12.2%) showed a myeloid signature. Three novel subgroups were defined, including a subgroup of 37 patients (11.6%) characterized by high expression of genes involved in the nuclear factor kappa light-chain-enhancer of activated B cells pathway, which include TNFAIP3 and CD40. Another subgroup of 22 patients (6.9%) was characterized by distinct overexpression of cancer testis antigens without overexpression of proliferation genes. The third novel cluster of 9 patients (2.8%) showed up-regulation of protein tyrosine phosphatases PRL-3 and PTPRZ1 as well as SOCS3. To conclude, in addition to 7 clusters described in the UAMS classification, we identified 3 novel subsets of multiple myeloma that may represent unique diagnostic entities.

  4. Sparse representation of multi parametric DCE-MRI features using K-SVD for classifying gene expression based breast cancer recurrence risk

    Science.gov (United States)

    Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina

    2014-03-01

    We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm to multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide a sparse representation of the features and condense the information in a few coefficients, but also reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm that includes a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave-one-out cross-validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and the maximum number of non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA-based feature selection for the same prognostic features. The ROC results show that the AUCs of the K-SVD-based (K=4, L=2), the ANOVA-based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71, and 0.68, respectively. From the results, it can be inferred that by using a sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information into a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.
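
    The AUC figures quoted above summarize how well classifier scores rank high-risk cases above low-risk ones. As a minimal sketch (not the authors' pipeline), the AUC can be computed directly from its pairwise definition; the toy scores and labels below are invented for illustration only.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: 1 = high recurrence risk, 0 = low recurrence risk (toy convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical held-out scores, e.g. collected one sample at a time
# during leave-one-out cross-validation.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc(scores, labels)
```

    In a leave-one-out setting, each score would come from a model trained on all other samples, and the AUC is computed once over the pooled held-out scores.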

  5. A systemic lupus erythematosus gene expression array in disease diagnosis and classification: a preliminary report.

    Science.gov (United States)

    Juang, Y-T; Peoples, C; Kafri, R; Kyttaris, V C; Sunahori, K; Kis-Toth, K; Fitzgerald, L; Ergin, S; Finnell, M; Tsokos, G C

    2011-03-01

    Systemic lupus erythematosus (SLE) is a clinically heterogeneous disease diagnosed on the presence of a constellation of clinical and laboratory findings. At the pathogenetic level, multiple factors using diverse biochemical and molecular pathways have been recognized. Succinct recognition and classification of clinical disease subsets, as well as the availability of disease biomarkers, remains largely unsolved. Based on information produced by the present authors' and other laboratories, a lupus gene expression array consisting of 30 genes, previously claimed to contribute to aberrant function of T cells, was developed. An additional eight genes were included as controls. Peripheral blood was obtained from 10 patients (19 samples) with SLE and six patients with rheumatoid arthritis (RA) as well as 19 healthy controls. T cell mRNA was subjected to reverse transcription and PCR, and the gene expression levels were measured. Conventional statistical analysis was performed along with principal component analysis (PCA) to capture the contribution of all genes to disease diagnosis and clinical parameters. The lupus gene expression array faithfully informed on the expression levels of genes. The recorded changes in expression reflect those reported in the literature by using a relatively small (5 ml) amount of peripheral blood. PCA of gene expression levels placed SLE samples apart from normal and RA samples regardless of disease activity. Individual principal components tended to define specific disease manifestations such as arthritis and proteinuria. Thus, a lupus gene expression array based on genes previously claimed to contribute to immune pathogenesis of SLE may define the disease, and principal components of the expression of 30 genes may define patients with specific disease manifestations.

  6. Computational Identification and Systematic Classification of Novel Cytochrome P450 Genes in Salvia miltiorrhiza.

    Directory of Open Access Journals (Sweden)

    Haimei Chen

    Salvia miltiorrhiza is one of the most economically important medicinal plants. Cytochrome P450 (CYP450) genes have been implicated in the biosynthesis of its active components. However, only a dozen full-length CYP450 genes have been described, and there is no systematic classification of CYP450 genes in S. miltiorrhiza. We obtained 77,549 unigenes from three tissue types of S. miltiorrhiza using RNA-Seq technology. Combining our data with previously identified CYP450 sequences and scanning with the CYP450 model from Pfam resulted in the identification of 116 full-length and 135 partial-length CYP450 genes. The 116 genes were classified into 9 clans and 38 families using standard criteria. The RNA-Seq results showed that 35 CYP450 genes were co-expressed with CYP76AH1, a marker gene for tanshinone biosynthesis, using r≥0.9 as a cutoff. The expression profiles for 16 of 19 randomly selected CYP450 genes obtained from RNA-Seq were validated by qRT-PCR. Comparing against the KEGG database, 10 CYP450 genes were found to be associated with diterpenoid biosynthesis. Considering all the evidence, 3 CYP450 genes were identified as potentially involved in terpenoid biosynthesis. Moreover, we found that 15 CYP450 genes were possibly regulated by antisense transcripts (r≥0.9 or r≤-0.9). Lastly, a web resource (SMCYP450, http://www.herbalgenomics.org/samicyp450) was set up, which allows users to browse, search, retrieve and compare CYP450 genes and can serve as a centralized resource.
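
    The co-expression criterion above (r≥0.9 against a marker gene) can be illustrated with a short sketch. It assumes r is the Pearson correlation across tissue-level expression values, which the abstract does not state explicitly; the gene names other than CYP76AH1 and all expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(vx * vy)


def coexpressed(profiles, marker, cutoff=0.9):
    """Genes whose profile correlates with the marker at r >= cutoff."""
    ref = profiles[marker]
    return sorted(
        gene for gene, prof in profiles.items()
        if gene != marker and pearson(ref, prof) >= cutoff
    )


# Toy expression values across three tissue types (invented numbers).
profiles = {
    "CYP76AH1": [1.0, 2.0, 3.0],
    "geneA": [2.1, 3.9, 6.2],   # rises with the marker
    "geneB": [3.0, 2.0, 1.0],   # perfectly anti-correlated
    "geneC": [1.0, 3.0, 2.0],   # weakly correlated
}
hits = coexpressed(profiles, "CYP76AH1")
```

    With only three tissue types, very high correlations can arise by chance, so a cutoff of this kind is best read as a screening step rather than as evidence of shared function.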

  7. Finding Combination of Features from Promoter Regions for Ovarian Cancer-related Gene Group Classification

    KAUST Repository

    Olayan, Rawan S.

    2012-12-01

    In classification problems, it is always important to use a suitable combination of features for the classifiers. Generating the right combination of features usually results in good classifiers. When the problem is not well understood, data items are usually described by many features in the hope that some of these may be the relevant or most relevant ones. In this study, we focus on one such problem related to genes implicated in ovarian cancer (OC). We try to recognize two important OC-related gene groups: oncogenes, which support the development and progression of OC, and oncosuppressors, which oppose such tendencies. For this, we use the properties of the promoters of these genes. We identified potential “regulatory features” that characterize the promoters of OC-related oncogenes and oncosuppressors. In our study, we used 211 oncogenes and 39 oncosuppressors. For these, we identified 538 characteristic sequence motifs from their promoters. Promoters are annotated by these motifs, and the derived feature vectors are used to develop classification models. We compared a number of classification models in their ability to distinguish oncogenes from oncosuppressors. Based on 10-fold cross-validation, the resultant model was able to separate the two classes with sensitivity of 96% and specificity of 100% with the complete set of features. Moreover, we developed another recognition model where we attempted to distinguish oncogenes and oncosuppressors as one group from other OC-related genes. That model achieved accuracy of 82%. We believe that the results of this study will help in discovering other OC-related oncogenes and oncosuppressors not identified as yet.
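
    Sensitivity and specificity, as reported above, are computed per class from predicted and true labels. The sketch below shows the arithmetic on invented labels; which class the study treated as "positive" is not stated in the abstract, so the choice here is an assumption.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) for a two-class problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (not the study's data): 4 oncogenes, 2 oncosuppressors.
y_true = ["onc", "onc", "onc", "onc", "sup", "sup"]
y_pred = ["onc", "onc", "onc", "sup", "sup", "sup"]
sens, spec = sensitivity_specificity(y_true, y_pred, positive="onc")

# With 211 oncogenes and 39 oncosuppressors, always predicting "oncogene"
# would already give this accuracy, which is why per-class metrics matter.
majority_baseline = 211 / (211 + 39)
```

    Because the two classes are imbalanced (211 versus 39), reporting sensitivity and specificity separately is more informative than a single accuracy figure.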

  8. Inflammation, Adenoma and Cancer: Objective Classification of Colon Biopsy Specimens with Gene Expression Signature

    Directory of Open Access Journals (Sweden)

    Orsolya Galamb

    2008-01-01

    Gene expression analysis of colon biopsies using high-density oligonucleotide microarrays can contribute to the understanding of local pathophysiological alterations and to functional classification of adenoma (15 samples), colorectal carcinomas (CRC) (15) and inflammatory bowel diseases (IBD) (14). Total RNA was extracted, amplified and biotinylated from frozen colonic biopsies. Genome-wide gene expression profile was evaluated by HGU133plus2 microarrays and verified by RT-PCR. We applied two independent methods for data normalization and used PAM for feature selection. Leave-one-out stepwise discriminant analysis was performed. Top validated genes included collagenIVα1, lipocalin-2, calumenin, aquaporin-8 genes in CRC; CD44, met proto-oncogene, chemokine ligand-12, ADAM-like decysin-1 and ATP-binding cassette-A8 genes in adenoma; and lipocalin-2, ubiquitin D and IFITM2 genes in IBD. Best differentiating markers between ulcerative colitis and Crohn's disease were cyclin-G2; tripartite motif-containing-31; TNFR shedding aminopeptidase regulator-1 and AMICA. The discriminant analysis correctly classified 96.2% of the samples overall using 7 discriminatory genes (indoleamine-pyrrole-2,3-dioxygenase, ectodermal-neural cortex, TIMP3, fucosyltransferase-8, collectin sub-family member 12, carboxypeptidase D, and transglutaminase-2). Using routine biopsy samples we successfully performed whole genomic microarray analysis to identify discriminative signatures. Our results provide further insight into the pathophysiological background of colonic diseases. The results establish a data warehouse that can be mined further.

  9. Cascaded Factor Analysis and Wavelet Transform Method for Tumor Classification Using Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Jayakishan Meher

    2012-08-01

    Correlating gene expression profiles with disease or with different developmental stages of a cell through the analysis of microarray data has received a great deal of attention in molecular biology. Because microarray data contain thousands of genes and very few samples, efficient feature extraction and computational methods are necessary for the analysis. In this paper we propose an effective feature extraction method based on factor analysis (FA) with discrete wavelet transform (DWT) to detect informative genes. A radial basis function neural network (RBFNN) classifier, which has lower complexity than other classifiers, is used to efficiently predict the sample class. The potential of the proposed approach is evaluated through an exhaustive study on many benchmark datasets. The experimental results show that the proposed method can be a useful approach for cancer classification.

  10. A combinational feature selection and ensemble neural network method for classification of gene expression data

    Directory of Open Access Journals (Sweden)

    Jiang Tianzi

    2004-09-01

    Background: Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have also been widely used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and only recently have several researchers compared these techniques on several public datasets. It has nevertheless been verified that differently selected features reflect different aspects of the dataset and that some selected features can yield better solutions on certain problems. At the same time, faced with a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to generally improve the accuracy and robustness of sample classification. Results: We validate our new method on several recent publicly available datasets, both with predictive accuracy on testing samples and through cross-validation. Compared with the best performance of other current methods, remarkably improved results can be obtained using our new strategy on a wide range of different datasets. Conclusions: We conclude that our methods can extract more information from microarray data to give more accurate classification and can also help to extract the latent marker genes of the diseases for better diagnosis and treatment.

  11. Differential distribution improves gene selection stability and has competitive classification performance for patient survival.

    Science.gov (United States)

    Strbenac, Dario; Mann, Graham J; Yang, Jean Y H; Ormerod, John T

    2016-07-27

    A consistent difference in average expression level, often referred to as differential expression (DE), has long been used to identify genes useful for classification. However, recent cancer studies have shown that when transcription factors or epigenetic signals become deregulated, a change in expression variability (DV) of target genes is frequently observed. This suggests that assessing the importance of genes by either differential expression or variability alone potentially misses sets of important biomarkers that could lead to improved predictions and treatments. Here, we describe a new approach for assessing the importance of genes based on differential distribution (DD), which combines information from differential expression and differential variability into a unified metric. We show that feature ranking and selection stability based on DD can perform two to three times better than DE or DV alone, and that DD yields equivalent error rates to DE and DV. Finally, assessing genes via differential distribution produces a complementary set of selected genes to DE and DV, potentially opening up new categories of biomarkers.

  12. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value.

    Directory of Open Access Journals (Sweden)

    Laetitia Marisa

    BACKGROUND: Colon cancer (CC) pathological staging fails to accurately predict recurrence, and to date, no gene expression signature has proven reliable for prognosis stratification in clinical practice, perhaps because CC is a heterogeneous disease. The aim of this study was to establish a comprehensive molecular classification of CC based on mRNA expression profile analyses. METHODS AND FINDINGS: Fresh-frozen primary tumor samples from a large multicenter cohort of 750 patients with stage I to IV CC who underwent surgery between 1987 and 2007 in seven centers were characterized for common DNA alterations, including BRAF, KRAS, and TP53 mutations, CpG island methylator phenotype, mismatch repair status, and chromosomal instability status, and were screened with whole genome and transcriptome arrays. 566 samples fulfilled RNA quality requirements. Unsupervised consensus hierarchical clustering applied to gene expression data from a discovery subset of 443 CC samples identified six molecular subtypes. These subtypes were associated with distinct clinicopathological characteristics, molecular alterations, specific enrichments of supervised gene expression signatures (stem cell phenotype-like, normal-like, serrated CC phenotype-like), and deregulated signaling pathways. Based on their main biological characteristics, we distinguished a deficient mismatch repair subtype, a KRAS mutant subtype, a cancer stem cell subtype, and three chromosomal instability subtypes, including one associated with down-regulated immune pathways, one with up-regulation of the Wnt pathway, and one displaying a normal-like gene expression profile. The classification was validated in the remaining 123 samples plus an independent set of 1,058 CC samples, including eight public datasets. Furthermore, prognosis was analyzed in the subset of stage II-III CC samples. The subtypes C4 and C6, but not the subtypes C1, C2, C3, and C5, were independently associated with shorter relapse

  13. Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2008-04-01

    Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein-coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein-coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
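
    The idea of incremental clustering, in which new sequences are compared only against existing cluster representatives rather than against every other sequence, can be sketched as follows. This is a generic greedy scheme, not the authors' algorithm: the k-mer Jaccard similarity, the threshold, and the toy peptide strings are all assumptions chosen to keep the example self-contained.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (a crude stand-in for alignment)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def add_sequences(clusters, seqs, threshold=0.5):
    """Greedily add sequences to clusters, updating `clusters` in place.

    Each cluster is a list whose first element is its representative.
    A new sequence is compared with representatives only, so adding a
    batch does not require recomputing all-against-all similarities.
    """
    for s in seqs:
        for cluster in clusters:
            if similarity(cluster[0], s) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no match: found a new cluster
    return clusters


# Initial clustering, then an incremental update with a new sequence.
clusters = add_sequences([], ["MKTAYIAKQR", "MKTAYIAKQK", "GGSWLLPPAA"])
clusters = add_sequences(clusters, ["GGSWLLPPAV"])
```

    The cost of an update grows with the number of representatives rather than with the total number of sequences, which is the property the abstract attributes to the incremental method.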

  14. S1 gene-based phylogeny of infectious bronchitis virus: An attempt to harmonize virus classification.

    Science.gov (United States)

    Valastro, Viviana; Holmes, Edward C; Britton, Paul; Fusaro, Alice; Jackwood, Mark W; Cattoli, Giovanni; Monne, Isabella

    2016-04-01

    Infectious bronchitis virus (IBV) is the causative agent of a highly contagious disease that results in severe economic losses to the global poultry industry. The virus exists in a wide variety of genetically distinct viral types, and both phylogenetic analysis and measures of pairwise similarity among nucleotide or amino acid sequences have been used to classify IBV strains. However, there is currently no consensus on the method by which IBV sequences should be compared, and heterogeneous genetic group designations that are inconsistent with phylogenetic history have been adopted, leading to the confusing coexistence of multiple genotyping schemes. Herein, we propose a simple and repeatable phylogeny-based classification system combined with an unambiguous and rational lineage nomenclature for the assignment of IBV strains. By using complete nucleotide sequences of the S1 gene we determined the phylogenetic structure of IBV, which in turn allowed us to define 6 genotypes that together comprise 32 distinct viral lineages and a number of inter-lineage recombinants. Because of extensive rate variation among IBVs, we suggest that the inference of phylogenetic relationships alone represents a more appropriate criterion for sequence classification than pairwise sequence comparisons. The adoption of an internationally accepted viral nomenclature is crucial for future studies of IBV epidemiology and evolution, and the classification scheme presented here can be updated and revised should novel S1 sequences become available.

  15. Classification and Clinical Management of Variants of Uncertain Significance in High Penetrance Cancer Predisposition Genes.

    Science.gov (United States)

    Moghadasi, Setareh; Eccles, Diana M; Devilee, Peter; Vreeswijk, Maaike P G; van Asperen, Christi J

    2016-04-01

    In 2008, the International Agency for Research on Cancer (IARC) proposed a system for classifying sequence variants in highly penetrant breast and colon cancer susceptibility genes, linked to clinical actions. This system uses a multifactorial likelihood model to calculate the posterior probability that an altered DNA sequence is pathogenic. Variants between 5%-94.9% (class 3) are categorized as variants of uncertain significance (VUS). This interval is wide and might include variants with a substantial difference in pathogenicity at either end of the spectrum. We think that carriers of class 3 variants would benefit from a fine-tuning of this classification. Classification of VUS to a category with a defined clinical significance is very important because for carriers of a pathogenic mutation full surveillance and risk-reducing surgery can reduce cancer incidence. Counselees who are not carriers of a pathogenic mutation can be discharged from intensive follow-up and avoid unnecessary risk-reducing surgery. By means of examples, we show how, in selected cases, additional data can lead to reclassification of some variants to a different class with different recommendations for surveillance and therapy. To improve the clinical utility of this classification system, we suggest a pragmatic adaptation to clinical practice.
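
    The mapping from posterior probability of pathogenicity to class can be written as a small lookup. Only the class 3 interval (5% to 94.9%) is given in the abstract; the cut-offs for the other four classes below are recalled from the 2008 IARC proposal and should be treated as assumptions to verify against that publication.

```python
def iarc_class(posterior):
    """Map a posterior probability of pathogenicity to an IARC class (1-5).

    Class 3 (0.05 to 0.949) is the variant-of-uncertain-significance band
    described in the text; the other thresholds are assumed, not quoted.
    """
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must lie in [0, 1]")
    if posterior > 0.99:
        return 5   # assumed: definitely pathogenic
    if posterior >= 0.95:
        return 4   # assumed: likely pathogenic
    if posterior >= 0.05:
        return 3   # uncertain significance (VUS)
    if posterior >= 0.001:
        return 2   # assumed: likely not pathogenic
    return 1       # assumed: not pathogenic


# Two variants at opposite ends of class 3 receive the same label,
# which is the width problem the authors point out.
low_end, high_end = iarc_class(0.06), iarc_class(0.94)
```

    The example makes the authors' concern concrete: posteriors of 0.06 and 0.94 fall in the same class even though they imply very different likelihoods of pathogenicity.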

  16. A microarray gene expression data classification using hybrid back propagation neural network

    Directory of Open Access Journals (Sweden)

    Vimaladevi M.

    2014-01-01

    Classification of cancer helps to establish the diagnosis and to select appropriate treatment. Cancer develops progressively from an alteration in a cell's genetic structure. This change (mutation) results in cells with uncontrolled growth patterns. In cancer classification, back propagation is a sufficient approach and a universal technique for training artificial neural networks. It is a supervised learning method and needs many input-output examples to make up the training set. Back propagation may also be carried out collaboratively among multiple parties. In the existing method, collaborative learning is limited and considers only two parties. The proposed collaborative function can perform well, and problems can be solved by utilizing the power of cloud computing. This technical note applies hybrid models of Back Propagation Neural networks (BPN) and fast Genetic Algorithms (GA) to feature selection in gene expression data. The work examines many feature selection algorithms that are “fragile”; that is, the quality of their results varies broadly over data sets. The research suggests that this is due to higher-order interactions between features, which cause local minima in the search space in which the algorithm becomes trapped. GAs may escape from such minima by chance because they are highly stochastic. The proposed method combines a neural net classifier with a genetic algorithm, using the GA to select features for classification by the neural net and incorporating the net as part of the objective function of the GA.

  17. Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

    Science.gov (United States)

    2011-01-01

    Background: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps, of which the most important are data normalization, gene selection and machine learning. Results: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods. Conclusion: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the t-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures. PMID:21982277
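
    Double (nested) cross-validation keeps the samples used to estimate the error rate separate from those used to tune choices such as the number of selected genes. The sketch below only generates the index splits; fold counts and function names are illustrative assumptions, and a real analysis would shuffle (and usually stratify) the samples first.

```python
def k_fold(indices, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def double_cv(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits partitions outer_train only, so gene selection and
    parameter tuning never see the outer test samples.
    """
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):
        inner_splits = list(k_fold(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


splits = list(double_cv(20))
```

    In use, gene selection and model parameters are chosen on the inner splits, the chosen configuration is refit on the outer training set, and the error rate is measured once on the outer test set.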

  18. Gene classification using parameter-free semi-supervised manifold learning.

    Science.gov (United States)

    Huang, Hong; Feng, Hailiang

    2012-01-01

    A new manifold learning method, called parameter-free semi-supervised local Fisher discriminant analysis (pSELF), is proposed to map the gene expression data into a low-dimensional space for tumor classification. Motivated by the fact that semi-supervised and parameter-free are two desirable and promising characteristics for dimension reduction, a new difference-based optimization objective function with unlabeled samples has been designed. The proposed method preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution, which can be computed efficiently by eigen decomposition. Experimental results on synthetic data and SRBCT, DLBCL, and Brain Tumor gene expression data sets demonstrate the effectiveness of the proposed method.

  19. VERTICAL HEREDITY VS. HORIZONTAL GENE TRANSFER: A CHALLENGE TO BACTERIAL CLASSIFICATION

    Institute of Scientific and Technical Information of China (English)

    HAO Bailin; QI Ji

    2003-01-01

    The diversity and classification of microbes has been a long-standing issue. Molecular phylogeny of the prokaryotes based on comparison of the 16S rRNA sequences of the small ribosomal subunit led to a reasonable tree of life in the late 1970s. However, the availability of more and more complete bacterial genomes has brought about complications instead of refinement of the tree. In particular, it turns out that different choices of genes may tell different histories. This might be caused by possible horizontal gene transfer (HGT) among species. There is an urgent need to develop phylogenetic methods that make use of whole genome data. We describe a new approach in molecular phylogeny, namely, tree construction based on K-tuple frequency analysis of the genomic sequences. Putting aside the technicalities, we emphasize the transition from randomness to determinism when the string length K increases and try to comment on the challenge mentioned in the title.
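
    K-tuple frequency analysis starts from counting all overlapping strings of length K in a sequence. The sketch below shows only that counting step plus a simple cosine-based distance between two frequency vectors; the abstract does not describe the authors' normalization or tree-building procedure, so the distance used here is an assumption for illustration.

```python
from collections import Counter
from math import sqrt


def ktuple_freq(seq, k):
    """Relative frequencies of overlapping K-tuples in a sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def cosine_distance(f, g):
    """1 - cosine similarity of two sparse frequency vectors."""
    if not f or not g:
        raise ValueError("frequency vectors must be non-empty")
    dot = sum(v * g.get(t, 0.0) for t, v in f.items())
    nf = sqrt(sum(v * v for v in f.values()))
    ng = sqrt(sum(v * v for v in g.values()))
    return 1.0 - dot / (nf * ng)


# Toy sequences; real input would be whole genomes or proteomes.
freq = ktuple_freq("ACGTACGT", 2)
d_same = cosine_distance(freq, freq)
d_diff = cosine_distance(ktuple_freq("AAAA", 2), ktuple_freq("CCCC", 2))
```

    A matrix of such pairwise distances between genomes could then be passed to any standard distance-based tree-building method.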

  20. Incorporating rich background knowledge for gene named entity classification and recognition

    Directory of Open Access Journals (Sweden)

    Yang Zhihao

    2009-07-01

    Background: Gene named entity classification and recognition are crucial preliminary steps of text mining in biomedical literature. Machine learning based methods have been used in this area with great success. In most state-of-the-art systems, elaborately designed lexical features, such as words, n-grams, and morphology patterns, have played a central part. However, this type of feature tends to cause extreme sparseness in feature space. As a result, out-of-vocabulary (OOV) terms in the training data are not modeled well due to lack of information. Results: We propose a general framework for gene named entity representation, called feature coupling generalization (FCG). The basic idea is to generate higher-level features using term frequency and co-occurrence information of highly indicative features in a huge amount of unlabeled data. We examine its performance in a named entity classification task, which is designed to remove non-gene entries in a large dictionary derived from online resources. The results show that new features generated by FCG outperform lexical features by 5.97 F-score and 10.85 for OOV terms. Also in this framework each extension yields significant improvements and the sparse lexical features can be transformed into both a lower dimensional and more informative representation. A forward maximum match method based on the refined dictionary produces an F-score of 86.2 on the BioCreative 2 GM test set. Then we combined the dictionary with a conditional random field (CRF) based gene mention tagger, achieving an F-score of 89.05, which improves the performance of the CRF-based tagger by 4.46 with little impact on the efficiency of the recognition system. A demo of the NER system is available at http://202.118.75.18:8080/bioner.
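
    Forward maximum matching, mentioned above as the dictionary-based baseline, scans the text from left to right and at each position takes the longest dictionary entry that matches. A minimal sketch follows; the tiny dictionary, the whitespace tokenization, and the `max_len` limit are illustrative assumptions rather than details from the paper.

```python
def forward_max_match(tokens, dictionary, max_len=5):
    """Return (start, end, mention) spans using greedy longest match.

    `dictionary` is a set of lowercase, space-joined terms.
    """
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in dictionary:
                spans.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return spans


# Hypothetical gene dictionary and sentence.
dictionary = {"tumor necrosis factor", "tumor necrosis factor alpha", "p53"}
tokens = "Expression of tumor necrosis factor alpha and p53 increased".split()
mentions = forward_max_match(tokens, dictionary)
```

    Preferring the longest match avoids tagging "tumor necrosis factor" when the fuller name "tumor necrosis factor alpha" is present, which is why the quality of the dictionary has such a direct effect on the F-score.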

  1. ANMM4CBR: a case-based reasoning method for gene expression data classification.

    Science.gov (United States)

    Yao, Bangpeng; Li, Shao

    2010-01-06

    Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermine the performance of many algorithms. In order to obtain a robust classifier, a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method is proposed in this article. ANMM4CBR employs a case-based reasoning (CBR) method for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, in order to select the most informative genes, we propose to perform feature selection via additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is very robust to noise in the data. The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k nearest neighbor (kNN), especially when the data contain a high level of noise. The source code is attached as an additional file of this paper.

  2. ANMM4CBR: a case-based reasoning method for gene expression data classification

    Directory of Open Access Journals (Sweden)

    Li Shao

    2010-01-01

    Full Text Available Abstract Background: Accurate classification of microarray data is critical for successful clinical diagnosis and treatment. The "curse of dimensionality" problem and noise in the data, however, undermine the performance of many algorithms. Method: In order to obtain a robust classifier, a novel Additive Nonparametric Margin Maximum for Case-Based Reasoning (ANMM4CBR) method is proposed in this article. ANMM4CBR employs a case-based reasoning (CBR) method for classification. CBR is a suitable paradigm for microarray analysis, where the rules that define the domain knowledge are difficult to obtain because usually only a small number of training samples are available. Moreover, in order to select the most informative genes, we propose to perform feature selection via additively optimizing a nonparametric margin maximum criterion, which is defined based on gene pre-selection and sample clustering. Our feature selection method is very robust to noise in the data. Results: The effectiveness of our method is demonstrated on both simulated and real data sets. We show that the ANMM4CBR method performs better than some state-of-the-art methods such as support vector machine (SVM) and k nearest neighbor (kNN), especially when the data contains a high level of noise. Availability: The source code is attached as an additional file of this paper.

  3. A Pathway Based Classification Method for Analyzing Gene Expression for Alzheimer’s Disease Diagnosis

    Science.gov (United States)

    Voyle, Nicola; Keohane, Aoife; Newhouse, Stephen; Lunnon, Katie; Johnston, Caroline; Soininen, Hilkka; Kloszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon; Hodges, Angela; Kiddle, Steven; Dobson, Richard JB.

    2015-01-01

    Background: Recent studies indicate that gene expression levels in blood may be able to differentiate subjects with Alzheimer’s disease (AD) from normal elderly controls and mild cognitively impaired (MCI) subjects. However, there is limited replicability at the single marker level. A pathway-based interpretation of gene expression may prove more robust. Objectives: This study aimed to investigate whether a case/control classification model built on pathway level data was more robust than a gene level model and may consequently perform better in test data. The study used two batches of gene expression data from the AddNeuroMed (ANM) and Dementia Case Registry (DCR) cohorts. Methods: Our study used Illumina Human HT-12 Expression BeadChips to collect gene expression from blood samples. Random forest modeling with recursive feature elimination was used to predict case/control status. Age and APOE ɛ4 status were used as covariates for all analyses. Results: Gene and pathway level models performed similarly to each other and to a model based on demographic information only. Conclusions: Any potential increase in concordance from the novel pathway level approach used here has not led to a greater predictive ability in these datasets. However, we have only tested one method for creating pathway level scores. Further, we have been able to benchmark pathways against genes in datasets that had been extensively harmonized. Further work should focus on the use of alternative methods for creating pathway level scores, in particular those that incorporate pathway topology, and the use of an endophenotype based approach. PMID:26484910

  4. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts.

    Science.gov (United States)

    Dashtban, M; Balafar, Mohammadali

    2017-03-01

    Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of the feature space, followed by an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked on five popular high-dimensional cancer datasets; for each, the top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods on the DLBCL dataset. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Minimal gene selection for classification and diagnosis prediction based on gene expression profile

    Directory of Open Access Journals (Sweden)

    Alireza Mehridehnavi

    2013-01-01

    Conclusion: We have shown that using the two most significant genes, ranked by their S/N ratios, together with a suitable selection of training samples, can classify DLBCL patients with reasonably good results. With these methods we could compensate for the lack of a sufficient number of patients, improve classification accuracy, and reduce computational complexity and therefore running time.
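    The S/N ranking mentioned in this conclusion can be illustrated with a minimal sketch. It assumes the common signal-to-noise definition, (mean of class A minus mean of class B) divided by the sum of the two class standard deviations; the abstract does not state which definition the authors used. The function names, gene names and toy values are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(class_a, class_b):
    """S/N score for one gene: (mean_a - mean_b) / (sd_a + sd_b).

    Assumed definition; the abstract does not spell out the formula.
    """
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        raise ValueError("gene is constant within both classes")
    return (mean(class_a) - mean(class_b)) / spread

def top_genes(expr, labels, k=2):
    """Return the k gene names with the largest absolute S/N score.

    expr:   dict mapping gene name -> expression values, one per sample
    labels: 0/1 class labels in the same sample order
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == 1]
        b = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]

# Toy data (hypothetical): four samples, three genes.
toy_expr = {
    "g1": [1.0, 2.0, 1.5, 2.5],
    "g2": [5.0, 7.0, 5.5, 7.5],
    "g3": [10.0, 11.0, 30.0, 31.0],
}
toy_labels = [0, 0, 1, 1]
print(top_genes(toy_expr, toy_labels, k=2))
```

    Ranking by absolute value keeps genes that are strongly under-expressed in one class as well as those that are over-expressed.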

  6. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification.

    Science.gov (United States)

    Ramyachitra, D; Sofia, M; Manikandan, P

    2015-09-01

    Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies in the ability to conduct parallel surveys of gene expression. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. The difficulty therefore lies in the high dimensionality of the data and the small sample size. This research work addresses the problem by classifying the resultant dataset using existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.

  7. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification

    Directory of Open Access Journals (Sweden)

    D. Ramyachitra

    2015-09-01

    Full Text Available Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies in the ability to conduct parallel surveys of gene expression. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. The difficulty therefore lies in the high dimensionality of the data and the small sample size. This research work addresses the problem by classifying the resultant dataset using existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.

  8. A classification-based framework for predicting and analyzing gene regulatory response.

    Science.gov (United States)

    Kundaje, Anshul; Middendorf, Manuel; Shah, Mihir; Wiggins, Chris H; Freund, Yoav; Leslie, Christina

    2006-03-20

    We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem--predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the Adaboost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a more

  9. Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

    Directory of Open Access Journals (Sweden)

    Cheung Leo

    2007-02-01

    Full Text Available Abstract Background: Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which however are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods usually also bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potential to achieve this goal. Results: A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make

  10. Molecular classification of basal cell carcinoma of skin by gene expression profiling.

    Science.gov (United States)

    Jee, Byul A; Lim, Hyoseob; Kwon, So Mee; Jo, Yuna; Park, Myong Chul; Lee, Il Jae; Woo, Hyun Goo

    2015-12-01

    Non-melanoma skin cancers (NMSC), including basal cell carcinoma (BCC) and squamous cell carcinoma (SCC), are among the more common kinds of skin cancer. Although these tumors share common pathological and clinical features, their similarity and heterogeneity at the molecular level have not yet been fully elucidated. Here, by performing comparative analysis of gene expression profiling of BCC, SCC, and normal skin tissues, we could classify BCC into three subtypes: classical, SCC-like, and normal-like BCCs. Functional enrichment and pathway analyses revealed the molecular characteristics of each subtype. The classical BCC showed an enriched expression and transcription signature with activation of the Wnt and Hedgehog signaling pathways, which are well-known key features of BCC. By contrast, the SCC-like BCC was enriched with immune-response genes and oxidative stress-related genes. Network analysis revealed PLAU/PLAUR as a key regulator of SCC-like BCC. The normal-like BCC showed prominent activation of metabolic processes, particularly fatty acid metabolism. The existence of these molecular subtypes could be validated in an independent dataset, which demonstrated the three subgroups of BCC with distinct functional enrichment. In conclusion, we suggest a novel molecular classification of BCC providing insights on the heterogeneous progression of BCC.

  11. A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information

    Directory of Open Access Journals (Sweden)

    Bertrand Keith

    2010-04-01

    Full Text Available Abstract Background: The use of gene expression profiling for the classification of human cancer tumors has been widely investigated. Previous studies were successful in distinguishing several tumor types in binary problems. As there are over a hundred types of cancers, and potentially even more subtypes, it is essential to develop multi-category methodologies for molecular classification for any meaningful practical application. Results: A jackknife-based supervised learning method called paired-samples test algorithm (PST), coupled with a binary classification model based on linear regression, was proposed and applied to two well known and challenging datasets consisting of 14 (GCM dataset) and 9 (NCI60 dataset) tumor types. The results showed that the proposed method improved the prediction accuracy of the test samples for the GCM dataset, especially when the t-statistic was used in the primary feature selection. For the NCI60 dataset, the application of PST improved prediction accuracy when the number of genes used was relatively small (100 or 200). These improvements made the binary classification method more robust to the gene selection mechanism and the number of genes used. The overall prediction accuracies were competitive in comparison to the most accurate results obtained by several previous studies on the same datasets and with other methods. Furthermore, the relative confidence R(T) provided a unique insight into the sources of the uncertainty shown in the statistical classification and the potential variants within the same tumor type. Conclusion: We proposed a novel bagging method for the classification and uncertainty assessment of multi-category tumor samples using gene expression information. The strengths were demonstrated in the application to two bench datasets.

  12. Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership

    Directory of Open Access Journals (Sweden)

    Ernesto Iacucci

    2012-02-01

    Full Text Available High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes (say, genes that are co-expressed or bound by the same factor). One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed up in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover gold standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
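    To make the idea of an exact p-value for an integer additive statistic concrete, here is a minimal dynamic programming sketch. It is not the authors' algorithm or any of their optimizations. It assumes each gene carries a non-negative integer weight for one classification, that the null model draws a gene set of fixed size uniformly at random without replacement, and that the statistic is the sum of the weights in the set. The function name and toy weights are hypothetical.

```python
from math import comb

def enrichment_p_value(weights, n, observed):
    """Exact P(sum of weights in a random size-n gene set >= observed).

    weights:  non-negative integer membership weight of every gene
    n:        size of the gene set drawn without replacement
    observed: observed value of the additive statistic
    """
    if any(w < 0 or w != int(w) for w in weights):
        raise ValueError("weights must be non-negative integers")
    max_sum = sum(sorted(weights, reverse=True)[:n])
    # ways[k][s] = number of size-k subsets whose weights sum to s
    ways = [[0] * (max_sum + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for w in weights:
        # Walk k downwards so that each gene is counted at most once.
        for k in range(n - 1, -1, -1):
            row, nxt = ways[k], ways[k + 1]
            for s in range(max_sum - w + 1):
                if row[s]:
                    nxt[s + w] += row[s]
    total = comb(len(weights), n)
    tail = sum(ways[n][max(observed, 0):])
    return tail / total

# With 0/1 weights this reduces to the hypergeometric tail probability.
print(enrichment_p_value([1, 1, 1, 0, 0, 0], n=3, observed=2))
```

    Counting subsets with exact integers and dividing only once at the end avoids rounding error in the tail; the table has (n + 1) × (max_sum + 1) entries.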

  13. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks | Center for Cancer Research

    Science.gov (United States)

    The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification.

  14. Gene function classification using Bayesian models with hierarchy-based priors

    Directory of Open Access Journals (Sweden)

    Neal Radford M

    2006-10-01

    Full Text Available Abstract Background: We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. Results: The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. Conclusion: Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

  15. Comparison of two approaches for the classification of 16S rRNA gene sequences.

    Science.gov (United States)

    Chatellier, Sonia; Mugnier, Nathalie; Allard, Françoise; Bonnaud, Bertrand; Collin, Valérie; van Belkum, Alex; Veyrieras, Jean-Baptiste; Emler, Stefan

    2014-10-01

    The use of 16S rRNA gene sequences for microbial identification in clinical microbiology is accepted widely, and requires databases and algorithms. We compared a new research database containing curated 16S rRNA gene sequences in combination with the lca (lowest common ancestor) algorithm (RDB-LCA) to a commercially available 16S rDNA Centroid approach. We used 1025 bacterial isolates characterized by biochemistry, matrix-assisted laser desorption/ionization time-of-flight MS and 16S rDNA sequencing. Nearly 80 % of isolates were identified unambiguously at the species level by both classification platforms used. The remaining isolates were mostly identified correctly at the genus level due to the limited resolution of 16S rDNA sequencing. Discrepancies between both 16S rDNA platforms were due to differences in database content and the algorithm used, and could amount to up to 10.5 %. Up to 1.4 % of the analyses were found to be inconclusive. It is important to realize that despite the overall good performance of the pipelines for analysis, some inconclusive results remain that require additional in-depth analysis performed using supplementary methods.
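    The lowest common ancestor (lca) idea named in this abstract can be sketched as follows. This is an illustration of the general technique only: the abstract does not describe how RDB-LCA selects database hits or represents taxonomy, so the list-of-ranks representation, the function name and the example lineages are assumptions.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxonomic prefix shared by all lineages.

    lineages: list of rank lists ordered from domain down to species,
              one per best-matching reference sequence (assumed input).
    """
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ragged inputs are handled.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared

# Two equally good hits from the same genus but different species.
hits = [
    ["Bacteria", "Bacillales", "Staphylococcaceae",
     "Staphylococcus", "Staphylococcus aureus"],
    ["Bacteria", "Bacillales", "Staphylococcaceae",
     "Staphylococcus", "Staphylococcus epidermidis"],
]
print(lowest_common_ancestor(hits)[-1])
```

    When the best hits disagree at the species rank, the assignment falls back to the genus, which matches the abstract's remark that many remaining isolates were identified correctly only at the genus level.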

  16. Colored petri nets to model gene mutation and amino acids classification.

    Science.gov (United States)

    Yang, Jinliang; Gao, Rui; Meng, Max Q-H; Tarn, Tzyh-Jong

    2012-05-07

    The genetic code is the triplet code based on three-letter codons, which determines the specific amino acid sequences in protein synthesis. Choosing an appropriate model for processing these codons is a useful method to study genetic processes in molecular biology. As an effective modeling tool for discrete event dynamic systems (DEDS), the colored petri net (CPN) has been used for modeling several biological systems, such as metabolic pathways and genetic regulatory networks. According to the genetic code table, CPN is employed to model the process of genetic information transmission. In this paper, we propose a CPN model of amino acids classification, and further present the improved CPN model. Based on the model mentioned above, we give another CPN model to classify the type of gene mutations by contrasting the bases of DNA strands and the codons of amino acids along the polypeptide chain. This model is helpful in determining whether a certain gene mutation will change the structures and functions of protein molecules. The effectiveness and accuracy of the presented model are illustrated by the examples in this paper.
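    The comparison of codons before and after a mutation can be illustrated with a plain lookup table. This sketch deliberately swaps the paper's colored petri net for a dictionary over the standard genetic code, so it shows only the classification outcome, not the CPN model itself. The function name and category labels are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: DNA codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon == alt_codon:
        return "no change"
    ref, alt = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref == alt:
        return "silent"      # same amino acid, protein unchanged
    if alt == "*":
        return "nonsense"    # premature stop codon
    if ref == "*":
        return "stop-loss"   # stop codon replaced by an amino acid
    return "missense"        # different amino acid

print(classify_substitution("GAG", "GTG"))  # Glu -> Val
```

    Only the missense, nonsense and stop-loss outcomes alter the polypeptide, which is the distinction the abstract draws when asking whether a mutation changes protein structure and function.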

  17. GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest

    Directory of Open Access Journals (Sweden)

    Diaz-Uriarte Ramón

    2007-09-01

    Full Text Available Abstract Background: Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information of selected genes. Methodologically, such a tool would be of greater use if it incorporates state-of-the-art computational approaches and makes source code available. Results: We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, making it possible to take advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of prediction error rate, and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of PubMed references, GO terms, KEGG and Reactome pathways characteristic of sets of genes selected for class prediction. The full source code is available, so the software can be extended. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN. Conclusion: varSelRF and GeneSrF implement a validated method for gene selection including bootstrap estimates of classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology used (combination of parallelization with web-based application) they are also of methodological interest to bioinformaticians and biostatisticians.

  18. Swarm Intelligence Approach Based on Adaptive ELM Classifier with ICGA Selection for Microarray Gene Expression and Cancer Classification

    Directory of Open Access Journals (Sweden)

    T. Karthikeyan

    2014-05-01

    Full Text Available This research study aims at efficient gene selection and classification of microarray data using hybrid machine learning algorithms. The advent of microarray technology has enabled researchers to quickly measure the levels of thousands of genes expressed in biological tissue samples in a single experiment. One important application of microarray technology is to classify tissue samples by their gene expression representation and so identify numerous types of cancer. Cancer is a group of diseases in which a set of cells shows uncontrolled growth, invasion that intrudes upon and destroys nearby tissues, and spread to other locations in the body via lymph or blood. Cancer has become one of the major diseases of the present day. DNA microarrays have turned out to be an effective tool in molecular biology and cancer diagnosis. Microarrays can be used to establish the relative quantity of mRNAs in two or more biological tissue samples for thousands of genes at the same time. As the technique has matured, finding suitable ways to analyze and assess microarray data has remained an open issue. In the medical sciences, multi-category cancer classification plays a major role in classifying cancer types according to gene expression. The need for cancer classification has become indispensable because the number of cancer victims has been increasing steadily in recent years. To this end, a combination of an Integer-Coded Genetic Algorithm (ICGA) and the Artificial Bee Colony (ABC) algorithm, coupled with an Adaptive Extreme Learning Machine (AELM), is proposed for gene selection and cancer classification. ICGA is used with the ABC-based AELM classifier to choose an optimal set of genes, which results in an efficient hybrid algorithm that can handle sparse data and sample imbalance. The

  19. Clinical classification and gene mutation of Chinese probands with Charcot-Marie-Tooth disease: Analysis of 57 cases

    Institute of Scientific and Technical Information of China (English)

    Ruxu Zhang; Xiaobo Li; Xiaohong Zi; Shunxiang Huang; Fufeng Zhang; Kun Xia; Qian Pan; Beisha Tang

    2011-01-01

    Charcot-Marie-Tooth (CMT) disease is the most common inherited peripheral neuropathic disorder. CMT is clinically and genetically heterogeneous. To date, 27 genes associated with the disease have been cloned. The present study carried out clinical classification according to clinical, electrophysiological and pathological features, conducted inheritance classification according to inheritance patterns, and performed mutation analysis of 13 CMT disease genes (PMP22, CX32, HSPB1, MFN2, MPZ, HSPB8, GDAP1, NFL, EGR2, SIMPLE, RAB7, LMNA, MTMR2) in 57 Chinese probands with CMT. Five cases of AD-CMT1 and 13 cases of sporadic CMT1 were diagnosed as CMT1A; five cases of X-CMT1, one case of X-CMT2 and one case of sporadic CMT1 were diagnosed as CMTX1; four cases of AD-CMT2 were diagnosed as CMT2F; one case of AD-CMT2 and one case of sporadic CMT2 were diagnosed as CMT2A2; one case of AD-CMT2 was diagnosed as CMT2L; one case of AD-CMT2 was diagnosed as CMT2J; one case of AR-CMT1 was diagnosed as CMT4A. Among the 57 CMT probands, seven genotypes were determined among 34 patients, with a detection rate of 59.6%. The results indicated that the clinical classification and inheritance classification are indispensable for selecting potential disease genes for mutation detection, and for efficient molecular diagnosis.

  20. Temporal expression-based analysis of metabolism.

    Directory of Open Access Journals (Sweden)

    Sara B Collins

    Full Text Available Metabolic flux is frequently rerouted through cellular metabolism in response to dynamic changes in the intra- and extra-cellular environment. Capturing the mechanisms underlying these metabolic transitions in quantitative and predictive models is a prominent challenge in systems biology. Progress in this regard has been made by integrating high-throughput gene expression data into genome-scale stoichiometric models of metabolism. Here, we extend previous approaches to perform a Temporal Expression-based Analysis of Metabolism (TEAM. We apply TEAM to understanding the complex metabolic dynamics of the respiratorily versatile bacterium Shewanella oneidensis grown under aerobic, lactate-limited conditions. TEAM predicts temporal metabolic flux distributions using time-series gene expression data. Increased predictive power is achieved by supplementing these data with a large reference compendium of gene expression, which allows us to take into account the unique character of the distribution of expression of each individual gene. We further propose a straightforward method for studying the sensitivity of TEAM to changes in its fundamental free threshold parameter θ, and reveal that discrete zones of distinct metabolic behavior arise as this parameter is changed. By comparing the qualitative characteristics of these zones to additional experimental data, we are able to constrain the range of θ to a small, well-defined interval. In parallel, the sensitivity analysis reveals the inherently difficult nature of dynamic metabolic flux modeling: small errors early in the simulation propagate to relatively large changes later in the simulation. We expect that handling such "history-dependent" sensitivities will be a major challenge in the future development of dynamic metabolic-modeling techniques.

  1. Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods

    Directory of Open Access Journals (Sweden)

    Mark Burton

    2012-01-01

    Full Text Available Machine learning has increasingly been used with microarray gene expression data and for the development of classifiers using a variety of methods. However, method comparisons in cross-study datasets are very scarce. This study compares the performance of seven classification methods and the effect of voting for predicting metastasis outcome in breast cancer patients, in three situations: within the same dataset or across datasets on similar or dissimilar microarray platforms. Combining classification results from seven classifiers into one voting decision performed significantly better during internal validation as well as external validation in similar microarray platforms than the underlying classification methods. When validating between different microarray platforms, random forest, another voting-based method, proved to be the best performing method. We conclude that voting based classifiers provided an advantage with respect to classifying metastasis outcome in breast cancer patients.
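    The idea of combining several classifiers into one voting decision can be sketched as a plain majority vote. This is a minimal illustration, not the study's pipeline: the function name majority_vote and the toy predictions are hypothetical, and it assumes an odd number of classifiers with binary labels so that ties cannot occur.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions into one label per sample.

    predictions: list of lists, one inner list of labels per classifier,
    all of the same length (one entry per sample).
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        # Count the labels that the classifiers assigned to sample i.
        votes = Counter(clf[i] for clf in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Toy example: seven classifiers, three samples (1 = metastasis, 0 = none).
toy_predictions = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
]
consensus = majority_vote(toy_predictions)
print(consensus)
```

    With seven binary voters each sample always has a strict majority, which is one reason an odd ensemble size is convenient.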

  2. A novel volume-age-KPS (VAK) glioblastoma classification identifies a prognostic cognate microRNA-gene signature.

    Directory of Open Access Journals (Sweden)

    Pascal O Zinn

    Full Text Available BACKGROUND: Several studies have established Glioblastoma Multiforme (GBM) prognostic and predictive models based on age and Karnofsky Performance Status (KPS), while very few studies evaluated the prognostic and predictive significance of preoperative MR-imaging. However, to date, there is no simple preoperative GBM classification that also correlates with a highly prognostic genomic signature. Thus, we present for the first time a biologically relevant, and clinically applicable tumor Volume, patient Age, and KPS (VAK) GBM classification that can easily and non-invasively be determined upon patient admission. METHODS: We quantitatively analyzed the volumes of 78 GBM patient MRIs present in The Cancer Imaging Archive (TCIA) corresponding to patients in The Cancer Genome Atlas (TCGA) with VAK annotation. The variables were then combined using a simple 3-point scoring system to form the VAK classification. A validation set (N = 64) from both the TCGA and Rembrandt databases was used to confirm the classification. Transcription factor and genomic correlations were performed using the gene pattern suite and Ingenuity Pathway Analysis. RESULTS: VAK-A and VAK-B classes showed significant median survival differences in discovery (P = 0.007) and validation sets (P = 0.008). VAK-A is significantly associated with P53 activation, while VAK-B shows significant P53 inhibition. Furthermore, a molecular gene signature comprised of a total of 25 genes and microRNAs was significantly associated with the classes and predicted survival in an independent validation set (P = 0.001). A favorable MGMT promoter methylation status resulted in a 10.5 months additional survival benefit for VAK-A compared to VAK-B patients. CONCLUSIONS: The non-invasively determined VAK classification with its implication of VAK-specific molecular regulatory networks, can serve as a very robust initial prognostic tool, clinical trial selection criteria, and important step toward

  3. Xenolog classification.

    Science.gov (United States)

    Darby, Charlotte A; Stolzer, Maureen; Ropp, Patrick J; Barker, Daniel; Durand, Dannie

    2017-03-01

    Orthology analysis is a fundamental tool in comparative genomics. Sophisticated methods have been developed to distinguish between orthologs and paralogs and to classify paralogs into subtypes depending on the duplication mechanism and timing, relative to speciation. However, no comparable framework exists for xenologs: gene pairs whose history, since their divergence, includes a horizontal transfer. Further, the diversity of gene pairs that meet this broad definition calls for classification of xenologs with similar properties into subtypes. We present a xenolog classification that uses phylogenetic reconciliation to assign each pair of genes to a class based on the event responsible for their divergence and the historical association between genes and species. Our classes distinguish between genes related through transfer alone and genes related through duplication and transfer. Further, they separate closely-related genes in distantly-related species from distantly-related genes in closely-related species. We present formal rules that assign gene pairs to specific xenolog classes, given a reconciled gene tree with an arbitrary number of duplications and transfers. These xenology classification rules have been implemented in software and tested on a collection of ∼13 000 prokaryotic gene families. In addition, we present a case study demonstrating the connection between xenolog classification and gene function prediction. The xenolog classification rules have been implemented in Notung 2.9, a freely available phylogenetic reconciliation software package. http://www.cs.cmu.edu/~durand/Notung . Gene trees are available at http://dx.doi.org/10.7488/ds/1503 . durand@cmu.edu. Supplementary data are available at Bioinformatics online.

  4. MIDClass: microarray data classification by association rules and gene expression intervals.

    Directory of Open Access Journals (Sweden)

    Rosalba Giugno

    Full Text Available We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles by exploiting the idea that transcript expression intervals better discriminate subtypes in the same class. A wide experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.

  5. Cell of origin associated classification of B-cell malignancies by gene signatures of the normal B-cell hierarchy.

    Science.gov (United States)

    Johnsen, Hans Erik; Bergkvist, Kim Steve; Schmitz, Alexander; Kjeldsen, Malene Krag; Hansen, Steen Møller; Gaihede, Michael; Nørgaard, Martin Agge; Bæch, John; Grønholdt, Marie-Louise; Jensen, Frank Svendsen; Johansen, Preben; Bødker, Julie Støve; Bøgsted, Martin; Dybkær, Karen

    2014-06-01

    Recent findings have suggested biological classification of B-cell malignancies as exemplified by the "activated B-cell-like" (ABC), the "germinal-center B-cell-like" (GCB) and primary mediastinal B-cell lymphoma (PMBL) subtypes of diffuse large B-cell lymphoma and "recurrent translocation and cyclin D" (TC) classification of multiple myeloma. Biological classification of B-cell derived cancers may be refined by a direct and systematic strategy where identification and characterization of normal B-cell differentiation subsets are used to define the cancer cell of origin phenotype. Here we propose a strategy combining multiparametric flow cytometry, global gene expression profiling and biostatistical modeling to generate B-cell subset specific gene signatures from sorted normal human immature, naive, germinal centrocytes and centroblasts, post-germinal memory B-cells, plasmablasts and plasma cells from available lymphoid tissues including lymph nodes, tonsils, thymus, peripheral blood and bone marrow. This strategy will provide an accurate image of the stage of differentiation, which prospectively can be used to classify any B-cell malignancy and eventually purify tumor cells. This report briefly describes the current models of the normal B-cell subset differentiation in multiple tissues and the pathogenesis of malignancies originating from the normal germinal B-cell hierarchy.

  6. Statistical Redundancy Testing for Improved Gene Selection in Cancer Classification Using Microarray Data

    Directory of Open Access Journals (Sweden)

    J. Sunil Rao

    2007-01-01

    Full Text Available In gene selection for cancer classification using microarray data, we define an eigenvalue-ratio statistic to measure a gene’s contribution to the joint discriminability when this gene is included into a set of genes. Based on this eigenvalue-ratio statistic, we define a novel hypothesis testing for gene statistical redundancy and propose two gene selection methods. Simulation studies illustrate the agreement between statistical redundancy testing and gene selection methods. Real data examples show the proposed gene selection methods can select a compact gene subset which can not only be used to build high quality cancer classifiers but also show biological relevance.

  7. A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment

    Directory of Open Access Journals (Sweden)

    Polz Martin F

    2009-05-01

    Full Text Available Abstract Background Cyanobacteria of the genera Synechococcus and Prochlorococcus play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Recently, genes encoding the photosystem II reaction center (psbA and psbD) were found in cyanophage genomes. This phenomenon suggested that the horizontal transfer of these genes may be involved in increasing phage fitness. To date, a very small percentage of marine bacteria and phages has been cultured. Thus, mapping genomic data extracted directly from the environment to its taxonomic origin is necessary for a better understanding of phage-host relationships and dynamics. Results To achieve an accurate and rapid taxonomic classification, we employed a computational approach combining a multi-class Support Vector Machine (SVM) with a codon usage position specific scoring matrix (cuPSSM). Our method has been applied successfully to classify core-photosystem-II gene fragments, including partial sequences coming directly from the ocean, to seven different taxonomic classes. Applying the method on a large set of DNA and RNA psbA clones from the Mediterranean Sea, we studied the distribution of cyanobacterial psbA genes and transcripts in their natural environment. Using our approach, we were able to simultaneously examine taxonomic and ecological distributions in the marine environment. Conclusion The ability to accurately classify the origin of individual genes and transcripts coming directly from the environment is of great importance in studying marine ecology. The classification method presented in this paper could be applied further to classify other genes amplified from the environment, for which training data is available.

  8. Informative Gene Selection and Direct Classification of Tumor Based on Chi-Square Test of Pairwise Gene Interactions

    Directory of Open Access Journals (Sweden)

    Hongyan Zhang

    2014-01-01

    Full Text Available In efforts to discover disease mechanisms and improve clinical diagnosis of tumors, it is useful to mine profiles for informative genes with definite biological meanings and to build robust classifiers with high precision. In this study, we developed a new method for tumor-gene selection, the Chi-square test-based integrated rank gene and direct classifier (χ2-IRG-DC). First, we obtained the weighted integrated rank of gene importance from chi-square tests of single and pairwise gene interactions. Then, we sequentially introduced the ranked genes and removed redundant genes by using leave-one-out cross-validation of the chi-square test-based Direct Classifier (χ2-DC) within the training set to obtain informative genes. Finally, we determined the accuracy of independent test data by utilizing the genes obtained above with χ2-DC. Furthermore, we analyzed the robustness of χ2-IRG-DC by comparing the generalization performance of different models, the efficiency of different feature-selection methods, and the accuracy of different classifiers. An independent test of ten multiclass tumor gene-expression datasets showed that χ2-IRG-DC could efficiently control overfitting and had higher generalization performance. The informative genes selected by χ2-IRG-DC could dramatically improve the independent test precision of other classifiers; meanwhile, the informative genes selected by other feature selection methods also had good performance in χ2-DC.
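    As a rough illustration of the single-gene chi-square test that underlies this kind of ranking, the sketch below scores one gene against a two-class label using the Pearson statistic of a 2x2 contingency table. It is not the authors' χ2-IRG-DC procedure: the high/low binarization by a fixed threshold, the function names and the toy values are all assumptions made for the example, and pairwise interactions are not covered.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # An empty row or column carries no information about association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def gene_chi_square(expression, labels, threshold):
    """Score one gene: binarize expression at `threshold`, test against labels.

    expression: expression values, one per sample.
    labels: class labels (1 or 0), one per sample.
    """
    pairs = list(zip(expression, labels))
    a = sum(1 for x, y in pairs if x > threshold and y == 1)   # high, class 1
    b = sum(1 for x, y in pairs if x > threshold and y == 0)   # high, class 0
    c = sum(1 for x, y in pairs if x <= threshold and y == 1)  # low, class 1
    d = sum(1 for x, y in pairs if x <= threshold and y == 0)  # low, class 0
    return chi_square_2x2(a, b, c, d)


# Hypothetical toy data: a gene that separates the two classes perfectly.
toy_expression = [5.1, 4.8, 5.5, 1.2, 0.9, 1.4]
toy_labels = [1, 1, 1, 0, 0, 0]
score = gene_chi_square(toy_expression, toy_labels, threshold=3.0)
print(score)
```

    Genes could then be ranked by this score, with larger values indicating stronger association between the binarized expression and the class.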

  9. A Classification Technique for Microarray Gene Expression Data using PSO-FLANN

    Directory of Open Access Journals (Sweden)

    Jayashree Dev

    2012-09-01

    Full Text Available Despite an increased global effort to end breast cancer, it continues to be the most common cause of cancer deaths in women. This is a reminder that new therapeutic approaches are desperately needed to improve patient survival rate. This requires proper diagnosis of disease and classification of tumor type based on genomic information, according to which proper treatment can be provided to the patient. There exist a number of classification techniques to classify tumor types. In this paper we have focused on three different classification techniques: BPN, FLANN and PSO-FLANN, and found that the integrated approach of Functional Link Artificial Neural Network (FLANN) and Particle Swarm Optimization (PSO) can better predict the disease as compared to the other methods.

  10. Constructing the gene regulation-level representation of microarray data for cancer classification.

    Science.gov (United States)

    Wong, Hau-San; Wang, Hong-Qiang

    2008-02-01

    In this paper, we propose a regulation-level representation for microarray data and optimize it using genetic algorithms (GAs) for cancer classification. Compared with the traditional expression-level features, this representation can greatly reduce the dimensionality of microarray data and accommodate noise and variability such that many statistical machine-learning methods now become applicable and efficient for cancer classification. Experimental results on real-world microarray datasets show that the regulation-level representation can consistently converge at a solution with three regulation levels. This verifies the existence of the three regulation levels (up-regulation, down-regulation and non-significant regulation) associated with a particular biological phenotype. The ternary regulation-level representation not only improves the cancer classification capability but also facilitates the visualization of microarray data.
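    A ternary regulation-level representation can be pictured as mapping each expression value to one of three codes. The sketch below is only an assumption-laden illustration: it uses a fixed symmetric cutoff on log ratios, whereas the paper optimizes the representation with genetic algorithms; the function name and the cutoff value are hypothetical.

```python
def to_regulation_levels(log_ratios, cutoff=1.0):
    """Map log expression ratios to ternary regulation levels.

    Returns +1 (up-regulation), -1 (down-regulation) or 0 (non-significant)
    for each value, using a fixed symmetric cutoff (an assumption here).
    """
    levels = []
    for value in log_ratios:
        if value >= cutoff:
            levels.append(1)
        elif value <= -cutoff:
            levels.append(-1)
        else:
            levels.append(0)
    return levels


# Hypothetical log2 ratios for five genes in one sample.
toy_log_ratios = [2.3, -1.7, 0.2, 1.0, -0.99]
toy_levels = to_regulation_levels(toy_log_ratios)
print(toy_levels)
```

    Each gene is reduced to one of three symbols, which illustrates how such a representation can absorb small measurement noise around zero.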

  11. Medusa structure of the gene regulatory network: dominance of transcription factors in cancer subtype classification.

    Science.gov (United States)

    Guo, Yuchun; Feng, Ying; Trivedi, Niraj S; Huang, Sui

    2011-05-01

    Gene expression profiles consisting of tens of thousands of transcripts are used for clustering of tissue, such as tumors, into subtypes, often without considering the underlying reason that the distinct patterns of expression arise because of constraints in the realization of gene expression profiles imposed by the gene regulatory network. The topology of this network has been suggested to consist of a regulatory core of genes represented most prominently by transcription factors (TFs) and microRNAs, that influence the expression of other genes, and of a periphery of 'enslaved' effector genes that are regulated but not regulating. This 'medusa' architecture implies that the core genes are much stronger determinants of the realized gene expression profiles. To test this hypothesis, we examined the clustering of gene expression profiles into known tumor types to quantitatively demonstrate that TFs, and even more pronounced, microRNAs, are much stronger discriminators of tumor type specific gene expression patterns than the same number of randomly selected or metabolic genes. These findings lend support to the hypothesis of a medusa architecture and of the canalizing nature of regulation by microRNAs. They also reveal the degree of freedom for the expression of peripheral genes that are less stringently associated with a tissue type specific global gene expression profile.

  12. Gene Structures, Classification, and Expression Models of the DREB Transcription Factor Subfamily in Populus trichocarpa

    Directory of Open Access Journals (Sweden)

    Yunlin Chen

    2013-01-01

    Full Text Available We identified 75 dehydration-responsive element-binding (DREB) protein genes in Populus trichocarpa. We analyzed gene structures, phylogenies, domain duplications, genome localizations, and expression profiles. The phylogenic construction suggests that the PtrDREB gene subfamily can be classified broadly into six subtypes (DREB A-1 to A-6) in Populus. The chromosomal localizations of the PtrDREB genes indicated 18 segmental duplication events involving 36 genes and six redundant PtrDREB genes were involved in tandem duplication events. There were fewer introns in the PtrDREB subfamily. The motif composition of PtrDREB was highly conserved in the same subtype. We investigated expression profiles of this gene subfamily from different tissues and/or developmental stages. Sixteen genes present in the digital expression analysis had high levels of transcript accumulation. The microarray results suggest that 18 genes were upregulated. We further examined the stress responsiveness of 15 genes by qRT-PCR. A digital northern analysis showed that the PtrDREB17, 18, and 32 genes were highly induced in leaves under cold stress, and the same expression trends were shown by qRT-PCR. Taken together, these observations may lay the foundation for future functional analyses to unravel the biological roles of Populus’ DREB genes.

  13. Link between confusional migraine, hemiplegic migraine and episodic ataxia type 2: hypothesis, family genealogy, gene typing and classification.

    Science.gov (United States)

    Cleves, C; Parikh, S; Rothner, A D; Tepper, S J

    2010-06-01

    An association between hemiplegic migraine (HM) and episodic ataxia type 2 (EA2) has been described; both disorders are linked to mutations in the CACNA1A gene. Although confusion occurs in 21% of patients with HM, we found only one case in the literature of confusional episodes associated with ataxia without hemiplegia. These findings raise the possibility of confusional episodes being part of both the HM and EA2 phenotype. However, a patient with episodic ataxia, confusional spells and CACNA1A gene mutations has not been identified. We describe four individuals, spanning three generations of a family, with episodic ataxia without hemiplegia and confusion, in association with a CACNA1A mutation. We follow with a description of the relationship between the CACNA1A mutations and the three syndromes, suggesting a potential need for a new classification in which the conditions can be subsumed.

  14. Genomewide identification, classification and analysis of NAC type gene family in maize

    Indian Academy of Sciences (India)

    Xiaojian Peng; Yang Zhao; Xiaoming Li; Min Wu; Wenbo Chai; Lei Sheng; Yu Wang; Qing Dong; Haiyang Jiang; Beijiu Cheng

    2015-09-01

    NAC transcription factors comprise a large plant-specific gene family. Increasing evidence suggests that members of this family have diverse functions in plant growth and development. In this study, we performed a genomewide survey of NAC type genes in maize (Zea mays L.). A complete set of 148 nonredundant NAC genes (ZmNAC1–ZmNAC148) were identified in the maize genome using Blast search tools, and divided into 12 groups (a–l) based on phylogeny. Chromosomal location of these genes revealed that they are distributed unevenly across all 10 chromosomes. Segmental and tandem duplication contributed largely to the expansion of the maize NAC gene family. The Ka/Ks ratio suggested that the duplicated genes of maize NAC family mainly experienced purifying selection, with limited functional divergence after duplication events. Microarray analysis indicated most of the maize NAC genes were expressed across different developmental stages. Moreover, 19 maize NAC genes grouped with published stress-responsive genes from other plants were found to contain putative stress-responsive cis-elements in their promoter regions. All these stress-responsive genes belonged to the group d (stress-related). Further, these genes showed differential expression patterns over time in response to drought treatments by quantitative real-time PCR analysis. Our results reveal a comprehensive overview of the maize NAC, and form the foundation for future functional research to uncover their roles in maize growth and development.
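    The selection inference in this abstract rests on the ratio of nonsynonymous to synonymous substitution rates, commonly written Ka/Ks: values below 1 are conventionally read as purifying selection, values near 1 as neutral evolution, and values above 1 as positive selection. The sketch below applies that convention to one duplicated gene pair; the function name and the example rates are hypothetical and are not taken from the study.

```python
def selection_mode(ka, ks):
    """Return (Ka/Ks ratio, conventional interpretation) for one gene pair.

    ka: nonsynonymous substitution rate; ks: synonymous substitution rate.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1:
        mode = "purifying"
    elif ratio > 1:
        mode = "positive"
    else:
        mode = "neutral"
    return ratio, mode


# Hypothetical rates for one duplicated pair.
toy_ratio, toy_mode = selection_mode(0.12, 0.48)
print(f"Ka/Ks = {toy_ratio:.2f} ({toy_mode} selection)")
```

    In practice the rates themselves come from codon-aligned sequence comparisons, which this sketch does not attempt.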

  15. Annotation and classification of the bovine T cell receptor delta genes.

    Science.gov (United States)

    Herzig, Carolyn T A; Lefranc, Marie-Paule; Baldwin, Cynthia L

    2010-02-09

    gammadelta T cells differ from alphabeta T cells with regard to the types of antigen with which their T cell receptors interact; gammadelta T cell antigens are not necessarily peptides nor are they presented on MHC. Cattle are considered a "gammadelta T cell high" species indicating they have an increased proportion of gammadelta T cells in circulation relative to that in "gammadelta T cell low" species such as humans and mice. Prior to the onset of the studies described here, there was limited information regarding the genes that code for the T cell receptor delta chains of this gammadelta T cell high species. By annotating the bovine (Bos taurus) genome Btau_3.1 assembly, 56 distinct T cell receptor delta (TRD) variable (V) genes were found, 52 of which belong to the TRDV1 subgroup and were co-mingled with the T cell receptor alpha variable (TRAV) genes. In addition, two genes belonging to the TRDV2 subgroup and single TRDV3 and TRDV4 genes were found. We confirmed the presence of five diversity (D) genes, three junctional (J) genes and a single constant (C) gene and describe the organization of the TRD locus. The TRDV4 gene is found downstream of the C gene and in an inverted orientation of transcription, consistent with its orthologs in humans and mice. cDNA evidence was assessed to validate expression of the variable genes and showed that one to five D genes could be incorporated into a single transcript. Finally, we grouped the bovine and ovine TRDV1 genes into sets based on their relatedness. The bovine genome contains a large and diverse repertoire of TRD genes when compared to the genomes of "gammadelta T cell low" species. This suggests that in cattle gammadelta T cells play a more important role in immune function since they would be predicted to bind a greater variety of antigens.

  16. A Gene Selection Approach based on Clustering for Classification Tasks in Colon Cancer

    Directory of Open Access Journals (Sweden)

    José Antonio CASTELLANOS GARZÓN

    2016-06-01

    Full Text Available Gene selection (GS is an important research area in the analysis of DNA-microarray data, since it involves gene discovery meaningful for a particular target annotation or able to discriminate expression profiles of samples coming from different populations. In this context, a wide number of filter methods have been proposed in the literature to identify subsets of relevant genes in accordance with prefixed targets. Despite the fact that there is a wide number of proposals, the complexity imposed by this problem (GS remains a challenge. Hence, this paper proposes a novel approach for gene selection by using cluster techniques and filter methods on the found groupings to achieve informative gene subsets. As a result of applying our methodology to Colon cancer data, we have identified the best informative gene subset between several one subsets. According to the above, the reached results have proven the reliability of the approach given in this paper.

  17. Gene expression in the urinary bladder: a common carcinoma in situ gene expression signature exists disregarding histopathological classification

    DEFF Research Database (Denmark)

    Andersen, Lars Dyrskjøt; Kruhøffer, Mogens; Andersen, Thomas Thykjær

    2004-01-01

    The presence of carcinoma in situ (CIS) lesions in the urinary bladder is associated with a high risk of disease progression to a muscle invasive stage. In this study, we used microarray expression profiling to examine the gene expression patterns in superficial transitional cell carcinoma (s...... urothelium and urothelium with CIS lesions from the same urinary bladder revealed that the gene expression found in sTCC with surrounding CIS is found also in CIS biopsies as well as in histologically normal samples adjacent to the CIS lesions. Furthermore, we also identified similar gene expression changes...

  18. Three-gene based phylogeny of the Urostyloidea (Protista, Ciliophora, Hypotricha), with notes on classification of some core taxa.

    Science.gov (United States)

    Huang, Jie; Chen, Zigui; Song, Weibo; Berger, Helmut

    2014-01-01

    Classifications of the Urostyloidea were mainly based on morphology and morphogenesis. Since molecular phylogeny largely focused on limited sampling using mostly the one-gene information, incongruence between morphological data and gene sequences has arisen. In this work, the three-gene data (SSU-rDNA, ITS1-5.8S-ITS2 and LSU-rDNA) comprising 12 genera in the "core urostyloids" are sequenced, and the phylogenies based on these different markers are compared using maximum-likelihood and Bayesian algorithms and tested by unconstrained and constrained analyses. The molecular phylogeny supports the following conclusions: (1) the monophyly of the core group of Urostyloidea is well supported while the whole Urostyloidea is not monophyletic; (2) Thigmokeronopsis and Apokeronopsis are clearly separated from the pseudokeronopsids in analyses of all three gene markers, supporting their exclusion from the Pseudokeronopsidae and the inclusion in the Urostylidae; (3) Diaxonella and Apobakuella should be assigned to the Urostylidae; (4) Bergeriella, Monocoronella and Neourostylopsis flavicana share a most recent common ancestor; (5) all molecular trees support the transfer of Metaurostylopsis flavicana to the recently proposed genus Neourostylopsis; (6) all molecular phylogenies fail to separate the morphologically well-defined genera Uroleptopsis and Pseudokeronopsis; and (7) Arcuseries gen. nov. containing three distinctly deviating Anteholosticha species is established. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Three-gene based phylogeny of the Urostyloidea (Protista, Ciliophora, Hypotricha), with notes on classification of some core taxa

    Science.gov (United States)

    Huang, Jie; Chen, Zigui; Song, Weibo; Berger, Helmut

    2014-01-01

    Classifications of the Urostyloidea were mainly based on morphology and morphogenesis. Since molecular phylogeny largely focused on limited sampling using mostly the one-gene information, incongruence between morphological data and gene sequences has arisen. In this work, the three-gene data (SSU-rDNA, ITS1-5.8S-ITS2 and LSU-rDNA) comprising 12 genera in the “core urostyloids” are sequenced, and the phylogenies based on these different markers are compared using maximum-likelihood and Bayesian algorithms and tested by unconstrained and constrained analyses. The molecular phylogeny supports the following conclusions: (1) the monophyly of the core group of Urostyloidea is well supported while the whole Urostyloidea is not monophyletic; (2) Thigmokeronopsis and Apokeronopsis are clearly separated from the pseudokeronopsids in analyses of all three gene markers, supporting their exclusion from the Pseudokeronopsidae and the inclusion in the Urostylidae; (3) Diaxonella and Apobakuella should be assigned to the Urostylidae; (4) Bergeriella, Monocoronella and Neourostylopsis flavicana share a most recent common ancestor; (5) all molecular trees support the transfer of Metaurostylopsis flavicana to the recently proposed genus Neourostylopsis; (6) all molecular phylogenies fail to separate the morphologically well-defined genera Uroleptopsis and Pseudokeronopsis; and (7) Arcuseries gen. nov. containing three distinctly deviating Anteholosticha species is established. PMID:24140978

  20. Control of mucin-type O-glycosylation: a classification of the polypeptide GalNAc-transferase gene family.

    Science.gov (United States)

    Bennett, Eric P; Mandel, Ulla; Clausen, Henrik; Gerken, Thomas A; Fritz, Timothy A; Tabak, Lawrence A

    2012-06-01

    Glycosylation of proteins is an essential process in all eukaryotes and a great diversity in types of protein glycosylation exists in animals, plants and microorganisms. Mucin-type O-glycosylation, consisting of glycans attached via O-linked N-acetylgalactosamine (GalNAc) to serine and threonine residues, is one of the most abundant forms of protein glycosylation in animals. Although most protein glycosylation is controlled by one or two genes encoding the enzymes responsible for the initiation of glycosylation, i.e. the step where the first glycan is attached to the relevant amino acid residue in the protein, mucin-type O-glycosylation is controlled by a large family of up to 20 homologous genes encoding UDP-GalNAc:polypeptide GalNAc-transferases (GalNAc-Ts) (EC 2.4.1.41). Therefore, mucin-type O-glycosylation has the greatest potential for differential regulation in cells and tissues. The GalNAc-T family is the largest glycosyltransferase enzyme family covering a single known glycosidic linkage and it is highly conserved throughout animal evolution, although absent in bacteria, yeast and plants. Emerging studies have shown that the large number of genes (GALNTs) in the GalNAc-T family do not provide full functional redundancy and single GalNAc-T genes have been shown to be important in both animals and human. Here, we present an overview of the GalNAc-T gene family in animals and propose a classification of the genes into subfamilies, which appear to be conserved in evolution structurally as well as functionally.

  1. Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii).

    Directory of Open Access Journals (Sweden)

    Heping Cao

    Full Text Available Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the "proline knot" motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that (1) all five OLE genes were expressed in developing tung seeds, leaves and flowers; (2) OLE mRNA levels were much higher in seeds than leaves or flowers; (3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; (4) OLE mRNA levels rapidly increased during seed development; and (5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1-3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants.

  2. Kernelized partial least squares for feature reduction and classification of gene microarray data

    Directory of Open Access Journals (Sweden)

    Land Walker H

    2011-12-01

    Background: The primary objectives of this paper are: 1. to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p small n", or "high-dimension, low-sample size") microarray problem by eliminating those features (or probes) that do not contribute to the "best" chromosome bio-markers for lung cancer, and 2. quantitatively measure and verify (by an independent means) the efficacy of this PLS process. A secondary objective is to integrate these significant improvements in diagnostic and prognostic biomedical applications into the clinical research arena. That is, to devise a framework for converting SLT results into direct, useful clinical information for patient care or pharmaceutical research. We, therefore, propose and preliminarily evaluate a process whereby PLS, K-PLS, and Support Vector Machines (SVM) may be integrated with the accepted and well understood traditional biostatistical "gold standard", Cox Proportional Hazard model and Kaplan-Meier survival analysis methods. Specifically, this new combination will be illustrated with both PLS and Kaplan-Meier followed by PLS and Cox Hazard Ratios (CHR) and can be easily extended for both the K-PLS and SVM paradigms. Finally, these previously described processes are contained in the Fine Feature Selection (FFS) component of our overall feature reduction/evaluation process, which consists of the following components: 1. coarse feature reduction, 2. fine feature selection and 3. classification (as described in this paper) and prediction. Results: Our results for PLS and K-PLS showed that these techniques, as part of our overall feature reduction process, performed well on noisy microarray data. The best performance was a good 0.794 Area Under a Receiver Operating Characteristic (ROC) Curve (AUC) for classification of recurrence prior to or after 36 months and a strong 0.869 AUC for

  3. Gene expression in the urinary bladder: a common carcinoma in situ gene expression signature exists disregarding histopathological classification

    DEFF Research Database (Denmark)

    Andersen, Lars Dyrskjøt; Kruhøffer, Mogens; Andersen, Thomas Thykjær

    2004-01-01

    that contained genes with similar expression levels in transitional cell carcinoma (TCC) with surrounding CIS and invasive TCC. However, no close relationship between TCC with adjacent CIS and invasive TCC was observed using hierarchical cluster analysis. Expression profiling of a series of biopsies from normal...... urothelium and urothelium with CIS lesions from the same urinary bladder revealed that the gene expression found in sTCC with surrounding CIS is found also in CIS biopsies as well as in histologically normal samples adjacent to the CIS lesions. Furthermore, we also identified similar gene expression changes...... not only in CIS biopsies but also in sTCC, mTCC, and, remarkably, in histologically normal urothelium from bladders with CIS. Identification of this expression signature could provide guidance for the selection of therapy and follow-up regimen in patients with early stage bladder cancer....

  4. Hierarchical information representation and efficient classification of gene expression microarray data

    OpenAIRE

    Bosio, Mattia

    2014-01-01

    In the field of computational biology, microarrays are used to measure the activity of thousands of genes at once and create a global picture of cellular function. Microarrays allow scientists to analyze expression of many genes in a single experiment quickly and efficiently. Even if microarrays are a consolidated research technology nowadays and the trends in high-throughput data analysis are shifting towards new technologies like Next Generation Sequencing (NGS), an optimum method for sample...

  5. Gene Structures, Evolution, Classification and Expression Profiles of the Aquaporin Gene Family in Castor Bean (Ricinus communis L.).

    Directory of Open Access Journals (Sweden)

    Zhi Zou

    Aquaporins (AQPs) are a class of integral membrane proteins that facilitate the passive transport of water and other small solutes across biological membranes. Castor bean (Ricinus communis L., Euphorbiaceae), an important non-edible oilseed crop, is widely cultivated for industrial, medicinal and cosmetic purposes. Its recently available genome provides an opportunity to analyze specific gene families. In this study, a total of 37 full-length AQP genes were identified from the castor bean genome, which were assigned to five subfamilies, including 10 plasma membrane intrinsic proteins (PIPs), 9 tonoplast intrinsic proteins (TIPs), 8 NOD26-like intrinsic proteins (NIPs), 6 X intrinsic proteins (XIPs) and 4 small basic intrinsic proteins (SIPs) on the basis of sequence similarities. Functional prediction based on the analysis of the aromatic/arginine (ar/R) selectivity filter, Froger's positions and specificity-determining positions (SDPs) showed a remarkable difference in substrate specificity among subfamilies. Homology analysis supported the expression of all 37 RcAQP genes in at least one of examined tissues, e.g., root, leaf, flower, seed and endosperm. Furthermore, global expression profiles with deep transcriptome sequencing data revealed diverse expression patterns among various tissues. The current study presents the first genome-wide analysis of the AQP gene family in castor bean. Results obtained from this study provide valuable information for future functional analysis and utilization.

  6. Gene Structures, Evolution, Classification and Expression Profiles of the Aquaporin Gene Family in Castor Bean (Ricinus communis L.).

    Science.gov (United States)

    Zou, Zhi; Gong, Jun; Huang, Qixing; Mo, Yeyong; Yang, Lifu; Xie, Guishui

    2015-01-01

    Aquaporins (AQPs) are a class of integral membrane proteins that facilitate the passive transport of water and other small solutes across biological membranes. Castor bean (Ricinus communis L., Euphorbiaceae), an important non-edible oilseed crop, is widely cultivated for industrial, medicinal and cosmetic purposes. Its recently available genome provides an opportunity to analyze specific gene families. In this study, a total of 37 full-length AQP genes were identified from the castor bean genome, which were assigned to five subfamilies, including 10 plasma membrane intrinsic proteins (PIPs), 9 tonoplast intrinsic proteins (TIPs), 8 NOD26-like intrinsic proteins (NIPs), 6 X intrinsic proteins (XIPs) and 4 small basic intrinsic proteins (SIPs) on the basis of sequence similarities. Functional prediction based on the analysis of the aromatic/arginine (ar/R) selectivity filter, Froger's positions and specificity-determining positions (SDPs) showed a remarkable difference in substrate specificity among subfamilies. Homology analysis supported the expression of all 37 RcAQP genes in at least one of examined tissues, e.g., root, leaf, flower, seed and endosperm. Furthermore, global expression profiles with deep transcriptome sequencing data revealed diverse expression patterns among various tissues. The current study presents the first genome-wide analysis of the AQP gene family in castor bean. Results obtained from this study provide valuable information for future functional analysis and utilization.

  7. Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study

    Directory of Open Access Journals (Sweden)

    Sun Youting

    2009-01-01

    Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.
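
    The abstract above does not name its six imputation algorithms, so the following is only a minimal sketch of one widely used idea, nearest-neighbour imputation, and is not presented as one of the methods the study evaluated. The function name `knn_impute`, the use of `None` for a missing value, and the root-mean-square distance over shared samples are all assumptions made for illustration.

```python
from math import sqrt


def knn_impute(matrix, k=2):
    """Fill missing entries (None) from the k most similar genes.

    matrix: list of rows (genes), each a list of floats or None (samples).
    Returns a new matrix; the input is left unchanged.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                # A neighbour must be a different gene that is observed in sample j.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Root-mean-square distance over the samples both genes share.
                dist = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # if no usable neighbour exists, the entry stays None
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy data (hypothetical): gene 0 is missing its third sample.
toy = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]
completed = knn_impute(toy, k=2)
```

    In the toy matrix the two closest genes supply the estimate, so the distant fourth gene does not influence the filled value. At high missing-value rates few shared samples remain for the distance computation, which is one intuitive reason imputation quality can degrade.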

  8. Blood-based gene expression profiles models for classification of subsyndromal symptomatic depression and major depressive disorder.

    Science.gov (United States)

    Yi, Zhenghui; Li, Zezhi; Yu, Shunying; Yuan, Chengmei; Hong, Wu; Wang, Zuowei; Cui, Jian; Shi, Tieliu; Fang, Yiru

    2012-01-01

    Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression that also leads to significant psychosocial functional impairment, as does major depressive disorder (MDD). Several studies have suggested that SSD is a transitory phenomenon in the depression spectrum and is thus considered a subtype of depression. However, the pathophysiology of depression remains largely obscure and studies on SSD are limited. The present study compared expression profiles and performed classification in leukocytes using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, MDD, and matched controls (8 subjects in each group). Support vector machines (SVMs) were utilized for training and testing on candidate signature expression profiles from the signature selection step. Firstly, we identified 63 differentially expressed SSD signatures in contrast to control (P ≤ 5.0E-4) and 30 differentially expressed MDD signatures in contrast to control. Then, 123 gene signatures were identified with significantly different expression levels between SSD and MDD. Secondly, in order to conduct priority selection of biomarkers for SSD and MDD together, we selected the top gene signatures from each group of pair-wise comparison results and merged them to generate better profiles for clearly classifying SSD and MDD sets at the same time. In detail, we tried different combinations of signatures from the three pair-wise comparison results and finally determined 48 gene expression signatures with 100% accuracy. Our findings suggest that SSD and MDD do not exhibit the same expressed genome signature in peripheral blood leukocytes, and blood cell-derived RNA of these 48 gene models may have significant value for performing diagnostic functions and classifying SSD, MDD, and healthy controls.

  9. Molecular classification of tamoxifen-resistant breast carcinomas by gene expression profiling.

    Science.gov (United States)

    Jansen, Maurice P H M; Foekens, John A; van Staveren, Iris L; Dirkzwager-Kiel, Maaike M; Ritstier, Kirsten; Look, Maxime P; Meijer-van Gelder, Marion E; Sieuwerts, Anieta M; Portengen, Henk; Dorssers, Lambert C J; Klijn, Jan G M; Berns, Els M J J

    2005-02-01

    To discover a set of markers predictive for the type of response to endocrine therapy with the antiestrogen tamoxifen using gene expression profiling. The study was performed on 112 estrogen receptor-positive primary breast carcinomas from patients with advanced disease and clearly defined types of response (ie, 52 patients with objective response v 60 patients with progressive disease) from start of first-line treatment with tamoxifen. Main clinical end points are the effects of therapy on tumor size and time until tumor progression (progression-free survival [PFS]). RNA isolated from tumor samples was amplified and hybridized to 18,000 human cDNA microarrays. Using a training set of 46 breast tumors, 81 genes were found to be differentially expressed (P tamoxifen-responsive and -resistant tumors. These genes were involved in estrogen action, apoptosis, extracellular matrix formation, and immune response. From the 81 genes, a predictive signature of 44 genes was extracted and validated on an independent set of 66 tumors. This 44-gene signature is significantly superior (odds ratio, 3.16; 95% CI, 1.10 to 9.11; P = .03) to traditional predictive factors in univariate analysis and also significantly related with a longer PFS in univariate (hazard ratio, 0.54; 95% CI, 0.31 to 0.94; P = .03) as well as in multivariate analyses (P = .03). Our data show that gene expression profiling can be used to discriminate between breast cancer patients with progressive disease and objective response to tamoxifen. Additional studies are needed to confirm if the predictive signature might allow identification of individual patients who could benefit from other (adjuvant) endocrine therapies.

  10. A reconsideration of the classification of the spider infraorder Mygalomorphae (Arachnida: Araneae) based on three nuclear genes and morphology.

    Directory of Open Access Journals (Sweden)

    Jason E Bond

    BACKGROUND: The infraorder Mygalomorphae (i.e., trapdoor spiders, tarantulas, funnel web spiders, etc.) is one of three main lineages of spiders. Comprising 15 families, 325 genera, and over 2,600 species, the group is a diverse assemblage that has retained a number of features considered primitive for spiders. Despite an evolutionary history dating back to the lower Triassic, the group has received comparatively little attention with respect to its phylogeny and higher classification. The few phylogenies published all share the common thread that a stable classification scheme for the group remains unresolved. METHODS AND FINDINGS: We report here a reevaluation of mygalomorph phylogeny using the rRNA genes 18S and 28S, the nuclear protein-coding gene EF-1γ, and a morphological character matrix. Taxon sampling includes members of all 15 families representing 58 genera. The following results are supported in our phylogenetic analyses of the data: (1) the Atypoidea (i.e., antrodiaetids, atypids, and mecicobothriids) is a monophyletic group sister to all other mygalomorphs; and (2) the families Mecicobothriidae, Hexathelidae, Cyrtaucheniidae, Nemesiidae, Ctenizidae, and Dipluridae are not monophyletic. The Microstigmatidae is likely to be subsumed into Nemesiidae. Nearly half of all mygalomorph families require reevaluation of generic composition and placement. The polyphyletic family Cyrtaucheniidae is most problematic, representing no fewer than four unrelated lineages. CONCLUSIONS: Based on these analyses we propose the following nomenclatural changes: (1) the establishment of the family Euctenizidae (NEW RANK); (2) establishment of the subfamily Apomastinae within the Euctenizidae; and (3) the transfer of the cyrtaucheniid genus Kiama to Nemesiidae. Additional changes include relimitation of Domiothelina and Theraphosoidea, and the establishment of the Euctenizoidina clade (Idiopidae + Euctenizidae). In addition to these changes, we propose a "road map

  11. Application of Artificial Neural Networks in Cancer Classification and Diagnosis Prediction of a Subtype of Lymphoma Based on Gene Expression Profile

    Directory of Open Access Journals (Sweden)

    L Ziaei

    2006-01-01

    Background: Diffuse Large B-cell Lymphoma (DLBCL) is the most common subtype of non-Hodgkin’s Lymphoma. DLBCL patients have different survivals after diagnosis. 40% of patients respond well to current therapy and have prolonged survival, whereas the remainder survive less than 5 years. In this study, we applied an artificial neural network to classify patients with DLBCL on the basis of their gene expression profiles. Finally, we attempted to extract a number of genes whose differential expression was significant in DLBCL subtypes. Methods: We studied 40 patients and 4026 genes. Genes were ranked based on their signal-to-noise (S/N) ratios. After selecting a suitable threshold, genes whose ratios were below the threshold were removed. We then used PCA for further reduction and a Perceptron neural network for classification of these patients. We extracted some appropriate genes based on their prediction ability. Results: We considered various targets for classifying patients. Patients were classified based on their 5-year survival with an accuracy of 93%, with respect to the results of the Alizadeh et al. study with an accuracy of 100%, and with respect to their International Prognostic Index (IPI) with an accuracy of 89%. Conclusion: The combination of PCA and S/N ratio is an effective method for dimension reduction, and the neural network is a robust tool for classifying patients according to their gene expression profile. Keywords: classification, gene expression, DLBCL, neural network, Perceptron
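
    The S/N ranking and thresholding step described in this abstract can be sketched as follows. The abstract does not define its S/N ratio, so the formula used here, (mean_a - mean_b) / (sd_a + sd_b), is an assumption based on a definition commonly used in the microarray literature. The function names, the gene names and the threshold value are likewise hypothetical, and the PCA and Perceptron stages are not shown.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Assumed S/N definition for one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        return 0.0  # no within-class variation: treat the gene as uninformative
    return (mean(class_a) - mean(class_b)) / spread


def rank_genes(expression, labels, threshold):
    """Return (gene, |S/N|) pairs at or above the threshold, strongest first.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of 0/1 class labels, one per sample (at least 2 per class).
    """
    kept = []
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        score = abs(signal_to_noise(a, b))
        if score >= threshold:
            kept.append((gene, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): three samples per class.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "geneA": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],   # clearly separated classes
    "geneB": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],   # same mean in both classes
}
selected = rank_genes(expression, labels, threshold=1.0)
```

    Only the surviving genes would be passed on to the later dimension-reduction and classification stages.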

  12. Blood-based gene expression profiles models for classification of subsyndromal symptomatic depression and major depressive disorder.

    Directory of Open Access Journals (Sweden)

    Zhenghui Yi

    Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression that also leads to significant psychosocial functional impairment, as does major depressive disorder (MDD). Several studies have suggested that SSD is a transitory phenomenon in the depression spectrum and is thus considered a subtype of depression. However, the pathophysiology of depression remains largely obscure and studies on SSD are limited. The present study compared expression profiles and performed classification in leukocytes using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, MDD, and matched controls (8 subjects in each group). Support vector machines (SVMs) were utilized for training and testing on candidate signature expression profiles from the signature selection step. Firstly, we identified 63 differentially expressed SSD signatures in contrast to control (P ≤ 5.0E-4) and 30 differentially expressed MDD signatures in contrast to control. Then, 123 gene signatures were identified with significantly different expression levels between SSD and MDD. Secondly, in order to conduct priority selection of biomarkers for SSD and MDD together, we selected the top gene signatures from each group of pair-wise comparison results and merged them to generate better profiles for clearly classifying SSD and MDD sets at the same time. In detail, we tried different combinations of signatures from the three pair-wise comparison results and finally determined 48 gene expression signatures with 100% accuracy. Our findings suggest that SSD and MDD do not exhibit the same expressed genome signature in peripheral blood leukocytes, and blood cell-derived RNA of these 48 gene models may have significant value for performing diagnostic functions and classifying SSD, MDD, and healthy controls.

  13. Gene Expression Profiling for the Identification and Classification of Antibody-Mediated Heart Rejection.

    Science.gov (United States)

    Loupy, Alexandre; Duong Van Huyen, Jean Paul; Hidalgo, Luis; Reeve, Jeff; Racapé, Maud; Aubert, Olivier; Venner, Jeffery M; Falmuski, Konrad; Bories, Marie Cécile; Beuscart, Thibaut; Guillemain, Romain; François, Arnaud; Pattier, Sabine; Toquet, Claire; Gay, Arnaud; Rouvier, Philippe; Varnous, Shaida; Leprince, Pascal; Empana, Jean Philippe; Lefaucheur, Carmen; Bruneval, Patrick; Jouven, Xavier; Halloran, Philip F

    2017-03-07

    Antibody-mediated rejection (AMR) contributes to heart allograft loss. However, an important knowledge gap remains in terms of the pathophysiology of AMR and how detection of immune activity, injury degree, and stage could be improved by intragraft gene expression profiling. We prospectively monitored 617 heart transplant recipients referred from 4 French transplant centers (January 1, 2006-January 1, 2011) for AMR. We compared patients with AMR (n=55) with a matched control group of 55 patients without AMR. We characterized all patients using histopathology (ISHLT [International Society for Heart and Lung Transplantation] 2013 grades), immunostaining, and circulating anti-HLA donor-specific antibodies at the time of biopsy, together with systematic gene expression assessments of the allograft tissue, using microarrays. Effector cells were evaluated with in vitro human cell cultures. We studied a validation cohort of 98 heart recipients transplanted in Edmonton, AB, Canada, including 27 cases of AMR and 71 controls. A total of 240 heart transplant endomyocardial biopsies were assessed. AMR showed a distinct pattern of injury characterized by endothelial activation with microcirculatory inflammation by monocytes/macrophages and natural killer (NK) cells. We also observed selective changes in endothelial/angiogenesis and NK cell transcripts, including CD16A signaling and interferon-γ-inducible genes. The AMR-selective gene sets accurately discriminated patients with AMR from those without and included NK transcripts (area under the curve=0.87), endothelial activation transcripts (area under the curve=0.80), macrophage transcripts (area under the curve=0.86), and interferon-γ transcripts (area under the curve=0.84; P<0.0001 for all comparisons). These 4 gene sets showed increased expression with increasing pathological AMR (pAMR) International Society for Heart and Lung Transplantation grade (P<0.001) and association with donor-specific antibody levels. The
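
    Several records in this listing report an area under the ROC curve (AUC). As a hedged illustration of what that number measures, the sketch below computes AUC as the fraction of positive/negative sample pairs in which the positive sample has the higher score, with ties counted as one half. The function name `auc` and the toy scores are hypothetical and are not taken from any of the studies.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier scores, higher meaning "more likely positive".
    labels: 1 for positive samples, 0 for negative samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
example = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect separation of the two groups.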

  14. Beyond classification: gene-family phylogenies from shotgun metagenomic reads enable accurate community analysis.

    Science.gov (United States)

    Riesenfeld, Samantha J; Pollard, Katherine S

    2013-06-22

    Sequence-based phylogenetic trees are a well-established tool for characterizing diversity of both macroorganisms and microorganisms. Phylogenetic methods have recently been applied to shotgun metagenomic data from microbial communities, particularly with the aim of classifying reads. But the accuracy of gene-family phylogenies that characterize evolutionary relationships among short, non-overlapping sequencing reads has not been thoroughly evaluated. To quantify errors in metagenomic read trees, we developed MetaPASSAGE, a software pipeline to generate in silico bacterial communities, simulate a sample of shotgun reads from a gene family represented in the community, orient or translate reads, and produce a profile-based alignment of the reads from which a gene-family phylogenetic tree can be built. We applied MetaPASSAGE to a variety of RNA and protein-coding gene families, built trees using a range of different phylogenetic methods, and compared the resulting trees using topological and branch-length error metrics. We identified read length as one of the major sources of error. Because phylogenetic methods use a reference database of full-length sequences from the gene family to guide construction of alignments and trees, we found that error can also be substantially reduced through increasing the size and diversity of the reference database. Finally, UniFrac analysis, which compares metagenomic samples based on a summary statistic computed over all branches in a read tree, is very robust to the level of error we observe. Bacterial community diversity can be quantified using phylogenetic approaches applied to shotgun metagenomic data. As sequencing reads get longer and more genomes across the bacterial tree of life are sequenced, the accuracy of this approach will continue to improve, opening the door to more applications.

  15. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database

    NARCIS (Netherlands)

    Thompson, Bryony A.; Spurdle, Amanda B.; Plazzer, John-Paul; Greenblatt, Marc S.; Akagi, Kiwamu; Al-Mulla, Fahd; Bapat, Bharati; Bernstein, Inge; Capella, Gabriel; den Dunnen, Johan T.; du Sart, Desiree; Fabre, Aurelie; Farrell, Michael P.; Farrington, Susan M.; Frayling, Ian M.; Frebourg, Thierry; Goldgar, David E.; Heinen, Christopher D.; Holinski-Feder, Elke; Kohonen-Corish, Maija; Robinson, Kristina Lagerstedt; Leung, Suet Yi; Martins, Alexandra; Moller, Pal; Morak, Monika; Nystrom, Minna; Peltomaki, Paivi; Pineda, Marta; Qi, Ming; Ramesar, Rajkumar; Rasmussen, Lene Juel; Royer-Pokora, Brigitte; Scott, Rodney J.; Sijmons, Rolf; Tavtigian, Sean V.; Tops, Carli M.; Weber, Thomas; Wijnen, Juul; Woods, Michael O.; Macrae, Finlay; Genuardi, Maurizio

    2014-01-01

    The clinical classification of hereditary sequence variants identified in disease-related genes directly affects clinical management of patients and their relatives. The International Society for Gastrointestinal Hereditary Tumours (InSiGHT) undertook a collaborative effort to develop, test and appl

  16. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database

    DEFF Research Database (Denmark)

    Thompson, Bryony A; Spurdle, Amanda B; Plazzer, John-Paul

    2014-01-01

    The clinical classification of hereditary sequence variants identified in disease-related genes directly affects clinical management of patients and their relatives. The International Society for Gastrointestinal Hereditary Tumours (InSiGHT) undertook a collaborative effort to develop, test and a...

  17. A chemometric evaluation of the underlying physical and chemical patterns that support near infrared spectroscopy of barley seeds as a tool for explorative classification of endosperm genes and gene combinations

    DEFF Research Database (Denmark)

    Jacobsen, Susanne; Søndergaard, Ib; Møller, Birthe

    2005-01-01

    Near infrared spectroscopic (NIR; 1100-2500 nm), chemical and genetic data were combined to study the pleiotropic secondary effects of mutant genes on milled samples in a barley seed model. NIR and chemical data were both effective in classifying gene and gene combinations by Principal Component...... revealing pleiotropic gene effects in expression timing that supporting the gene classification. To verify that NIR spectroscopy data represents a physio-chemical fingerprint of the barley seed, physical and chemical spectral components were partially separated by Multiple Scatter Correction...

  18. Regularization strategies for hyperplane classifiers: application to cancer classification with gene expression data.

    Science.gov (United States)

    Andries, Erik; Hagstrom, Thomas; Atlas, Susan R; Willman, Cheryl

    2007-02-01

    Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
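
    As a rough illustration of the filter factor idea, the sketch below uses two standard textbook choices, Tikhonov and truncated-SVD filter factors, applied to a given list of singular values. Whether the paper uses exactly these forms is not stated in the abstract, so they are assumptions, as are the function names and the toy numbers. In this representation the regularized solution has coefficients f_i * (u_i^T b) / sigma_i in the basis of right singular vectors.

```python
def tikhonov_filter_factors(singular_values, lam):
    """Tikhonov filter factors f_i = s_i^2 / (s_i^2 + lam^2)."""
    return [s * s / (s * s + lam * lam) for s in singular_values]


def tsvd_filter_factors(singular_values, k):
    """Truncated-SVD filter factors: keep the first k components, drop the rest."""
    return [1.0 if i < k else 0.0 for i in range(len(singular_values))]


def filtered_coefficients(singular_values, projections, factors):
    """Coefficients f_i * (u_i^T b) / s_i of the regularized solution.

    projections holds the values u_i^T b; singular values must be nonzero.
    """
    return [f * p / s for s, p, f in zip(singular_values, projections, factors)]


# Toy spectrum (hypothetical): the last singular value is tiny, so dividing
# by it would amplify the small noisy projection 0.05 by a factor of 100.
sigma = [10.0, 1.0, 0.01]
proj = [2.0, 1.0, 0.05]
f_tik = tikhonov_filter_factors(sigma, lam=1.0)
naive = filtered_coefficients(sigma, proj, [1.0, 1.0, 1.0])
damped = filtered_coefficients(sigma, proj, f_tik)
```

    Comparing `naive` with `damped` shows the effect: components tied to small singular values are suppressed, while those with large singular values pass through almost unchanged. Inspecting the factors in this way is one route to choosing the regularization parameter.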

  19. Cancer classification through filtering progressive transductive support vector machine based on gene expression data

    Science.gov (United States)

    Lu, Xinguo; Chen, Dan

    2017-08-01

    Traditional supervised classifiers work only with labeled data and neglect a large amount of data that lack sufficient follow-up information. Consequently, the small sample size limits the design of appropriate classifiers. In this paper, a transductive learning method that combines a filtering strategy within the transductive framework with a progressive labeling strategy is presented. The progressive labeling strategy does not need to consider the distribution of labeled samples to evaluate the distribution of unlabeled samples, and can effectively solve the problem of estimating the proportion of positive and negative samples in the working set. Our experimental results demonstrate that the proposed technique has great potential in cancer prediction based on gene expression.

  20. PNPLA3-associated steatohepatitis: toward a gene-based classification of fatty liver disease.

    Science.gov (United States)

    Krawczyk, Marcin; Portincasa, Piero; Lammert, Frank

    2013-11-01

    Nonalcoholic fatty liver disease is one of the most common hepatic disorders worldwide. Given the high-calorie nutrition of children and adults, nonalcoholic fatty liver disease (NAFLD) is expected to become a major cause of cirrhosis and eventually liver transplantation. Familial clustering and ethnic differences indicate that genetic factors contribute to NAFLD. Recently, the common variant p.I148M of the enzyme adiponutrin (PNPLA3) has emerged as a major genetic determinant of hepatic steatosis and nonalcoholic steatohepatitis as well as its pathobiological sequelae fibrosis, cirrhosis, and hepatocellular cancer. PNPLA3 encodes a lipid droplet-associated, carbohydrate-regulated lipogenic and/or lipolytic enzyme. Homozygous carriers of the PNPLA3 variant are prone to develop cirrhosis in the absence of other risk factors such as alcohol or viral hepatitis. Here we review the plethora of studies that unraveled the association between PNPLA3 and NAFLD in children and adults, discuss its distinct effects on liver and metabolic traits, and introduce the term PNPLA3-associated steatohepatitis (PASH) as a novel gene-based liver disease. Given the prevalence of the risk allele in 40 to 50% of Europeans, the authors conclude that PNPLA3 should be considered in the diagnostic workup of fatty liver disease and that homozygous risk allele carriers might benefit from careful cancer surveillance.

  1. Two-gene signature improves the discriminatory power of IASLC/ATS/ERS classification to predict the survival of patients with early-stage lung adenocarcinoma

    Directory of Open Access Journals (Sweden)

    Sun Y

    2016-07-01

    Yifeng Sun,1,* Likun Hou,2,* Yu Yang,1 Huikang Xie,2 Yang Yang,1 Zhigang Li,1 Heng Zhao,1 Wen Gao,3 Bo Su4 1Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiaotong University, 2Department of Pathology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 3Department of Thoracic Surgery, Shanghai Huadong Hospital, Fudan University School of Medicine, Shanghai, 4Central Lab, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, People’s Republic of China *These authors contributed equally to this work Background: In this study, we investigated the contribution of a gene expression–based signature (composed of BAG1, BRCA1, CDC6, CDK2AP1, ERBB3, FUT3, IL11, LCK, RND3, SH3BGR) to survival prediction for early-stage lung adenocarcinoma categorized by the new International Association for the Study of Lung Cancer (IASLC)/the American Thoracic Society (ATS)/the European Respiratory Society (ERS) classification. We also aimed to verify whether gene signature improves the risk discrimination of IASLC/ATS/ERS classification in early-stage lung adenocarcinoma. Patients and methods: Total RNA was extracted from 93 patients with pathologically confirmed TNM stage Ia and Ib lung adenocarcinoma. The mRNA expression levels of ten genes in the signature (BAG1, BRCA1, CDC6, CDK2AP1, ERBB3, FUT3, IL11, LCK, RND3, and SH3BGR) were detected using real-time polymerase chain reaction. Each patient was categorized according to the new IASLC/ATS/ERS classification by accessing hematoxylin–eosin-stained slides. The corresponding Kaplan–Meier survival analysis by the log-rank statistic, multivariate Cox proportional hazards modeling, and c-index calculation were conducted using the programming language R (Version 2.15.1) with the “risksetROC” package. Results: The multivariate analysis demonstrated that the risk factor of the ten-gene expression signature can significantly improve the discriminatory

  2. Visualization-aided classification ensembles discriminate lung adenocarcinoma and squamous cell carcinoma samples using their gene expression profiles.

    Directory of Open Access Journals (Sweden)

    Ao Zhang

    Full Text Available INTRODUCTION: Microarray experiments are now widely applied in cancer research, including research on lung cancer, one of the most common fatal human tumors. Non-small cell lung carcinoma (NSCLC) has two major histological types, adenocarcinoma (AC) and squamous cell carcinoma (SCC). RESULTS: In this paper, we proposed to integrate a visualization method called Radial Coordinate Visualization (Radviz) with a suitable classifier, aiming at discriminating two NSCLC subtypes using patients' gene expression profiles. Our analyses on simulated data and a real microarray dataset show that, combined with a classification method, Radviz may play a role in selecting relevant features and improving parsimony, while the final model suffers little or no loss of accuracy. Most importantly, a graphic representation is more easily understandable and implementable for a clinician than statistical methods and/or mathematical equations. CONCLUSION: To conclude, using the NSCLC microarray data presented here as a benchmark, the comprehensive understanding of the underlying mechanism associated with NSCLC and of the mechanisms with its subtypes and respective stages will become reality in the near future.
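
    The Radviz projection mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual textbook formulation (one anchor per feature on the unit circle, each sample placed at the weighted mean of the anchors, with min-max scaled feature values as weights). The function name `radviz_project` and its arguments are hypothetical and are not taken from the paper.

```python
import math

def radviz_project(sample, minima, maxima):
    """Project one sample (a list of feature values) onto the 2-D Radviz plane.

    minima/maxima hold the per-feature range used for min-max scaling
    (an assumption of this sketch; the paper's preprocessing is not given).
    """
    n = len(sample)
    # One anchor per feature, evenly spaced on the unit circle.
    anchors = [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
               for j in range(n)]
    # Scale each feature to [0, 1] so it acts as a non-negative spring weight.
    weights = [(v - lo) / (hi - lo) if hi > lo else 0.0
               for v, lo, hi in zip(sample, minima, maxima)]
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)  # no anchor pulls on the sample: it stays at the centre
    x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
    return (x, y)
```

    A sample dominated by one gene lands near that gene's anchor, while a sample with equal scaled values for all genes lands at the centre, which is what makes the plot readable as a feature-relevance display.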

  3. Gene Expression Profiling of Colorectal Tumors and Normal Mucosa by Microarrays Meta-Analysis Using Prediction Analysis of Microarray, Artificial Neural Network, Classification, and Regression Trees

    Directory of Open Access Journals (Sweden)

    Chi-Ming Chu

    2014-01-01

    Full Text Available Background. Microarray technology shows great potential, but previous studies in colorectal cancer (CRC) research were limited by small numbers of samples. The aims of this study are to investigate the gene expression profile of CRCs by pooling cDNA microarrays using PAM, ANN, and decision trees (CART and C5.0). Methods. Sixteen pooled datasets contained 88 normal mucosal tissues and 1186 CRCs. PAM was performed to identify significantly expressed genes in CRCs, and models of PAM, ANN, CART, and C5.0 were constructed for screening candidate genes by ranking genes in order of significance. Results. The first screening identified 55 genes. The test accuracy of each model was over 0.97 on average. Fewer than eight genes achieved excellent classification accuracy. Combining the results of the four models, we found the top eight differential genes in CRCs: the suppressor genes CA7, SPIB, GUCA2B, AQP8, IL6R and CWH43, and the oncogenes SPP1 and TCN1. Genes of higher significance showed lower variation in rank ordering by different methods. Conclusion. We adopted a two-tier genetic screen, which not only reduced the number of candidate genes but also yielded good accuracy (nearly 100%). This method can be applied to future studies. Among the top eight genes, CA7, TCN1, and CWH43 have not been reported to be related to CRC.

  4. Microarray gene expression analysis of fixed archival tissue permits molecular classification and identification of potential therapeutic targets in diffuse large B-cell lymphoma.

    Science.gov (United States)

    Linton, Kim; Howarth, Christopher; Wappett, Mark; Newton, Gillian; Lachel, Cynthia; Iqbal, Javeed; Pepper, Stuart; Byers, Richard; Chan, Wing John; Radford, John

    2012-01-01

    Refractory/relapsed diffuse large B-cell lymphoma (DLBCL) has a poor prognosis. Novel drugs targeting the constitutively activated NF-κB pathway characteristic of ABC-DLBCL are promising, but evaluation depends on accurate activated B cell-like (ABC)/germinal center B cell-like (GCB) molecular classification. This is traditionally performed on gene microarray expression profiles of fresh biopsies, which are not routinely collected, or by immunohistochemistry on formalin-fixed, paraffin-embedded (FFPE) tissue, which lacks reproducibility and classification accuracy. We explored the possibility of using routine archival FFPE tissue for gene microarray applications. We examined Affymetrix HG U133 Plus 2.0 gene expression profiles from paired archival FFPE and fresh-frozen tissues of 40 ABC/GCB-classified DLBCL cases to compare classification accuracy and test the potential for this approach to aid the discovery of therapeutic targets and disease classifiers in DLBCL. Unsupervised hierarchical clustering of unselected present probe sets distinguished ABC/GCB in FFPE with remarkable accuracy, and a Bayesian classifier correctly assigned 32 of 36 cases with >90% probability. Enrichment for NF-κB genes was appropriately seen in ABC-DLBCL FFPE tissues. The top discriminatory genes expressed in FFPE separated cases with high statistical significance and contained novel biology with potential therapeutic insights, warranting further investigation. These results support a growing understanding that archival FFPE tissues can be used in microarray experiments aimed at molecular classification, prognostic biomarker discovery, and molecular exploration of rare diseases. Copyright © 2012 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  5. Clinical application of modified bag-of-features coupled with hybrid neural-based classifier in dengue fever classification using gene expression data.

    Science.gov (United States)

    Chatterjee, Sankhadeep; Dey, Nilanjan; Shi, Fuqian; Ashour, Amira S; Fong, Simon James; Sen, Soumya

    2017-09-11

    Dengue fever detection and classification have a vital role due to the recent outbreaks of different kinds of dengue fever. Recent advances in microarray technology can be employed for such classification. Several studies have established that the gene selection phase plays a significant role in classifier performance. Accordingly, the current study focused on detecting two different variations, namely, dengue fever (DF) and dengue hemorrhagic fever (DHF). A modified bag-of-features method has been proposed to select the most promising genes in the classification process. Afterward, a modified cuckoo search optimization algorithm has been engaged to support the artificial neural network (ANN-MCS) to classify the unknown subjects into three different classes, namely, DF, DHF, and another class containing convalescent and normal cases. The proposed method has been compared with three other well-known classifiers, namely, multilayer perceptron feed-forward network (MLP-FFN), artificial neural network (ANN) trained with cuckoo search (ANN-CS), and ANN trained with PSO (ANN-PSO). Experiments have been carried out with different numbers of clusters for the initial bag-of-features-based feature selection phase. After obtaining the reduced dataset, the hybrid ANN-MCS model has been employed for the classification process. The results have been compared in terms of confusion matrix-based performance metrics. The experimental results indicated a highly statistically significant improvement with the proposed classifier over the traditional ANN-CS model.

  6. Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data

    Directory of Open Access Journals (Sweden)

    Sakellariou Argiris

    2012-10-01

    Full Text Available Abstract Background: A feature selection method in microarray gene expression data should be independent of platform, disease and dataset size. Our hypothesis is that among the statistically significant ranked genes in a gene list, there should be clusters of genes that share similar biological functions related to the investigated disease. Thus, instead of keeping N top ranked genes, it would be more appropriate to define and keep a number of gene cluster exemplars. Results: We propose a hybrid FS method (mAP-KL), which combines multiple hypothesis testing and the affinity propagation (AP) clustering algorithm along with the Krzanowski & Lai cluster quality index, to select a small yet informative subset of genes. We applied mAP-KL on real microarray data, as well as on simulated data, and compared its performance against 13 other feature selection approaches. Across a variety of diseases and number of samples, mAP-KL presents competitive classification results, particularly in neuromuscular diseases, where its overall AUC score was 0.91. Furthermore, mAP-KL generates concise yet biologically relevant and informative N-gene expression signatures, which can serve as a valuable tool for diagnostic and prognostic purposes, as well as a source of potential disease biomarkers in a broad range of diseases. Conclusions: mAP-KL is a data-driven and classifier-independent hybrid feature selection method, which applies to any disease classification problem based on microarray data, regardless of the available samples. Combining multiple hypothesis testing and AP leads to subsets of genes, which classify unknown samples from both small and large patient cohorts with high accuracy.

  7. Assessment of the InSiGHT Interpretation Criteria for the Clinical Classification of 24 MLH1 and MSH2 Gene Variants.

    Science.gov (United States)

    Tricarico, Rossella; Kasela, Mariann; Mareni, Cristina; Thompson, Bryony A; Drouet, Aurélie; Staderini, Lucia; Gorelli, Greta; Crucianelli, Francesca; Ingrosso, Valentina; Kantelinen, Jukka; Papi, Laura; De Angioletti, Maria; Berardi, Margherita; Gaildrat, Pascaline; Soukarieh, Omar; Turchetti, Daniela; Martins, Alexandra; Spurdle, Amanda B; Nyström, Minna; Genuardi, Maurizio

    2017-01-01

    Pathogenicity assessment of DNA variants in disease genes to explain their clinical consequences is an integral component of diagnostic molecular testing. The International Society for Gastrointestinal Hereditary Tumors (InSiGHT) has developed specific criteria for the interpretation of mismatch repair (MMR) gene variants. Here, we performed a systematic investigation of 24 MLH1 and MSH2 variants. The assessments were done by analyzing population frequency, segregation, tumor molecular characteristics, RNA effects, protein expression levels, and in vitro MMR activity. Classifications were confirmed for 15 variants and changed for three, and for the first time determined for six novel variants. Overall, based on our results, we propose the introduction of some refinements to the InSiGHT classification rules. The proposed changes have the advantage of homogenizing the InSiGHT interpretation criteria with those set out by the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium for the BRCA1/BRCA2 genes. We also observed that the addition of only a few clinical data was sufficient to obtain a more stable classification for variants considered as "likely pathogenic" or "likely nonpathogenic." This shows the importance of obtaining as many points of evidence as possible for variant interpretation, especially from the clinical setting.

  8. Multiplex real-time PCR assay for detection and classification of Klebsiella pneumoniae carbapenemase gene (bla KPC) variants.

    Science.gov (United States)

    Chen, Liang; Mediavilla, José R; Endimiani, Andrea; Rosenthal, Marnie E; Zhao, Yanan; Bonomo, Robert A; Kreiswirth, Barry N

    2011-02-01

    Carbapenem resistance mediated by plasmid-borne Klebsiella pneumoniae carbapenemases (KPC) is an emerging problem of significant clinical importance in Gram-negative bacteria. Multiple KPC gene variants (bla(KPC)) have been reported, with KPC-2 (bla(KPC-2)) and KPC-3 (bla(KPC-3)) associated with epidemic outbreaks in New York City and various international settings. Here, we describe the development of a multiplex real-time PCR assay using molecular beacons (MB-PCR) for rapid and accurate identification of bla(KPC) variants. The assay consists of six molecular beacons and two oligonucleotide primer pairs, allowing for detection and classification of all currently described bla(KPC) variants (bla(KPC-2) to bla(KPC-11)). The MB-PCR detection limit was 5 to 40 DNA copies per reaction and 4 CFU per reaction using laboratory-prepared samples. The MB-PCR probes were highly specific for each bla(KPC) variant, and cross-reactivity was not observed using DNA isolated from several bacterial species. A total of 457 clinical Gram-negative isolates were successfully characterized by our MB-PCR assay, with bla(KPC-3) and bla(KPC-2) identified as the most common types in the New York/New Jersey metropolitan region. The MB-PCR assay described herein is rapid, sensitive, and specific and should be useful for understanding the ongoing evolution of carbapenem resistance in Gram-negative bacteria. As novel bla(KPC) variants continue to emerge, the MB-PCR assay can be modified in response to epidemiologic developments.

  9. A microarray platform-independent classification tool for cell of origin class allows comparative analysis of gene expression in diffuse large B-cell lymphoma.

    Directory of Open Access Journals (Sweden)

    Matthew A Care

    Full Text Available Cell of origin classification of diffuse large B-cell lymphoma (DLBCL) identifies subsets with biological and clinical significance. Despite the established nature of the classification, existing studies display variability in classifier implementation, and a comparative analysis across multiple data sets is lacking. Here we describe the validation of a cell of origin classifier for DLBCL, based on balanced voting between 4 machine-learning tools: the DLBCL automatic classifier (DAC). This shows superior survival separation for assigned Activated B-cell (ABC) and Germinal Center B-cell (GCB) DLBCL classes relative to a range of other classifiers. DAC is effective on data derived from multiple microarray platforms and formalin fixed paraffin embedded samples and is parsimonious, using 20 classifier genes. We use DAC to perform a comparative analysis of gene expression in 10 data sets (2030 cases). We generate ranked meta-profiles of genes showing consistent class-association using ≥6 data sets as a cut-off: ABC (414 genes) and GCB (415 genes). The transcription factor ZBTB32 emerges as the most consistent and differentially expressed gene in ABC-DLBCL while other transcription factors such as ARID3A, BATF, and TCF4 are also amongst the 24 genes associated with this class in all datasets. Analysis of enrichment of 12323 gene signatures against meta-profiles and all data sets individually confirms consistent associations with signatures of molecular pathways, chromosomal cytobands, and transcription factor binding sites. We provide DAC as an open access Windows application, and the accompanying meta-analyses as a resource.
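
    The abstract does not spell out the voting rule used by DAC, so the following is only an illustrative sketch of how calls from several classifiers might be combined. It assumes a plain majority vote in which a tie leaves the case unclassified. The function name `vote`, the `min_margin` parameter, and the "Unclassified" label are hypothetical.

```python
from collections import Counter

def vote(calls, min_margin=1):
    """Combine per-classifier calls ('ABC' or 'GCB') into a single label.

    min_margin is the minimum lead one class needs over the other;
    with the default of 1, only an exact tie is left unclassified.
    This rule is an assumption, not the published DAC procedure.
    """
    counts = Counter(calls)
    abc, gcb = counts.get("ABC", 0), counts.get("GCB", 0)
    if abs(abc - gcb) < min_margin:
        return "Unclassified"
    return "ABC" if abc > gcb else "GCB"
```

    With four voters, a 3-to-1 split yields a class call and a 2-to-2 split yields "Unclassified". Raising `min_margin` to 4 would demand unanimity.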

  10. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

    Science.gov (United States)

    Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H

    2017-03-01

    For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, the minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, with the support vector machine as a classifier. The method was applied to four microarray datasets, and the performance was assessed by the leave-one-out cross-validation method. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type.
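
    Leave-one-out cross-validation, the assessment scheme named in this abstract, can be sketched in a few lines. To keep the example dependency-free, a nearest-centroid classifier stands in for the support vector machine used in the paper. That substitution and all function names are assumptions of this sketch.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class mean is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(X, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and report the hit rate."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

    Any gene selection step has to be repeated inside each fold (on the training samples only). Otherwise the held-out sample leaks into the selection and the accuracy estimate is optimistic.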

  11. A multi-gene phylogeny of Lactifluus (Basidiomycota, Russulales) translated into a new infrageneric classification of the genus

    NARCIS (Netherlands)

    Crop, de E.; Nuytinck, J.; Putte, van de K.; Wisitrassameewong, K.; Hackel, J.; Stubbe, D.; Hyde, K.D.; Roy, M.; Halling, R.E.; Moreau, P.-A.; Eberhardt, U.; Verbeken, A.

    2017-01-01

    Infrageneric relations of the genetically diverse milkcap genus Lactifluus (Russulales, Basidiomycota) are poorly known. Currently used classification systems still largely reflect the traditional, mainly morphological, characters used for infrageneric delimitations of milkcaps. Increased sampling,

  12. Genome-wide analyses of chitin synthases identify horizontal gene transfers towards bacteria and allow a robust and unifying classification into fungi.

    Science.gov (United States)

    Gonçalves, Isabelle R; Brouillet, Sophie; Soulié, Marie-Christine; Gribaldo, Simonetta; Sirven, Catherine; Charron, Noémie; Boccara, Martine; Choquer, Mathias

    2016-11-24

    Chitin, the second most abundant biopolymer on earth after cellulose, is found in probably all fungi, many animals (mainly invertebrates), several protists and a few algae, playing an essential role in the development of many of them. This polysaccharide is produced by type 2 glycosyltransferases, called chitin synthases (CHS). There are several contradictory classifications of CHS isoenzymes and, as regards their evolutionary history, their origin and diversity is still a matter of debate. A genome-wide analysis resulted in the detection of more than eight hundred putative chitin synthases in proteomes associated with about 130 genomes. Phylogenetic analyses were performed with special care to avoid any pitfalls associated with the peculiarities of these sequences (e.g. highly variable regions, truncated or recombined sequences, long-branch attraction). This allowed us to revise and unify the fungal CHS classification and to study the evolutionary history of the CHS multigenic family. This update has the advantage of being user-friendly due to the development of a dedicated website ( http://wwwabi.snv.jussieu.fr/public/CHSdb ), and it includes any correspondences with previously published classifications and mutants. Concerning the evolutionary history of CHS, this family has mainly evolved via duplications and losses. However, it is likely that several horizontal gene transfers (HGT) also occurred in eukaryotic microorganisms and, even more surprisingly, in bacteria. This comprehensive multi-species analysis contributes to the classification of fungal CHS, in particular by optimizing its robustness, consensuality and accessibility. It also highlights the importance of HGT in the evolutionary history of CHS and describes bacterial chs genes for the first time. Many of the bacteria that have acquired a chitin synthase are plant pathogens (e.g. Dickeya spp; Pectobacterium spp; Brenneria spp; Agrobacterium vitis and Pseudomonas cichorii). Whether they are able to

  13. Identification of Associated Genes and Diseases in Patients With Congenital Upper-Limb Anomalies: A Novel Application of the OMT Classification.

    Science.gov (United States)

    Baas, Martijn; Stubbs, Andrew P; van Zessen, David B; Galjaard, Robert-Jan H; van der Spek, Peter J; Hovius, Steven E R; van Nieuwenhoven, Christianne A

    2017-07-01

    Congenital upper-limb anomalies (CULA) can present as a part of a syndrome or association. There is a wide spectrum of CULA, each of which might be related to different diseases. The structure provided by the Oberg, Manske, and Tonkin (OMT) classification could aid in differential diagnosis formulation in patients with CULA. The aims of this study were to review the Human Phenotype Ontology (HPO) project database for diseases and causative genes related to the CULA described in the OMT classification and to develop a methodology for differential diagnosis formulation based on the observed congenital anomalies, CulaPhen. We reviewed the HPO database for all diseases, including causative genes related to CULA. All CULA were classified according to the OMT classification; associated non-hand phenotypes were classified into 12 anatomical groups. We analyzed the contribution of each anatomical group to a given disease and developed a tool for differential diagnosis formulation based on these contributions. We compared our results with cases from the literature and with a current HPO tool, Phenomizer. In total, 514 hand phenotypes were obtained, 384 of which could be classified in the OMT classification. A total of 1,403 diseases could be related to those CULA. A comparison with 10 recently published cases with CULA revealed that the presented phenotype matched the descriptions in our dataset. The differential diagnosis produced using our methodology was more accurate than Phenomizer in 4 of 5 examples. The OMT classification can be used to describe hand anomalies that may present in over 1,400 diseases. CulaPhen was developed to provide a (hand) phenotype-based differential diagnosis. Differential diagnosis formulation based on the proposed system outperforms the system in current use. This study illustrates that the OMT diagnoses, either individually or combined, can be cross-referenced with different diseases and syndromes. 
Therefore, use of the OMT classification can

  14. Comprehensive gene expression profiling and immunohistochemical studies support application of immunophenotypic algorithm for molecular subtype classification in diffuse large B-cell lymphoma

    DEFF Research Database (Denmark)

    Visco, C; Xu-Monette, Z Y; Miranda, R N

    2012-01-01

    Gene expression profiling (GEP) has stratified diffuse large B-cell lymphoma (DLBCL) into molecular subgroups that correspond to different stages of lymphocyte development-namely germinal center B-cell like and activated B-cell like. This classification has prognostic significance, but GEP...... on formalin-fixed, paraffin-embedded tissue samples. Sections were stained with antibodies reactive with CD10, GCET1, FOXP1, MUM1 and BCL6 and cases were classified following a rationale of sequential steps of differentiation of B cells. Cutoffs for each marker were obtained using receiver...

  15. Classification of Babesia canis strains in Europe based on polymorphism of the Bc28.1-gene from the Babesia canis Bc28 multigene family.

    Science.gov (United States)

    Carcy, B; Randazzo, S; Depoix, D; Adaszek, L; Cardoso, L; Baneth, G; Gorenflot, A; Schetters, T P

    2015-07-30

    The vast majority of clinical babesiosis cases in dogs in Europe is caused by Babesia canis. Although dogs can be vaccinated, the level of protection is highly variable, which might be due to genetic diversity of B. canis strains. One of the major merozoite surface antigens of B. canis is a protein with a Mr of 28 kDa that belongs to the Bc28 multigene family, that comprises at least two genes, Bc28.1 and a homologous Bc28.2 gene. The two genes are relatively conserved but they are very distinct in their 3' ends, enabling the design of specific primers. Sequencing of the Bc28.1 genes from 4 genetically distinct B. canis laboratory strains (A8, B, 34.01 and G) revealed 20 mutations at conserved positions of which three allowed the classification of B. canis strains into three main groups (A, B and 34.01/G) by RFLP. This assay was subsequently used to analyze blood samples of 394 dogs suspected of clinical babesiosis from nine countries in Europe. All blood samples were first analyzed with a previously described assay that allowed detection of the different Babesia species that infect dogs. Sixty one percent of the samples contained detectable levels of Babesia DNA. Of these, 98.3% were positive for B. canis, the remaining cases were positive for B. vogeli. Analysis of the Bc28.1 gene, performed on 178 of the B. canis samples, revealed an overall dominance of genotype B (62.4%), followed by genotypes A (37.1%) and 34 (11.8%). Interestingly, a great variation in the geographical distribution and prevalence of the three B. canis genotypes was observed; in the North-East genotype A predominated (72.1% A against 27.9% B), in contrast to the South-West where genotype B predominated (10.3% A against 89.7% B). In the central part of Europe intermediate levels were found (26.0-42.9% A against 74.0-57.1% B, from West to East). Genotype 34 was only identified in France (26.9% among 78 samples) and mostly as co-infection with genotypes A or B (61.9%). 
A comparative analysis of

  16. Genome-Wide Identification, Classification, and Expression Analysis of Amino Acid Transporter Gene Family in Glycine Max.

    Science.gov (United States)

    Cheng, Lin; Yuan, Hong-Yu; Ren, Ren; Zhao, Shi-Qi; Han, Ya-Peng; Zhou, Qi-Ying; Ke, Dan-Xia; Wang, Ying-Xiang; Wang, Lei

    2016-01-01

    Amino acid transporters (AATs) play important roles in transporting amino acids across cellular membranes and are essential for plant growth and development. To date, the AAT gene family in soybean (Glycine max L.) has not been characterized. In this study, we identified 189 AAT genes from the entire soybean genomic sequence, and classified them into 12 distinct subfamilies based upon their sequence composition and phylogenetic positions. To further investigate the functions of these genes, we analyzed the chromosome distributions, gene structures, duplication patterns, phylogenetic tree, and tissue expression patterns of the 189 AAT genes in soybean. We found that a large number of AAT genes in soybean were expanded via gene duplication; 46 and 36 GmAAT genes were WGD/segmental and tandemly duplicated, respectively. Further comprehensive analyses of the expression profiles of GmAAT genes in various stages of vegetative and reproductive development showed that soybean AAT genes exhibited preferential or distinct expression patterns among different tissues. Overall, our study provides a framework for further analysis of the biological functions of AAT genes in either soybean or other crops.

  17. Classification and Clustering Methods for Multiple Environmental Factors in Gene-Environment Interaction: Application to the Multi-Ethnic Study of Atherosclerosis.

    Science.gov (United States)

    Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V

    2016-11-01

    There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.

  18. A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

    Directory of Open Access Journals (Sweden)

    Esra Pamukçu

    2015-01-01

    Full Text Available Gene expression data typically are large, complex, and highly noisy. Their dimension is high, with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a novel approach using the maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop the celebrated Akaike’s information criterion (AIC), consistent Akaike’s information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate, to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions.
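
    For reference, the standard textbook forms of two of the criteria named in this abstract are given below. The symbols are assumptions of this note rather than the paper's notation: $L(\hat{\theta})$ is the maximized likelihood, $k$ the number of free parameters, and $n$ the number of samples. ICOMP is omitted because its penalty depends on a covariance-complexity term that the abstract does not specify.

```latex
\mathrm{AIC}  = -2 \log L(\hat{\theta}) + 2k
\qquad
\mathrm{CAIC} = -2 \log L(\hat{\theta}) + k\,(\log n + 1)
```

    In both cases the model (here, the number of retained PPCs) with the smallest criterion value is preferred. CAIC penalizes each extra parameter more heavily than AIC once $\log n + 1 > 2$, that is, for $n \geq 3$.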

  19. Classification of rice allergenic protein cDNAs belonging to the alpha-amylase/trypsin inhibitor gene family.

    Science.gov (United States)

    Alvarez, A M; Adachi, T; Nakase, M; Aoki, N; Nakamura, R; Matsuda, T

    1995-09-06

    Seven cDNA clones encoding rice allergenic proteins were newly isolated. Comparison of the sequences of ten cDNA clones, including the three previously isolated clones, results in their classification into four subfamilies. Homologies in the nucleotide sequences among and within subfamilies are 70-85% and above 95%, respectively. A sequence of twenty-five amino-acid residues at the C-terminal proximal region is highly conserved among all clones and resembles that of plant lipid transfer proteins.

  20. The fuzzy gene filter: A classifier performance assesment

    CERN Document Server

    Perez, Meir

    2011-01-01

    The Fuzzy Gene Filter (FGF) is an optimised Fuzzy Inference System designed to rank genes in order of differential expression, based on expression data generated in a microarray experiment. This paper examines the effectiveness of the FGF for feature selection using various classification architectures. The FGF is compared to three of the most common gene ranking algorithms: t-test, Wilcoxon test and ROC curve analysis. Four classification schemes are used to compare the performance of the FGF vis-a-vis the standard approaches: K Nearest Neighbour (KNN), Support Vector Machine (SVM), Naive Bayesian Classifier (NBC) and Artificial Neural Network (ANN). A nested stratified Leave-One-Out Cross Validation scheme is used to identify the optimal number of top-ranking genes, as well as the optimal classifier parameters. Two microarray data sets are used for the comparison: a prostate cancer data set and a lymphoma data set.
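
    As a point of comparison for the FGF, the t-test baseline mentioned in this abstract can be sketched as follows. The abstract does not say which t-test variant was used, so Welch's unequal-variance statistic is assumed here. The function names and the dictionary-based data layout are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least two values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_genes(expr, labels):
    """Rank genes by |t|, most differentially expressed first.

    expr maps gene name -> list of expression values;
    labels is a list of 0/1 class labels aligned with those values.
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)
```

    The top entries of the returned list would then be passed to a classifier, with the cut-off chosen inside the inner loop of the nested cross-validation.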

  1. Algorithms of Expert Classification Applied in Quickbird Satellite Images for Land Use Mapping Algoritmos de Clasificación Experta Aplicados en Imágenes Satelitales Quickbird para el Mapeo de la Cobertura de la Tierra

    Directory of Open Access Journals (Sweden)

    Alberto Jesús Perea

    2009-09-01

    The objective of this paper was to develop a methodology for the classification of digital aerial images that, with the aid of object-based classification and the Normalized Difference Vegetation Index (NDVI), can quantify agricultural areas by using expert classification algorithms, with the aim of improving the final results of thematic classifications. QuickBird satellite images and data from 2532 plots in Hinojosa del Duque, Spain, were used to validate the different classifications, obtaining an overall classification accuracy of 91.9% and an excellent Kappa statistic (87.6%) for the expert classification algorithm.
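
    For readers unfamiliar with the index mentioned above, NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red). The sketch below shows that standard formula only; the function name, the zero-denominator convention, and the sample reflectance values are assumptions for illustration and are not taken from the paper.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir, red -- near-infrared and red reflectances (non-negative numbers).
    Returns a value in [-1, 1]; higher values suggest denser green vegetation.
    """
    total = nir + red
    if total == 0:
        # Convention chosen for this sketch: a pixel with no signal gets 0.
        return 0.0
    return (nir - red) / total


# Hypothetical reflectance pairs (NIR, Red) for three pixels.
pixels = [(0.50, 0.10), (0.30, 0.25), (0.05, 0.05)]
ndvi_values = [ndvi(n, r) for n, r in pixels]
```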

  2. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2016-01-01

    Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is a feature selection algorithm, indeed. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. Few overlaps between these two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  3. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm.

    Science.gov (United States)

    Zhang, Lei; Wang, Linlin; Du, Bochuan; Wang, Tianjiao; Tian, Pu; Tian, Suyan

    2016-01-01

    Among non-small cell lung cancer (NSCLC), adenocarcinoma (AC), and squamous cell carcinoma (SCC) are two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered as distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to a NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example, LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is a feature selection algorithm, indeed. Additionally, we applied SAMGSR to AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. Few overlaps between these two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  4. Locus-Specific Databases and Recommendations to Strengthen Their Contribution to the Classification of Variants in Cancer Susceptibility Genes

    NARCIS (Netherlands)

    Greenblatt, Marc S.; Brody, Lawrence C.; Foulkes, William D.; Genuardi, Maurizio; Hofstra, Robert M. W.; Olivier, Magali; Plon, Sharon E.; Sijmons, Rolf H.; Sinilnikova, Olga; Spurdle, Amanda B.

    2008-01-01

    Locus-specific databases (LSDBs) are curated collections of sequence variants in genes associated with disease. LSDBs of cancer-related genes often serve as a critical resource to researchers, diagnostic laboratories, clinicians, and others in the cancer genetics community. LSDBs are poised to play

  5. Significant loss of sensitivity and specificity in the taxonomic classification occurs when short 16S rRNA gene sequences are used

    Directory of Open Access Journals (Sweden)

    Marcel Martínez-Porchas

    2016-09-01

    The classification performance of Kraken was evaluated in terms of sensitivity and specificity when using short and long 16S rRNA sequences. A total of 440,738 sequences from bacteria with complete taxonomic classifications were downloaded from the high-quality ribosomal RNA database SILVA. Amplicons produced (86,371 sequences; 1450 bp) by virtual PCR with primers covering the V1–V9 region of the 16S rRNA gene were used as reference. Virtual PCRs of internal fragments V3–V4, V4–V5 and V3–V5 were performed. A total of 81,523, 82,334 and 82,998 amplicons were obtained for regions V3–V4, V4–V5 and V3–V5, respectively. Differences in depth of taxonomic classification were detected among the internal fragments. For instance, sensitivity and specificity of sequences classified up to subspecies level were higher when the largest internal fragment (V3–V5) was used (54.0 and 74.6%, respectively), compared to the V3–V4 (45.1 and 66.7%) and V4–V5 (41.8 and 64.6%) fragments. A similar pattern was detected for sequences classified up to more superficial taxonomic categories (i.e. family, order, class…). Results also demonstrate that internal fragments lost specificity and some could be misclassified at the deepest taxonomic levels (i.e. species or subspecies). It is concluded that the larger V3–V5 fragment could be considered for massive high-throughput sequencing, reducing the loss of sensitivity and specificity.
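
    The two metrics reported above follow the usual definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below only illustrates those textbook formulas; how the study counted true and false positives for taxonomic assignments is not stated in the abstract, and the counts used here are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives that were identified (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that were rejected (true negative rate)."""
    return tn / (tn + fp)


# Hypothetical confusion counts for one fragment at one taxonomic level.
tp, fn, tn, fp = 540, 460, 746, 254
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```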

  6. Phylogeny and classification of bacteria in the genera Clavibacter and Rathayibacter on the basis of 16S rRNA gene sequence analyses.

    Science.gov (United States)

    Lee, I M; Bartoszyk, I M; Gundersen-Rindal, D E; Davis, R E

    1997-01-01

    A phylogenetic analysis by parsimony of 16S rRNA gene sequences (16S rDNA) revealed that species and subspecies of Clavibacter and Rathayibacter form a discrete monophyletic clade, paraphyletic to Corynebacterium species. Within the Clavibacter-Rathayibacter clade, four major phylogenetic groups (subclades) with a total of 10 distinct taxa were recognized: (I) species C. michiganensis; (II) species C. xyli; (III) species R. iranicus and R. tritici; and (IV) species R. rathayi. The first three groups form a monophyletic cluster, paraphyletic to R. rathayi. On the basis of the phylogeny inferred, reclassification of members of Clavibacter-Rathayibacter group is proposed. A system for classification of taxa in Clavibacter and Rathayibacter was developed based on restriction fragment length polymorphism (RFLP) analysis of the PCR-amplified 16S rDNA sequences. The groups delineated on the basis of RFLP patterns of 16S rDNA coincided well with the subclades delineated on the basis of phylogeny. In contrast to previous classification systems, which are based primarily on phenotypic properties and are laborious, the RFLP analyses allow for rapid differentiation among species and subspecies in the two genera. PMID:9212413

  7. Update on diabetes classification.

    Science.gov (United States)

    Thomas, Celeste C; Philipson, Louis H

    2015-01-01

    This article highlights the difficulties in creating a definitive classification of diabetes mellitus in the absence of a complete understanding of the pathogenesis of the major forms. This brief review shows the evolving nature of the classification of diabetes mellitus. No classification scheme is ideal, and all have some overlap and inconsistencies. The only form of diabetes that can be accurately diagnosed by DNA sequencing, monogenic diabetes, remains undiagnosed in more than 90% of the individuals who have diabetes caused by one of the known gene mutations. The point of classification, or taxonomy, of disease should be to give insight into both pathogenesis and treatment. It remains a source of frustration that all classification schemes for diabetes mellitus continue to fall short of this goal.

  8. Pathogenic classification of LPL gene variants reported to be associated with LPL deficiency

    DEFF Research Database (Denmark)

    Rodrigues, Rute; Artieda, Marta; Tejedor, Diego

    2016-01-01

    BACKGROUND: Lipoprotein lipase (LPL) deficiency is a serious lipid disorder of severe hypertriglyceridemia (SHTG) with chylomicronemia. A large number of variants in the LPL gene have been reported but their influence on LPL activity and SHTG has not been completely analyzed. Gaining insight...

  9. Towards a new classification of the Arthoniales (Ascomycota) based on a three-gene phylogeny focussing on the genus Opegrapha.

    Science.gov (United States)

    Ertz, Damien; Miadlikowska, Jolanta; Lutzoni, François; Dessein, Steven; Raspé, Olivier; Vigneron, Nathalie; Hofstetter, Valérie; Diederich, Paul

    2009-01-01

    A multi-locus phylogenetic study of the order Arthoniales is presented here using the nuclear ribosomal large subunit (nuLSU), the second largest subunit of RNA polymerase II (RPB2) and the mitochondrial ribosomal small subunit (mtSSU). These genes were sequenced from 43 specimens or culture isolates representing 33 species from this order, 16 of which were from the second largest genus, Opegrapha. With the inclusion of sequences from GenBank, ten genera and 35 species are included in this study, representing about 18% of the genera and ca 3% of the species of this order. Our study revealed the homoplastic nature of morphological characters traditionally used to circumscribe genera within the Arthoniales, such as exciple carbonization and ascomatal structure. The genus Opegrapha appears polyphyletic, species of that genus being nested in all the major clades identified within Arthoniales. The transfer of O. atra and O. calcarea to the genus Arthonia will allow this genus and family Arthoniaceae to be recognized as monophyletic. The genus Enterographa was also found to be polyphyletic. Therefore, the following new combinations are needed: Arthonia calcarea (basionym: O. calcarea), and O. anguinella (basionym: Stigmatidium anguinellum); and the use of the names A. atra and Enterographa zonata are proposed here. The simultaneous use of a mitochondrial gene and two nuclear genes led to the detection of what seems to be a case of introgression of a mitochondrion from one species to another (mitochondrion capture; cytoplasmic gene flow) resulting from hybridization.

  10. Analysis of the Olive Fruit Fly Bactrocera oleae Transcriptome and Phylogenetic Classification of the Major Detoxification Gene Families.

    Science.gov (United States)

    Pavlidi, Nena; Dermauw, Wannes; Rombauts, Stephane; Chrysargyris, Antonios; Chrisargiris, Antonis; Van Leeuwen, Thomas; Vontas, John

    2013-01-01

    The olive fruit fly Bactrocera oleae has a unique ability to cope with olive flesh, and is the most destructive pest of olives worldwide. Its control has been largely based on the use of chemical insecticides, however, the selection of insecticide resistance against several insecticides has evolved. The study of detoxification mechanisms, which allow the olive fruit fly to defend against insecticides, and/or phytotoxins possibly present in the mesocarp, has been hampered by the lack of genomic information in this species. In the NCBI database less than 1,000 nucleotide sequences have been deposited, with less than 10 detoxification gene homologues in total. We used 454 pyrosequencing to produce, for the first time, a large transcriptome dataset for B. oleae. A total of 482,790 reads were assembled into 14,204 contigs. More than 60% of those contigs (8,630) were larger than 500 base pairs, and almost half of them matched with genes of the order of the Diptera. Analysis of the Gene Ontology (GO) distribution of unique contigs, suggests that, compared to other insects, the assembly is broadly representative for the B. oleae transcriptome. Furthermore, the transcriptome was found to contain 55 P450, 43 GST-, 15 CCE- and 18 ABC transporter-genes. Several of those detoxification genes, may putatively be involved in the ability of the olive fruit fly to deal with xenobiotics, such as plant phytotoxins and insecticides. In summary, our study has generated new data and genomic resources, which will substantially facilitate molecular studies in B. oleae, including elucidation of detoxification mechanisms of xenobiotic, as well as other important aspects of olive fruit fly biology.

  11. Analysis of the Olive Fruit Fly Bactrocera oleae Transcriptome and Phylogenetic Classification of the Major Detoxification Gene Families.

    Directory of Open Access Journals (Sweden)

    Nena Pavlidi

    The olive fruit fly Bactrocera oleae has a unique ability to cope with olive flesh, and is the most destructive pest of olives worldwide. Its control has been largely based on the use of chemical insecticides, however, the selection of insecticide resistance against several insecticides has evolved. The study of detoxification mechanisms, which allow the olive fruit fly to defend against insecticides, and/or phytotoxins possibly present in the mesocarp, has been hampered by the lack of genomic information in this species. In the NCBI database less than 1,000 nucleotide sequences have been deposited, with less than 10 detoxification gene homologues in total. We used 454 pyrosequencing to produce, for the first time, a large transcriptome dataset for B. oleae. A total of 482,790 reads were assembled into 14,204 contigs. More than 60% of those contigs (8,630) were larger than 500 base pairs, and almost half of them matched with genes of the order of the Diptera. Analysis of the Gene Ontology (GO) distribution of unique contigs, suggests that, compared to other insects, the assembly is broadly representative for the B. oleae transcriptome. Furthermore, the transcriptome was found to contain 55 P450, 43 GST-, 15 CCE- and 18 ABC transporter-genes. Several of those detoxification genes, may putatively be involved in the ability of the olive fruit fly to deal with xenobiotics, such as plant phytotoxins and insecticides. In summary, our study has generated new data and genomic resources, which will substantially facilitate molecular studies in B. oleae, including elucidation of detoxification mechanisms of xenobiotic, as well as other important aspects of olive fruit fly biology.

  12. Variations and classification of toxic epitopes related to celiac disease among α-gliadin genes from four Aegilops genomes.

    Science.gov (United States)

    Li, Jie; Wang, Shunli; Li, Shanshan; Ge, Pei; Li, Xiaohui; Ma, Wujun; Zeller, F J; Hsam, Sai L K; Yan, Yueming

    2012-07-01

    The α-gliadins are associated with human celiac disease. A total of 23 noninterrupted full open reading frame α-gliadin genes and 19 pseudogenes were cloned and sequenced from C, M, N, and U genomes of four diploid Aegilops species. Sequence comparison of α-gliadin genes from Aegilops and Triticum species demonstrated an existence of extensive allelic variations in Gli-2 loci of the four Aegilops genomes. Specific structural features were found including the compositions and variations of two polyglutamine domains (QI and QII) and four T cell stimulatory toxic epitopes. The mean numbers of glutamine residues in the QI domain in C and N genomes and the QII domain in C, N, and U genomes were much higher than those in Triticum genomes, and the QI domain in C and N genomes and the QII domain in C, M, N, and U genomes displayed greater length variations. Interestingly, the types and numbers of four T cell stimulatory toxic epitopes in α-gliadins from the four Aegilops genomes were significantly less than those from Triticum A, B, D, and their progenitor genomes. Relationships between the structural variations of the two polyglutamine domains and the distributions of four T cell stimulatory toxic epitopes were found, resulting in the α-gliadin genes from the Aegilops and Triticum genomes to be classified into three groups.

  13. Crystal structure of pyridoxal kinase from the Escherichia coli pdxK gene: implications for the classification of pyridoxal kinases.

    Science.gov (United States)

    Safo, Martin K; Musayev, Faik N; di Salvo, Martino L; Hunt, Sharyn; Claude, Jean-Baptiste; Schirch, Verne

    2006-06-01

    The pdxK and pdxY genes have been found to code for pyridoxal kinases, enzymes involved in the pyridoxal phosphate salvage pathway. Two pyridoxal kinase structures have recently been published, including Escherichia coli pyridoxal kinase 2 (ePL kinase 2) and sheep pyridoxal kinase, products of the pdxY and pdxK genes, respectively. We now report the crystal structure of E. coli pyridoxal kinase 1 (ePL kinase 1), encoded by a pdxK gene, and an isoform of ePL kinase 2. The structures were determined in the unliganded and binary complexes with either MgATP or pyridoxal to 2.1-, 2.6-, and 3.2-Å resolutions, respectively. The active site of ePL kinase 1 does not show significant conformational change upon binding of either pyridoxal or MgATP. Like sheep PL kinase, ePL kinase 1 exhibits a sequential random mechanism. Unlike sheep pyridoxal kinase, ePL kinase 1 may not tolerate wide variation in the size and chemical nature of the 4' substituent on the substrate. This is the result of differences in a key residue at position 59 on a loop (loop II) that partially forms the active site. Residue 59, which is His in ePL kinase 1, interacts with the formyl group at C-4' of pyridoxal and may also determine if residues from another loop (loop I) can fill the active site in the absence of the substrate. Both loop I and loop II are suggested to play significant roles in the functions of PL kinases.

  14. Multiplex Real-Time PCR Assay for Detection and Classification of Klebsiella pneumoniae Carbapenemase Gene (blaKPC) Variants

    OpenAIRE

    Chen, Liang; Mediavilla, José R.; Endimiani, Andrea; Rosenthal, Marnie E.; Zhao, Yanan; Bonomo, Robert A.; Kreiswirth, Barry N.

    2011-01-01

    Carbapenem resistance mediated by plasmid-borne Klebsiella pneumoniae carbapenemases (KPC) is an emerging problem of significant clinical importance in Gram-negative bacteria. Multiple KPC gene variants (blaKPC) have been reported, with KPC-2 (blaKPC-2) and KPC-3 (blaKPC-3) associated with epidemic outbreaks in New York City and various international settings. Here, we describe the development of a multiplex real-time PCR assay using molecular beacons (MB-PCR) for rapid and accurate identific...

  15. Identification and classification of all potential hemolysin encoding genes and their products from Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai

    Institute of Scientific and Technical Information of China (English)

    Yi-xuan ZHANG; Yan GENG; Bo BI; Jian-yong HE; Chun-fu WU; Xiao-kui GUO; Guo-ping ZHAO

    2005-01-01

    Aim: To identify and classify all potential hemolysin candidates of Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai. Methods: All of the potential hemolysin-encoding genes were characterized in silico. These genes were cloned and expressed in Escherichia coli. The hemolytic activities of the expressed proteins were assayed by observing hemolysis on sheep blood agar plates. Sphingomyelinase activities of the hemolysin candidates were measured by thin-layer chromatography (TLC) and HPLC for sphingomyelin hydrolysis. Expression and secretion of the hemolysins in L. interrogans were studied by reverse transcription polymerase chain reaction, Western blot, and enzyme-linked immunosorbent assays. Results and Conclusion: The hemolytic activities of hemolysin candidates (LA0327, LA0378, LA1027, LA1029, LA1650, LA3050, LA3937, LA4004) from L. interrogans strain Lai were confirmed. They were further divided into two groups, sphingomyelinase hemolysins and non-sphingomyelinase hemolysins, based on their ability to hydrolyze sphingomyelin. Most of these hemolysins were actually expressed in living L. interrogans and some of them were secreted into the environment. This study establishes an essential and complete basis for further studying the contribution of hemolysins to the pathogenesis of L. interrogans.

  16. Feature gene selection for Chinese hamster classification based on support vector machine

    Institute of Scientific and Technical Information of China (English)

    杨俊丽; 刘田福

    2011-01-01

    To address the high dimensionality and small sample size of Chinese hamster gene expression profile data, a method of feature gene selection for classification based on a Support Vector Machine (SVM) is proposed. The method uses an improved Fisher discriminant (FDR) gene feature scoring criterion to remove genes irrelevant to the classification. A new distance composed of spatial distance and functional distance is proposed as the similarity measure for removing redundant genes. An SVM is used as the classifier to validate the classification performance of the selected feature genes. The experimental results show that the method effectively removes irrelevant and redundant genes, and that the selected feature genes constitute the minimum number of genes needed to classify Chinese hamsters correctly.

  17. Phylogeny and classification of the Litostomatea (Protista, Ciliophora), with emphasis on free-living taxa and the 18S rRNA gene.

    Science.gov (United States)

    Vd'ačný, Peter; Bourland, William A; Orsi, William; Epstein, Slava S; Foissner, Wilhelm

    2011-05-01

    The class Litostomatea is a highly diverse ciliate taxon comprising hundreds of species ranging from aerobic, free-living predators to anaerobic endocommensals. This is traditionally reflected by classifying the Litostomatea into the subclasses Haptoria and Trichostomatia. The morphological classifications of the Haptoria conflict with the molecular phylogenies, which indicate polyphyly and numerous homoplasies. Thus, we analyzed the genealogy of 53 in-group species with morphological and molecular methods, including 12 new sequences from free-living taxa. The phylogenetic analyses and some strong morphological traits show: (i) body polarization and simplification of the oral apparatus as main evolutionary trends in the Litostomatea and (ii) three distinct lineages (subclasses): the Rhynchostomatia comprising Tracheliida and Dileptida; the Haptoria comprising Lacrymariida, Haptorida, Didiniida, Pleurostomatida and Spathidiida; and the Trichostomatia. The curious Homalozoon cannot be assigned to any of the haptorian orders, but is basal to a clade containing the Didiniida and Pleurostomatida. The internal relationships of the Spathidiida remain obscure because many of them and some "traditional" haptorids form separate branches within the basal polytomy of the order, indicating one or several radiations and convergent evolution. Due to the high divergence in the 18S rRNA gene, the chaeneids and cyclotrichiids are classified incertae sedis.

  18. A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

    Directory of Open Access Journals (Sweden)

    Kopriva Ivica

    2011-12-01

    Background: Bioinformatics data analysis often uses a linear mixture model that represents samples as an additive mixture of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of the extracted components to be retained for classification analysis remains an open issue. Results: The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. Conclusions: We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness-constrained factorization on a sample-by-sample basis. By contrast, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of decomposition, the strength of the expression of each feature across the samples can vary. Yet, they will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification. That is not the case with standard factorization methods.
Moreover, the component selected by proposed method

  19. Tissue Classification

    DEFF Research Database (Denmark)

    Van Leemput, Koen; Puonti, Oula

    2015-01-01

    Computational methods for automatically segmenting magnetic resonance images of the brain have seen tremendous advances in recent years. So-called tissue classification techniques, aimed at extracting the three main brain tissue classes (white matter, gray matter, and cerebrospinal fluid), are now...... well established. In their simplest form, these methods classify voxels independently based on their intensity alone, although much more sophisticated models are typically used in practice. This article aims to give an overview of often-used computational techniques for brain tissue classification...

  20. A comprehensive simulation study on classification of RNA-Seq data.

    Science.gov (United States)

    Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet

    2017-01-01

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at the molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion.
We conclude that, as a count-based classifier, the power

  1. Transporter Classification Database (TCDB)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Transporter Classification Database details a comprehensive classification system for membrane transport proteins known as the Transporter Classification (TC)...

  2. Classification method of gene expression programming based on principle of maximum degree of membership

    Institute of Scientific and Technical Information of China (English)

    柳益君; 朱明放; 习海旭; 朱广萍; 蒋红芬; 陈丹

    2012-01-01

    The paper proposes a classification method for Gene Expression Programming (GEP) based on the principle of maximum degree of membership, named MDM-GEP. The fuzziness of classification is described by the membership degree of fuzzy sets, and GEP classifiers approximating the membership function of each class are obtained on the training data set. For an instance to be classified, the method computes its membership degree in each fuzzy set and determines the final class by the principle of maximum degree of membership. Experiments on three datasets from the UCI machine learning repository show that MDM-GEP not only classifies effectively but also resolves the unclassifiable-region problem of the conventional simple GEP classification strategy.
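
    The decision rule described above, assigning an instance to the class whose fuzzy set gives it the highest membership degree, can be sketched as follows. This is only an illustration of the maximum-membership principle: the triangular membership functions stand in for the GEP-evolved expressions, which the abstract does not specify, and every name and value here is hypothetical.

```python
def triangular(a, b, c):
    """Return a triangular membership function rising from a, peaking at b, falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def classify_by_max_membership(x, membership_functions):
    """Pick the class label with the highest membership degree for x.

    membership_functions -- dict mapping class label to a function x -> degree in [0, 1].
    Returns (label, degrees) so the caller can inspect every degree.
    """
    degrees = {label: f(x) for label, f in membership_functions.items()}
    best = max(degrees, key=degrees.get)
    return best, degrees


# Hypothetical stand-ins for per-class membership functions.
memberships = {
    "low": triangular(0.0, 1.0, 3.0),
    "high": triangular(1.0, 4.0, 6.0),
}
label, degrees = classify_by_max_membership(2.0, memberships)
```

    Because every instance receives a degree for every class and the maximum is always defined, a rule of this shape never leaves an instance unassigned, which is consistent with the abstract's remark about unclassifiable regions.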

  3. Predicting tissue-specific expressions based on sequence characteristics

    KAUST Repository

    Paik, Hyojung

    2011-04-30

    In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach via generated sequence features from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcriptional factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns considering the biological intuition; TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.

  4. Molecular classification of familial non-BRCA1/BRCA2 breast cancer.

    Science.gov (United States)

    Hedenfalk, Ingrid; Ringner, Markus; Ben-Dor, Amir; Yakhini, Zohar; Chen, Yidong; Chebil, Gunilla; Ach, Robert; Loman, Niklas; Olsson, Håkan; Meltzer, Paul; Borg, Ake; Trent, Jeffrey

    2003-03-01

    In the decade since their discovery, the two major breast cancer susceptibility genes BRCA1 and BRCA2, have been shown conclusively to be involved in a significant fraction of families segregating breast and ovarian cancer. However, it has become equally clear that a large proportion of families segregating breast cancer alone are not caused by mutations in BRCA1 or BRCA2. Unfortunately, despite intensive effort, the identification of additional breast cancer predisposition genes has so far been unsuccessful, presumably because of genetic heterogeneity, low penetrance, or recessive/polygenic mechanisms. These non-BRCA1/2 breast cancer families (termed BRCAx families) comprise a histopathologically heterogeneous group, further supporting their origin from multiple genetic events. Accordingly, the identification of a method to successfully subdivide BRCAx families into recognizable groups could be of considerable value to further genetic analysis. We have previously shown that global gene expression analysis can identify unique and distinct expression profiles in breast tumors from BRCA1 and BRCA2 mutation carriers. Here we show that gene expression profiling can discover novel classes among BRCAx tumors, and differentiate them from BRCA1 and BRCA2 tumors. Moreover, microarray-based comparative genomic hybridization (CGH) to cDNA arrays revealed specific somatic genetic alterations within the BRCAx subgroups. These findings illustrate that, when gene expression-based classifications are used, BRCAx families can be grouped into homogeneous subsets, thereby potentially increasing the power of conventional genetic analysis.

  5. Classification in context

    DEFF Research Database (Denmark)

    Mai, Jens Erik

    2004-01-01

    This paper surveys classification research literature, discusses various classification theories, and shows that the focus has traditionally been on establishing a scientific foundation for classification research. This paper argues that a shift has taken place, and suggests that contemporary...... classification research focus on contextual information as the guide for the design and construction of classification schemes....

  6. Classification in Australia.

    Science.gov (United States)

    McKinlay, John

    Despite some inroads by the Library of Congress Classification and short-lived experimentation with Universal Decimal Classification and Bliss Classification, Dewey Decimal Classification, with its ability in recent editions to be hospitable to local needs, remains the most widely used classification system in Australia. Although supplemented at…

  7. Classification of Sensitivity or Resistance of Cervical Cancers to Ionizing Radiation According to Expression Profiles of 62 Genes Selected by cDNA Microarray Analysis

    Directory of Open Access Journals (Sweden)

    Osamu Kitahara

    2002-01-01

    To identify a set of genes related to the radiosensitivity of cervical squamous cell carcinomas and to establish a predictive method, we compared expression profiles of 9 radiosensitive and 10 radioresistant tumors obtained by biopsy before treatment, on a cDNA microarray consisting of 23,040 human genes. We identified 121 genes whose expression was significantly greater in radiosensitive cells than in radioresistant cells, and 50 genes that showed higher levels of expression in radioresistant cells than in radiosensitive cells. Some of these genes were already known to be associated with the radiation response, such as aldehyde dehydrogenase 1 (ALDH1) and X-ray repair cross-complementing 5 (XRCC5) (P<.05, Mann-Whitney test). The validity of the total of 171 genes as radiosensitivity-related genes was confirmed by a permutation test (P<.05). Furthermore, we selected 62 genes on the basis of a clustering analysis and confirmed their validity with a cross-validation test. The cross-validation test also indicates that radiation-sensitive biopsy samples can be discriminated from radiation-resistant ones by prediction score (PS) values calculated from the expression values of the 62 genes in 19 samples, because the prediction successfully and unequivocally discriminated the radiosensitive phenotype from the radioresistant phenotype in our test panel of 19 cervical carcinomas. The extensive list of genes identified in these experiments provides a large body of potentially valuable information for studying the mechanism(s) of radiosensitivity, and the selected 62 genes open the possibility of providing appropriate and effective radiotherapy to cancer patients.

  8. Importance of detecting FLT3 and NPM1 gene mutations in acute myeloid leukemia - World Health Organization Classification 2008

    Directory of Open Access Journals (Sweden)

    Marley Aparecida Licínio

    2010-01-01

    Acute myeloid leukemia (AML) is a group of malignancies characterized by uncontrolled proliferation of hematopoietic cells resulting from mutations that occur at different stages in the differentiation of myeloid precursor cells. In 2008, the World Health Organization (WHO-2008) published a new classification for cancers of the hematopoietic and lymphoid system. According to this classification, FLT3 and NPM1 gene mutations should be investigated for a more precise diagnosis and prognostic stratification of AML patients. It is well known that the presence of FLT3 gene mutations is considered an unfavorable prognostic factor and that type A NPM1 gene mutations carry a favorable prognosis. Thus, in developed countries, the analysis of FLT3 and NPM1 gene mutations has been considered an important prognostic factor in therapeutic decisions for patients diagnosed with AML. Considering this information, it is extremely important to analyze mutations in the FLT3 gene (internal tandem duplication, ITD, and the D835 point mutation) and in the NPM1 gene as molecular markers for the diagnosis, prognosis and monitoring of minimal residual disease in patients with AML.

  9. Gene selection and cancer type classification of diffuse large-B-cell lymphoma using a bivariate mixture model for two-species data.

    Science.gov (United States)

    Su, Yuhua; Nielsen, Dahlia; Zhu, Lei; Richards, Kristy; Suter, Steven; Breen, Matthew; Motsinger-Reif, Alison; Osborne, Jason

    2013-01-05

    A bivariate mixture model utilizing information across two species was proposed to solve the fundamental problem of identifying differentially expressed genes in microarray experiments. The model utility was illustrated using a dog and human lymphoma data set prepared by a group of scientists in the College of Veterinary Medicine at North Carolina State University. A small number of genes were identified as being differentially expressed in both species and the human genes in this cluster serve as a good predictor for classifying diffuse large-B-cell lymphoma (DLBCL) patients into two subgroups, the germinal center B-cell-like diffuse large B-cell lymphoma and the activated B-cell-like diffuse large B-cell lymphoma. The number of human genes that were observed to be significantly differentially expressed (21) from the two-species analysis was very small compared to the number of human genes (190) identified with only one-species analysis (human data). The genes may be clinically relevant/important, as this small set achieved low misclassification rates of DLBCL subtypes. Additionally, the two subgroups defined by this cluster of human genes had significantly different survival functions, indicating that the stratification based on gene-expression profiling using the proposed mixture model provided improved insight into the clinical differences between the two cancer subtypes.

  10. Gene

    Data.gov (United States)

    U.S. Department of Health & Human Services — Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes,...

  11. Spectral Regression and Kernel Space K-Nearest Neighbor for Classification of Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    于攀; 叶俊勇

    2011-01-01

    Cancer gene expression data are typical high-dimensional, small-sample data; classifying them directly runs into the curse of dimensionality, so dimensionality reduction is needed. This paper proposes a classification approach for gene expression data based on Spectral Regression (SR) analysis and a Kernel space K-Nearest Neighbor (KKNN) classifier. Spectral regression analysis yields a projection matrix that effectively extracts low-dimensional discriminative features; the projection matrix is used to reduce the dimensionality of the gene expression data, and the reduced low-dimensional data are then classified with the KKNN classifier. Experiments on the cancer datasets Prostate_Tumor and 4_Tumors demonstrate the effectiveness of the proposed algorithm and also show that KKNN achieves better classification results than the K-Nearest Neighbor (KNN) approach.
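
    The nearest-neighbor step in kernel space can be illustrated with a short sketch. It assumes a degree-2 polynomial kernel and toy two-dimensional data, neither of which comes from the paper, and it omits the spectral regression projection entirely. The distance used is the standard kernel-induced one, ||phi(a) - phi(b)||^2 = k(a, a) - 2 k(a, b) + k(b, b).

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def polynomial_kernel(a: Vector, b: Vector) -> float:
    """Degree-2 polynomial kernel (a . b + 1)^2; an illustrative choice."""
    dot = sum(x * y for x, y in zip(a, b))
    return (dot + 1.0) ** 2


def kernel_distance_sq(a: Vector, b: Vector, kernel: Kernel) -> float:
    """Squared distance between phi(a) and phi(b) in the kernel-induced space."""
    return kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)


def kernel_knn_predict(
    train: List[Tuple[Vector, str]],
    query: Vector,
    k: int = 3,
    kernel: Kernel = polynomial_kernel,
) -> str:
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(train, key=lambda item: kernel_distance_sq(item[0], query, kernel))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy low-dimensional samples standing in for projected expression profiles.
toy_train: List[Tuple[Vector, str]] = [
    ([0.0, 0.1], "normal"),
    ([0.2, 0.0], "normal"),
    ([0.1, 0.2], "normal"),
    ([2.0, 2.1], "tumor"),
    ([2.2, 1.9], "tumor"),
    ([1.9, 2.0], "tumor"),
]

if __name__ == "__main__":
    print(kernel_knn_predict(toy_train, [0.1, 0.1]))
    print(kernel_knn_predict(toy_train, [2.0, 2.0]))
```

    Passing the kernel as a parameter keeps the neighbor search independent of the kernel choice, so a different kernel can be swapped in without touching the voting logic.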

  12. Genome-wide identification, classification and analysis of HD-ZIP gene family in citrus, and its potential roles in somatic embryogenesis regulation.

    Science.gov (United States)

    Ge, Xiao-Xia; Liu, Zheng; Wu, Xiao-Meng; Chai, Li-Jun; Guo, Wen-Wu

    2015-12-10

    The homeodomain-leucine zipper (HD-Zip) transcription factors, which belong to a class of Homeobox proteins, have been reported to be involved in different biological processes of plants, including growth and development, photomorphogenesis, flowering, fruit ripening and adaptation responses to environmental stresses. In this study, 27 HD-Zip genes (CsHBs) were identified in Citrus. Based on the phylogenetic analysis and characteristics of individual genes or proteins, the HD-Zip gene family in Citrus can be classified into 4 subfamilies, i.e. HD-Zip I, HD-Zip II, HD-Zip III, and HD-Zip IV, containing 16, 2, 4, and 5 members respectively. The digital expression patterns of the 27 HD-Zip genes were analyzed in the callus, flower, leaf and fruit of Citrus sinensis. The qRT-PCR and RT-PCR analyses of six selected HD-Zip genes were performed in six citrus cultivars with different embryogenic competence and in the embryo induction stages, which revealed that these genes were differentially expressed and might be involved in citrus somatic embryogenesis (SE). The results showed that the expression of CsHB1 was up-regulated during somatic embryo induction, and its expression was higher in citrus cultivars with high embryogenic capacity than in cultivars recalcitrant to forming somatic embryos. Moreover, a microsatellite site of three nucleotide repeats was found in the CsHB1 gene among eighteen citrus genotypes, indicating a possible association of the CsHB1 gene with the capacity for callus induction.

  13. Identification and Classification of bcl Genes and Proteins of Bacillus cereus Group Organisms and Their Application in Bacillus anthracis Detection and Fingerprinting

    OpenAIRE

    Leski, Tomasz A.; Caswell, Clayton C.; Pawlowski, Marcin; Klinke, David J.; Bujnicki, Janusz M.; Hart, Sean J.; Lukomski, Slawomir

    2009-01-01

    The Bacillus cereus group includes three closely related species, B. anthracis, B. cereus, and B. thuringiensis, which form a highly homogeneous subdivision of the genus Bacillus. One of these species, B. anthracis, has been identified as one of the most probable bacterial biowarfare agents. Here, we evaluate the sequence and length polymorphisms of the Bacillus collagen-like protein bcl genes as a basis for B. anthracis detection and fingerprinting. Five genes, designated bclA to bclE, are p...

  14. Classification of the web

    DEFF Research Database (Denmark)

    Mai, Jens Erik

    2004-01-01

    This paper discusses the challenges faced by investigations into the classification of the Web and outlines inquiries that are needed to use principles for bibliographic classification to construct classifications of the Web. This paper suggests that the classification of the Web meets challenges...

  15. Inferring Meta-covariates in Classification

    Science.gov (United States)

    Harris, Keith; McMillan, Lisa; Girolami, Mark

    This paper develops an alternative method for gene selection that combines model based clustering and binary classification. By averaging the covariates within the clusters obtained from model based clustering, we define “meta-covariates” and use them to build a probit regression model, thereby selecting clusters of similarly behaving genes, aiding interpretation. This simultaneous learning task is accomplished by an EM algorithm that optimises a single likelihood function which rewards good performance at both classification and clustering. We explore the performance of our methodology on a well known leukaemia dataset and use the Gene Ontology to interpret our results.
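
    The averaging step that defines a meta-covariate can be sketched as below. The sketch assumes the gene-to-cluster assignment is already given as a list of labels; in the paper the clustering and the probit classifier are learned jointly by EM, which is not reproduced here. Function and variable names are hypothetical.

```python
from typing import List, Sequence


def meta_covariates(
    expression: Sequence[Sequence[float]], gene_cluster: Sequence[int]
) -> List[List[float]]:
    """Average the gene columns that share a cluster label.

    expression   : samples x genes matrix of expression values.
    gene_cluster : cluster label for each gene (assumed given, not learned here).
    Returns a samples x clusters matrix, clusters ordered by label.
    """
    clusters = sorted(set(gene_cluster))
    result: List[List[float]] = []
    for sample in expression:
        if len(sample) != len(gene_cluster):
            raise ValueError("each sample needs one value per gene")
        row = []
        for cluster in clusters:
            values = [v for v, g in zip(sample, gene_cluster) if g == cluster]
            row.append(sum(values) / len(values))
        result.append(row)
    return result


if __name__ == "__main__":
    # Two samples, three genes; genes 0 and 1 share a cluster.
    toy_expression = [[1.0, 3.0, 10.0], [2.0, 4.0, 20.0]]
    print(meta_covariates(toy_expression, [0, 0, 1]))
```

    The resulting matrix has one column per cluster, so a downstream classifier works with a handful of interpretable features rather than thousands of genes.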

  16. A Classification Method Based on Manifold Learning for Gene Microarray Data

    Institute of Scientific and Technical Information of China (English)

    李强; 石陆魁; 刘恩海; 王歌

    2012-01-01

    Each sample in gene microarray data contains thousands or even tens of thousands of genes, so the dimension of the data must be reduced before classification to obtain better results. Manifold learning, a nonlinear dimension reduction method, can discover the intrinsic structure hidden in high-dimensional data and has been widely applied in areas such as pattern recognition. A model combining manifold learning with classification algorithms was proposed to classify microarray data. In the model, the dimension of the microarray data is first reduced with a manifold learning method, and the reduced data are then classified. In experiments, several manifold learning algorithms, including LLE, ISOMAP, LE and LTSA, were combined with three classification methods, and the results were compared with those from directly classifying the high-dimensional data. The experiments showed that the proposed model greatly improved classification accuracy and also greatly increased the execution efficiency of the classification algorithms.

  17. Random frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification.

    Science.gov (United States)

    Li, Hong-Dong; Xu, Qing-Song; Liang, Yi-Zeng

    2012-08-31

    The identification of disease-relevant genes represents a challenge in microarray-based disease diagnosis where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven to be quite promising for variable selection. However, the design and application of an RJMCMC algorithm requires, for example, special criteria for prior distributions. Also, the simulation from joint posterior distributions of models is computationally intensive, and may even be mathematically intractable. These disadvantages may limit the applications of RJMCMC algorithms. Therefore, the development of algorithms that possess the advantages of RJMCMC methods and are also efficient and easy to follow for selecting disease-associated genes is required. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and the estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top 2 ranked genes for colon and estrogen are Z50753, U00968, and Y10871_at, Z22536_at, respectively. (The source codes with GNU General Public License Version 2.0 are freely available to non-commercial users at: http://code.google.com/p/randomfrog/)

  18. Accurate molecular classification of cancer using simple rules

    Directory of Open Access Journals (Sweden)

    Gotoh Osamu

    2009-10-01

    Background: One intractable problem with using microarray data analysis for cancer classification is how to reduce the extremely high-dimensionality gene feature data to remove the effects of noise. Feature selection is often used to address this problem by selecting informative genes from among thousands or tens of thousands of genes. However, most of the existing methods of microarray-based cancer classification utilize too many genes to achieve accurate classification, which often hampers the interpretability of the models. For a better understanding of the classification results, it is desirable to develop simpler rule-based models with as few marker genes as possible. Methods: We screened a small number of informative single genes and gene pairs on the basis of their depended degrees proposed in rough sets. Applying the decision rules induced by the selected genes or gene pairs, we constructed cancer classifiers. We tested the efficacy of the classifiers by leave-one-out cross-validation (LOOCV) of training sets and classification of independent test sets. Results: We applied our methods to five cancerous gene expression datasets: leukemia (acute lymphoblastic leukemia [ALL] vs. acute myeloid leukemia [AML]), lung cancer, prostate cancer, breast cancer, and leukemia (ALL vs. mixed-lineage leukemia [MLL] vs. AML). Accurate classification outcomes were obtained by utilizing just one or two genes. Some genes that correlated closely with the pathogenesis of relevant cancers were identified. In terms of both classification performance and algorithm simplicity, our approach outperformed or at least matched existing methods. Conclusion: In cancerous gene expression datasets, a small number of genes, even one or two if selected correctly, is capable of achieving an ideal cancer classification effect. This finding also means that very simple rules may perform well for cancerous class prediction.
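
    Leave-one-out cross-validation of a single-gene rule can be sketched as follows. Note the substitution: the paper induces decision rules from rough-set depended degrees, whereas this sketch uses a plain midpoint threshold between the two class means as a stand-in. The expression values and function names are invented for illustration.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[float, str, str]  # (threshold, label_below, label_above)


def fit_threshold_rule(values: Sequence[float], labels: Sequence[str]) -> Rule:
    """Cut at the midpoint between the two class means of one gene.

    A simple stand-in for a learned single-gene rule; not the rough-set
    rule induction used in the paper.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("exactly two classes are required")
    means = {}
    for cls in classes:
        members = [v for v, lab in zip(values, labels) if lab == cls]
        means[cls] = sum(members) / len(members)
    low, high = sorted(classes, key=lambda cls: means[cls])
    return (means[low] + means[high]) / 2.0, low, high


def predict(rule: Rule, value: float) -> str:
    threshold, low, high = rule
    return low if value < threshold else high


def loocv_accuracy(values: Sequence[float], labels: Sequence[str]) -> float:
    """Hold out each sample once, refit on the rest, and score the held-out one."""
    vals, labs = list(values), list(labels)
    correct = 0
    for i in range(len(vals)):
        rule = fit_threshold_rule(vals[:i] + vals[i + 1:], labs[:i] + labs[i + 1:])
        correct += int(predict(rule, vals[i]) == labs[i])
    return correct / len(vals)


if __name__ == "__main__":
    toy_values = [1.0, 1.2, 0.9, 3.0, 3.3, 2.8]  # one gene, six samples
    toy_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
    print(loocv_accuracy(toy_values, toy_labels))
```

    Refitting the rule inside the loop matters: choosing the threshold on all samples and then scoring the same samples would overstate accuracy.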

  19. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases th...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....

  20. Classification of cultivated plants.

    NARCIS (Netherlands)

    Brandenburg, W.A.

    1986-01-01

    Agricultural practice demands principles for classification, starting from the basal entity in cultivated plants: the cultivar. In establishing biosystematic relationships between wild, weedy and cultivated plants, the species concept needs re-examination. Combining of botanic classification, based

  1. Genic insights from integrated human proteomics in GeneCards.

    Science.gov (United States)

    Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publicly available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. The results constitute a significant enhancement of several Gene

  2. Classification of refrigerants

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-07-01

    This document is based on the US standard ANSI/ASHRAE 34, published in 2001 and entitled 'Designation and safety classification of refrigerants'. The classification organizes, on an international basis, the refrigerants used worldwide by assigning each a code that corresponds to its chemical composition. This note explains the codification: prefix, suffixes (hydrocarbons and derived fluids, azeotropic and non-azeotropic mixtures, various organic compounds, non-organic compounds), and safety classification (toxicity, flammability, the case of mixtures). (J.S.)

  3. Classification, disease, and diagnosis.

    Science.gov (United States)

    Jutel, Annemarie

    2011-01-01

    Classification shapes medicine and guides its practice. Understanding classification must be part of the quest to better understand the social context and implications of diagnosis. Classifications are part of the human work that provides a foundation for the recognition and study of illness: deciding how the vast expanse of nature can be partitioned into meaningful chunks, stabilizing and structuring what is otherwise disordered. This article explores the aims of classification, their embodiment in medical diagnosis, and the historical traditions of medical classification. It provides a brief overview of the aims and principles of classification and their relevance to contemporary medicine. It also demonstrates how classifications operate as social framing devices that enable and disable communication, assert and refute authority, and are important items for sociological study.

  4. Security classification of information

    Energy Technology Data Exchange (ETDEWEB)

    Quist, A.S.

    1993-04-01

    This document is the second of a planned four-volume work that comprehensively discusses the security classification of information. The main focus of Volume 2 is on the principles for classification of information. Included herein are descriptions of the two major types of information that governments classify for national security reasons (subjective and objective information), guidance to use when determining whether information under consideration for classification is controlled by the government (a necessary requirement for classification to be effective), information disclosure risks and benefits (the benefits and costs of classification), standards to use when balancing information disclosure risks and benefits, guidance for assigning classification levels (Top Secret, Secret, or Confidential) to classified information, guidance for determining how long information should be classified (classification duration), classification of associations of information, classification of compilations of information, and principles for declassifying and downgrading information. Rules or principles of certain areas of our legal system (e.g., trade secret law) are sometimes mentioned to provide added support to some of those classification principles.

  5. Security classification of information

    Energy Technology Data Exchange (ETDEWEB)

    Quist, A.S.

    1989-09-01

    Certain governmental information must be classified for national security reasons. However, the national security benefits from classifying information are usually accompanied by significant costs -- those due to a citizenry not fully informed on governmental activities, the extra costs of operating classified programs and procuring classified materials (e.g., weapons), the losses to our nation when advances made in classified programs cannot be utilized in unclassified programs. The goal of a classification system should be to clearly identify that information which must be protected for national security reasons and to ensure that information not needing such protection is not classified. This document was prepared to help attain that goal. This document is the first of a planned four-volume work that comprehensively discusses the security classification of information. Volume 1 broadly describes the need for classification, the basis for classification, and the history of classification in the United States from colonial times until World War 2. Classification of information since World War 2, under Executive Orders and the Atomic Energy Acts of 1946 and 1954, is discussed in more detail, with particular emphasis on the classification of atomic energy information. Adverse impacts of classification are also described. Subsequent volumes will discuss classification principles, classification management, and the control of certain unclassified scientific and technical information. 340 refs., 6 tabs.

  6. An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification.

    Science.gov (United States)

    Esfahani, Mohammad Shahrokh; Dougherty, Edward R

    2015-01-01

    Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete, nor regulatory, with no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Because the Dirichlet distribution is the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: mammalian cell cycle and a p53 pathway model.
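
    The conjugacy mentioned in this abstract means that observing Multinomial counts turns a Dirichlet prior into another Dirichlet simply by adding the counts to the prior parameters. The sketch below shows only that standard update; it does not reproduce the paper's optimization for deriving the prior from pathway constraints, and the prior values are invented for illustration.

```python
from typing import List, Sequence


def dirichlet_posterior(alpha: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Posterior Dirichlet parameters after observing Multinomial counts.

    Conjugacy gives alpha_post[i] = alpha[i] + counts[i].
    """
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha: Sequence[float]) -> List[float]:
    """Expected category probabilities under a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


if __name__ == "__main__":
    prior = [2.0, 1.0, 1.0]  # hypothetical prior favoring the first state
    observed = [3, 0, 1]     # counts of each discrete state in the sample
    posterior = dirichlet_posterior(prior, observed)
    print(posterior)
    print(dirichlet_mean(posterior))
```

    With small samples the prior parameters dominate the posterior mean, which is why a well-constructed prior matters in the small-sample setting the abstract describes.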

  7. Ontologies vs. Classification Systems

    DEFF Research Database (Denmark)

    Madsen, Bodil Nistrup; Erdman Thomsen, Hanne

    2009-01-01

    What is an ontology compared to a classification system? Is a taxonomy a kind of classification system or a kind of ontology? These are questions that we meet when working with people from industry and public authorities, who need methods and tools for concept clarification, for developing meta data sets or for obtaining advanced search facilities. In this paper we will present an attempt at answering these questions. We will give a presentation of various types of ontologies and briefly introduce terminological ontologies. Furthermore we will argue that classification systems, e.g. product classification systems and meta data taxonomies, should be based on ontologies.

  8. Data Mining Algorithms for Classification of Complex Biomedical Data

    Science.gov (United States)

    Lan, Liang

    2012-01-01

    In my dissertation, I will present my research which contributes to solve the following three open problems from biomedical informatics: (1) Multi-task approaches for microarray classification; (2) Multi-label classification of gene and protein prediction from multi-source biological data; (3) Spatial scan for movement data. In microarray…

  9. Classification of Spreadsheet Errors

    OpenAIRE

    Rajalingham, Kamalasen; Chadwick, David R.; Knight, Brian

    2008-01-01

    This paper describes a framework for a systematic classification of spreadsheet errors. This classification or taxonomy of errors is aimed at facilitating analysis and comprehension of the different types of spreadsheet errors. The taxonomy is an outcome of an investigation of the widespread problem of spreadsheet errors and an analysis of specific types of these errors. This paper contains a description of the various elements and categories of the classification and is supported by appropri...

  10. Information gathering for CLP classification

    OpenAIRE

    Ida Marcello; Felice Giordano; Francesca Marina Costamagna

    2011-01-01

    Regulation 1272/2008 includes provisions for two types of classification: harmonised classification and self-classification. The harmonised classification of substances is decided at Community level and a list of harmonised classifications is included in the Annex VI of the classification, labelling and packaging Regulation (CLP). If a chemical substance is not included in the harmonised classification list it must be self-classified, based on available information, according to the requireme...

  11. Una representación multi-características de imágenes basada en kernels para clasificación de imágenes de histopatología A KERNEL-BASED MULTI-FEATURE IMAGE REPRESENTATION FOR HISTOPATHOLOGY IMAGE CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    J MORENO

    This paper presents a novel strategy for building a high-dimensional feature space to represent histopathology image contents. Histogram features, related to colors, textures and edges, are combined together in a unique image representation space using kernel functions. This feature space is further enhanced by the application of Latent Semantic Analysis, to model hidden relationships among visual patterns. All that information is included in the new image representation space. Then, Support Vector Machine classifiers are used to assign semantic labels to images. Processing and classification algorithms operate on top of kernel functions, so that the structure of the feature space is completely controlled using similarity measures and a dual representation. The proposed approach has shown a successful performance in a classification task using a dataset with 1,502 real histopathology images in 18 different classes. The results show that our approach for histological image classification obtains an improved average performance of 20.6% when compared to a conventional classification approach based on SVM directly applied to the original kernel.

  12. Concepts of Classification and Taxonomy. Phylogenetic Classification

    CERN Document Server

    Fraix-Burnet, Didier

    2016-01-01

    Phylogenetic approaches to classification have been heavily developed in biology by bioinformaticians. But these techniques have applications in other fields, in particular in linguistics. Their main characteristic is to search for relationships between the objects or species under study, instead of grouping them by similarity. They are thus rather well suited for any kind of evolutionary objects. For nearly fifteen years, astrocladistics has explored the use of Maximum Parsimony (or cladistics) for astronomical objects like galaxies or globular clusters. In this lesson we will learn how it works. 1 Why phylogenetic tools in astrophysics? 1.1 History of classification. The need for classifying living organisms is very ancient, and the first classification system can be dated back to the Greeks. The goal was very practical since it was intended to distinguish between edible and toxic foods, or kind and dangerous animals. Simple resemblance was used and has been used for centuries. Basically, until the XVIIIth...

  13. Library Classification 2020

    Science.gov (United States)

    Harris, Christopher

    2013-01-01

    In this article the author explores how a new library classification system might be designed using some aspects of the Dewey Decimal Classification (DDC) and ideas from other systems to create something that works for school libraries in the year 2020. By examining what works well with the Dewey Decimal System, what features should be carried…

  14. Multiple sparse representations classification

    NARCIS (Netherlands)

    E. Plenge (Esben); S.K. Klein (Stefan); W.J. Niessen (Wiro); E. Meijering (Erik)

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In t

  16. Classifier in Age classification

    Directory of Open Access Journals (Sweden)

    B. Santhi

    2012-12-01

    The face is an important feature of human beings. Various properties of a person can be derived by analyzing the face. The objective of the study is to design a classifier for age using facial images. Age classification is essential in many applications such as crime detection, employment and face detection. The proposed algorithm contains four phases: preprocessing, feature extraction, feature selection and classification. The classification employs two class labels, namely Child and Old. This study addresses the limitations of existing classifiers by using the Grey Level Co-occurrence Matrix (GLCM) for feature extraction and a Support Vector Machine (SVM) for classification. This improves the accuracy of the classification, outperforming the existing methods.

  17. Kappa Coefficients for Circular Classifications

    NARCIS (Netherlands)

    Warrens, Matthijs J.; Pratiwi, Bunga C.

    2016-01-01

    Circular classifications are classification scales with categories that exhibit a certain periodicity. Since linear scales have endpoints, the standard weighted kappas used for linear scales are not appropriate for analyzing agreement between two circular classifications. A family of kappa coefficie

  18. Analyzing Members' Motivations to Participate in Role-Playing and Self-Expression Based Virtual Communities

    Science.gov (United States)

    Lee, Young Eun; Saharia, Aditya

    With the rapid growth of computer mediated communication technologies in the last two decades, various types of virtual communities have emerged. Some communities provide a role playing arena, enabled by avatars, while others provide an arena for expressing and promoting detailed personal profiles to enhance their offline social networks. Because these virtual communities differ in focus, different factors motivate members to participate in them. In this study, we examine differences in members’ motivations to participate in role-playing versus self-expression based virtual communities. To achieve this goal, we apply the Wang and Fesenmaier (2004) framework, which explains members’ participation in terms of their functional, social, psychological, and hedonic needs. The primary contributions of this study are twofold: First, it demonstrates differences between role-playing and self-expression based communities. Second, it provides a comprehensive framework describing members’ motivation to participate in virtual communities.

  19. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data

    Directory of Open Access Journals (Sweden)

    Gokmen Zararsiz

    2017-10-01

    RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers, particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of the mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings the voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom’s precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets were used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
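
    The shrinkage idea behind nearest shrunken centroids can be sketched as below. This is a simplified illustration and not the voomNSC method: it omits voom's precision weights and the per-gene standardization of the original NSC algorithm, and the function names, the threshold `delta` and the toy expression values are all hypothetical.

```python
from collections import defaultdict


def soft_threshold(value, delta):
    """Shrink value toward zero by delta; anything within delta becomes 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0


def fit_shrunken_centroids(samples, labels, delta):
    """Return (overall centroid, per-class shrunken deviations from it)."""
    n_features = len(samples[0])
    overall = [sum(s[j] for s in samples) / len(samples) for j in range(n_features)]
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    deviations = {}
    for label, rows in by_class.items():
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        deviations[label] = [
            soft_threshold(mean[j] - overall[j], delta) for j in range(n_features)
        ]
    return overall, deviations


def predict(overall, deviations, sample):
    """Assign the class whose shrunken centroid is closest (squared Euclidean)."""
    def sq_dist(label):
        centroid = [o + d for o, d in zip(overall, deviations[label])]
        return sum((x - c) ** 2 for x, c in zip(sample, centroid))
    return min(deviations, key=sq_dist)


def active_features(deviations):
    """Features that survive shrinkage in at least one class."""
    n_features = len(next(iter(deviations.values())))
    return [j for j in range(n_features)
            if any(dev[j] != 0.0 for dev in deviations.values())]


# Toy data (made up): three "genes", only the first separates the classes.
samples = [[5.0, 1.0, 3.0], [6.0, 1.2, 3.1], [1.0, 1.1, 2.9], [2.0, 0.9, 3.0]]
labels = ["tumor", "tumor", "normal", "normal"]
overall, deviations = fit_shrunken_centroids(samples, labels, delta=0.5)
selected = active_features(deviations)
```

    Features whose deviations are shrunk to zero in every class no longer influence the prediction, which is what makes this family of classifiers sparse and usable for selecting candidate biomarkers.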

  20. 78 FR 54970 - Cotton Futures Classification: Optional Classification Procedure

    Science.gov (United States)

    2013-09-09

    ... process in March 2012 (77 FR 5379). When verified by a futures classification, Smith-Doxey data serves as... Classification: Optional Classification Procedure AGENCY: Agricultural Marketing Service, USDA. ACTION: Proposed... for the addition of an optional cotton futures classification procedure--identified and known...

  1. Pitch Based Sound Classification

    DEFF Research Database (Denmark)

    Nielsen, Andreas Brinch; Hansen, Lars Kai; Kjems, U

    2006-01-01

    A sound classification model is presented that can classify signals into music, noise and speech. The model extracts the pitch of the signal using the harmonic product spectrum. Based on the pitch estimate and a pitch error measure, features are created and used in a probabilistic model with soft-max output function. Both linear and quadratic inputs are used. The model is trained on 2 hours of sound and tested on publicly available data. A test classification error below 0.05 with 1 s classification windows is achieved. Furthermore it is shown that linear input performs as well as a quadratic, and that even though classification gets marginally better, not much is achieved by increasing the window size beyond 1 s.
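
    A harmonic product spectrum pitch estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a naive DFT without windowing, a synthetic test tone whose harmonics fall exactly on DFT bins, and hypothetical function names and parameters.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def hps_pitch(signal, sample_rate, n_harmonics=3):
    """Estimate pitch as the bin maximizing the product of the spectrum
    sampled at 1x, 2x, ..., n_harmonics x that bin."""
    mags = magnitude_spectrum(signal)
    max_bin = (len(mags) - 1) // n_harmonics
    if max_bin < 1:
        raise ValueError("signal too short for the requested harmonics")
    best_bin, best_value = 1, -1.0
    for k in range(1, max_bin + 1):  # skip the DC bin
        product = 1.0
        for h in range(1, n_harmonics + 1):
            product *= mags[h * k]
        if product > best_value:
            best_bin, best_value = k, product
    return best_bin * sample_rate / len(signal)


# Synthetic tone (made up): 200 Hz fundamental plus two weaker harmonics.
sample_rate = 8000
n_samples = 400
f0 = 200.0
signal = [
    sum(amp * math.sin(2 * math.pi * f0 * h * t / sample_rate)
        for h, amp in ((1, 1.0), (2, 0.5), (3, 0.25)))
    for t in range(n_samples)
]
estimated_pitch = hps_pitch(signal, sample_rate)
```

    Multiplying the spectrum with its downsampled copies reinforces the bin where the fundamental and its harmonics line up, which is why the method tolerates a weak fundamental better than picking the largest single peak.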

  2. Learning Apache Mahout classification

    CERN Document Server

    Gupta, Ashish

    2015-01-01

    If you are a data scientist who has some experience with the Hadoop ecosystem and machine learning methods and want to try out classification on large datasets using Mahout, this book is ideal for you. Knowledge of Java is essential.

  3. [Classification of cardiomyopathy].

    Science.gov (United States)

    Asakura, Masanori; Kitakaze, Masafumi

    2014-01-01

    Cardiomyopathy is a group of cardiovascular diseases with poor prognosis. Some patients with dilated cardiomyopathy need heart transplantations due to severe heart failure. Some patients with hypertrophic cardiomyopathy die unexpectedly due to malignant ventricular arrhythmias. The various phenotypes of cardiomyopathy reflect a heterogeneous group of diseases. The classification of cardiomyopathies is important and indispensable in the clinical situation. However, their classification has not been established, because the causes of cardiomyopathies have not been fully elucidated. We usually use the definition and classification offered by the WHO/ISFC task force in 1995. Recently, several new definitions and classifications of the cardiomyopathies have been published by the American Heart Association, the European Society of Cardiology and the Japanese Circulation Society.

  4. Carbohydrate terminology and classification

    National Research Council Canada - National Science Library

    Cummings, J H; Stephen, A M

    2007-01-01

    ...) and polysaccharides (DP> or =10). Within this classification, a number of terms are used such as mono- and disaccharides, polyols, oligosaccharides, starch, modified starch, non-starch polysaccharides, total carbohydrate, sugars, etc...

  5. Expected Classification Accuracy

    Directory of Open Access Journals (Sweden)

    Lawrence M. Rudner

    2005-08-01

    Every time we make a classification based on a test score, we should expect some number of misclassifications. Some examinees whose true ability is within a score range will have observed scores outside of that range. A procedure for providing a classification table of true and expected scores is developed for polytomously scored items under item response theory and applied to state assessment data. A simplified procedure for estimating the table entries is also presented.
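
    A classification table of this kind can be sketched as follows. The sketch assumes that observed scores are normally distributed around each true score with a known standard error; that assumption, the function names and the toy numbers are illustrative and are not taken from the abstract.

```python
import bisect
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal(mean, sd)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def category_probabilities(true_score, se, cut_scores):
    """Probability that the observed score lands in each score category."""
    bounds = [-math.inf] + list(cut_scores) + [math.inf]
    return [normal_cdf(hi, true_score, se) - normal_cdf(lo, true_score, se)
            for lo, hi in zip(bounds[:-1], bounds[1:])]


def expected_classification_table(true_scores, standard_errors, cut_scores):
    """Rows: true category. Columns: expected observed category.
    Entries are expected proportions of all examinees."""
    k = len(cut_scores) + 1
    table = [[0.0] * k for _ in range(k)]
    n = len(true_scores)
    for score, se in zip(true_scores, standard_errors):
        true_cat = bisect.bisect_right(cut_scores, score)
        for obs_cat, p in enumerate(category_probabilities(score, se, cut_scores)):
            table[true_cat][obs_cat] += p / n
    return table


def expected_accuracy(table):
    """Expected proportion of examinees classified into their true category."""
    return sum(table[i][i] for i in range(len(table)))


# Toy example (made up): two examinees, one cut score at 0.
table = expected_classification_table([1.0, -1.0], [1.0, 1.0], [0.0])
accuracy = expected_accuracy(table)
```

    The diagonal of the table holds the expected correct classifications, so its sum is the expected classification accuracy; the off-diagonal cells show where misclassifications are expected to fall.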

  6. Completion of the classification

    CERN Document Server

    Strade, Helmut

    2012-01-01

    This is the last of three volumes about "Simple Lie Algebras over Fields of Positive Characteristic" by Helmut Strade, presenting the state of the art of the structure and classification of Lie algebras over fields of positive characteristic. In this monograph the proof of the Classification Theorem presented in the first volume is concluded. It collects all the important results on the topic which can be found only in scattered scientific literature so far.

  7. Twitter content classification

    OpenAIRE

    2010-01-01

    This paper delivers a new Twitter content classification framework based on sixteen existing Twitter studies and a grounded theory analysis of a personal Twitter history. It expands the existing understanding of Twitter as a multifunction tool for personal, professional, commercial and phatic communications with a split level classification scheme that offers broad categorization and specific sub categories for deeper insight into the real world application of the service.

  8. Combined genetic and splicing analysis of BRCA1 c.[594-2A>C; 641A>G] highlights the relevance of naturally occurring in-frame transcripts for developing disease gene variant classification algorithms.

    Science.gov (United States)

    de la Hoya, Miguel; Soukarieh, Omar; López-Perolio, Irene; Vega, Ana; Walker, Logan C; van Ierland, Yvette; Baralle, Diana; Santamariña, Marta; Lattimore, Vanessa; Wijnen, Juul; Whiley, Philip; Blanco, Ana; Raponi, Michela; Hauke, Jan; Wappenschmidt, Barbara; Becker, Alexandra; Hansen, Thomas V O; Behar, Raquel; Investigators, KConFaB; Niederacher, Diether; Arnold, Norbert; Dworniczak, Bernd; Steinemann, Doris; Faust, Ulrike; Rubinstein, Wendy; Hulick, Peter J; Houdayer, Claude; Caputo, Sandrine M; Castera, Laurent; Pesaran, Tina; Chao, Elizabeth; Brewer, Carole; Southey, Melissa C; van Asperen, Christi J; Singer, Christian F; Sullivan, Jan; Poplawski, Nicola; Mai, Phuong; Peto, Julian; Johnson, Nichola; Burwinkel, Barbara; Surowy, Harald; Bojesen, Stig E; Flyger, Henrik; Lindblom, Annika; Margolin, Sara; Chang-Claude, Jenny; Rudolph, Anja; Radice, Paolo; Galastri, Laura; Olson, Janet E; Hallberg, Emily; Giles, Graham G; Milne, Roger L; Andrulis, Irene L; Glendon, Gord; Hall, Per; Czene, Kamila; Blows, Fiona; Shah, Mitul; Wang, Qin; Dennis, Joe; Michailidou, Kyriaki; McGuffog, Lesley; Bolla, Manjeet K; Antoniou, Antonis C; Easton, Douglas F; Couch, Fergus J; Tavtigian, Sean; Vreeswijk, Maaike P; Parsons, Michael; Meeks, Huong D; Martins, Alexandra; Goldgar, David E; Spurdle, Amanda B

    2016-06-01

    A recent analysis using family history weighting and co-observation classification modeling indicated that BRCA1 c.594-2A > C (IVS9-2A > C), previously described to cause exon 10 skipping (a truncating alteration), displays characteristics inconsistent with those of a high risk pathogenic BRCA1 variant. We used large-scale genetic and clinical resources from the ENIGMA, CIMBA and BCAC consortia to assess pathogenicity of c.594-2A > C. The combined odds for causality considering case-control, segregation and breast tumor pathology information was 3.23 × 10^-8. Our data indicate that c.594-2A > C is always in cis with c.641A > G. The spliceogenic effect of c.[594-2A > C;641A > G] was characterized using RNA analysis of human samples and splicing minigenes. As expected, c.[594-2A > C; 641A > G] caused exon 10 skipping, albeit not due to c.594-2A > C impairing the acceptor site but rather by c.641A > G modifying exon 10 splicing regulatory element(s). Multiple blood-based RNA assays indicated that the variant allele did not produce detectable levels of full-length transcripts, with a per allele BRCA1 expression profile composed of ≈70-80% truncating transcripts, and ≈20-30% of in-frame Δ9,10 transcripts predicted to encode a BRCA1 protein with tumor suppression function. We confirm that BRCA1 c.[594-2A > C;641A > G] should not be considered a high-risk pathogenic variant. Importantly, results from our detailed mRNA analysis suggest that BRCA-associated cancer risk is likely not markedly increased for individuals who carry a truncating variant in BRCA1 exons 9 or 10, or any other BRCA1 allele that permits 20-30% of tumor suppressor function. More generally, our findings highlight the importance of assessing naturally occurring alternative splicing for clinical evaluation of variants in disease-causing genes.

  9. MULTILABEL CLASSIFICATION OF DOCUMENTS WITH MAPREDUCE

    Directory of Open Access Journals (Sweden)

    P.Malarvizhi

    2013-04-01

    Multilabel classification is the problem of assigning a set of positive labels to an instance, and it is increasingly required in applications such as protein function classification, music categorization, gene classification and document classification for easy identification and retrieval of information. Labeling the documents of the web manually is time-consuming and difficult because the web is a huge information resource. To overcome this difficulty, we propose a MapReduce algorithm for assigning labels to the documents of the web. MapReduce is a parallel programming framework built on the functions map and reduce, and it suits a wide variety of applications. In our approach, the documents of the web are given to the MapReduce framework, which assigns the set of positive labels to each document using binary classification by a binary classifier. In experiments, our proposed approach satisfactorily assigns labels to the documents of the web.
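
    The map and reduce roles in such a scheme can be sketched in a single process as below. The abstract does not say which binary classifier is used, so the keyword rules here are a hypothetical stand-in for trained per-label models, and the function names and documents are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for trained binary classifiers: one keyword set per label.
LABEL_KEYWORDS = {
    "biology": {"protein", "gene"},
    "music": {"melody", "chord"},
}


def binary_classify(label, text):
    """Return True if the (toy) binary classifier for this label fires."""
    words = set(text.lower().split())
    return bool(words & LABEL_KEYWORDS[label])


def map_phase(doc_id, text):
    """Map: emit one (doc_id, label) pair per positive binary decision."""
    for label in LABEL_KEYWORDS:
        if binary_classify(label, text):
            yield doc_id, label


def reduce_phase(doc_id, labels):
    """Reduce: collect the positive labels emitted for one document."""
    return doc_id, sorted(set(labels))


def run_job(documents):
    """Simulate map, shuffle (group by key) and reduce in one process."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_phase(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


documents = {
    "d1": "a gene and a melody",
    "d2": "chord progressions",
    "d3": "nothing relevant here",
}
assigned = run_job(documents)
```

    Treating each label as an independent binary decision keeps the map step free of shared state, so documents can be processed in parallel. In this sketch a document with no positive label simply produces no output record.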

  10. Concepts of Classification and Taxonomy Phylogenetic Classification

    Science.gov (United States)

    Fraix-Burnet, D.

    2016-05-01

    Phylogenetic approaches to classification have been heavily developed in biology by bioinformaticians. But these techniques have applications in other fields, in particular in linguistics. Their main characteristic is to search for relationships between the objects or species under study, instead of grouping them by similarity. They are thus rather well suited for any kind of evolutionary objects. For nearly fifteen years, astrocladistics has explored the use of Maximum Parsimony (or cladistics) for astronomical objects like galaxies or globular clusters. In this lesson we will learn how it works.

  11. CLASIFICACIÓN NO SUPERVISADA DE COBERTURAS VEGETALES SOBRE IMÁGENES DIGITALES DE SENSORES REMOTOS: “LANDSAT - ETM+” NONSUPERVISED CLASSIFICATION OF VEGETABLE COVERS ON DIGITAL IMAGES OF REMOTE SENSORS: "LANDSAT - ETM+"

    Directory of Open Access Journals (Sweden)

    Mauricio Arango Gutiérrez

    2005-06-01

    The diversity of plant species present in Colombia and the lack of an inventory of them call for a process that facilitates the work of researchers in these disciplines. Satellite remote sensors such as LANDSAT ETM+ and unsupervised artificial intelligence techniques such as Self-Organizing Maps (SOM) could provide a viable alternative for rapidly obtaining information on zones with different vegetation covers across the national territory. The zone proposed for the case study had been classified in a supervised manner by the maximum similarity method in another research project in forest sciences, which discriminated eight types of vegetation cover. This information served as the benchmark for evaluating the performance of the unsupervised classifiers ISODATA and SOM. However, the information provided by the images first had to be cleaned according to data use and quality criteria, so that suitable information was used for these unsupervised methods. To this end, several concepts were applied, such as image statistics, the spectral behavior of plant communities, the characteristics of the sensor and the average divergence, which made it possible to define the best bands and their combinations. Principal component analysis was applied to these, which reduced the number of data while retaining a large percentage of the information. The unsupervised techniques were applied to these cleaned data, modifying some parameters that might show better convergence of the methods. The results obtained were compared with the supervised classification through confusion matrices, and it is concluded that there is no good convergence of the unsupervised classification methods with this process for the case of vegetation covers
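
    The final comparison step, a confusion matrix between a reference classification and another classifier's output, can be sketched as follows. The sketch assumes the unsupervised clusters have already been mapped to cover-class names; the class names, pixel labels and function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(reference, predicted, classes):
    """Rows: reference class. Columns: predicted class. Entries: pixel counts."""
    counts = Counter(zip(reference, predicted))
    return [[counts[(r, p)] for p in classes] for r in classes]


def overall_accuracy(matrix):
    """Share of pixels on the diagonal, i.e. where both classifications agree."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Toy pixel labels (made up) for three cover classes.
classes = ["forest", "pasture", "crop"]
reference = ["forest", "forest", "pasture", "crop"]
predicted = ["forest", "pasture", "pasture", "crop"]
matrix = confusion_matrix(reference, predicted, classes)
agreement = overall_accuracy(matrix)
```

    Off-diagonal cells show which cover types are confused with each other, which is more informative than the overall agreement figure alone.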

  12. Supernova Photometric Classification Challenge

    CERN Document Server

    Kessler, Richard; Jha, Saurabh; Kuhlmann, Stephen

    2010-01-01

    We have publicly released a blinded mix of simulated SNe, with types (Ia, Ib, Ic, II) selected in proportion to their expected rate. The simulation is realized in the griz filters of the Dark Energy Survey (DES) with realistic observing conditions (sky noise, point spread function and atmospheric transparency) based on years of recorded conditions at the DES site. Simulations of non-Ia type SNe are based on spectroscopically confirmed light curves that include unpublished non-Ia samples donated from the Carnegie Supernova Project (CSP), the Supernova Legacy Survey (SNLS), and the Sloan Digital Sky Survey-II (SDSS-II). We challenge scientists to run their classification algorithms and report a type for each SN. A spectroscopically confirmed subset is provided for training. The goals of this challenge are to (1) learn the relative strengths and weaknesses of the different classification algorithms, (2) use the results to improve classification algorithms, and (3) understand what spectroscopically confirmed sub-...

  13. Classification in Medical Imaging

    DEFF Research Database (Denmark)

    Chen, Chen

    Classification is extensively used in the context of medical image analysis for the purpose of diagnosis or prognosis. In order to classify image content correctly, one needs to extract efficient features with discriminative properties and build classifiers based on these features. In addition, a good metric is required to measure distance or similarity between feature points so that the classification becomes feasible. Furthermore, in order to build a successful classifier, one needs to deeply understand how classifiers work. This thesis focuses on these three aspects of classification... to segment breast tissue and pectoral muscle area from the background in mammograms. The second focus is the choice of metric and its influence on the feasibility of a classifier, especially the k-nearest neighbors (k-NN) algorithm, with medical applications on breast cancer prediction and calcification...

  14. Classification of hand eczema

    DEFF Research Database (Denmark)

    Agner, T; Aalto-Korte, K; Andersen, K E;

    2015-01-01

    BACKGROUND: Classification of hand eczema (HE) is mandatory in epidemiological and clinical studies, and also important in clinical work. OBJECTIVES: The aim was to test a recently proposed classification system of HE in clinical practice in a prospective multicentre study. METHODS: Patients were...... HE, protein contact dermatitis/contact urticaria, hyperkeratotic endogenous eczema and vesicular endogenous eczema, respectively. An additional diagnosis was given if symptoms indicated that factors additional to the main diagnosis were of importance for the disease. RESULTS: Four hundred and twenty......%) could not be classified. 38% had one additional diagnosis and 26% had two or more additional diagnoses. Eczema on feet was found in 30% of the patients, statistically significantly more frequently associated with hyperkeratotic and vesicular endogenous eczema. CONCLUSION: We find that the classification...

  15. Acoustic classification of dwellings

    DEFF Research Database (Denmark)

    Berardi, Umberto; Rasmussen, Birgit

    2014-01-01

    Schemes for the classification of dwellings according to different building performances have been proposed in the last years worldwide. The general idea behind these schemes relates to the positive impact a higher label, and thus a better performance, should have. In particular, focusing on sound insulation performance, national schemes for sound classification of dwellings have been developed in several European countries. These schemes define acoustic classes according to different levels of sound insulation. Due to the lack of coordination among countries, a significant diversity in terms... exchanging experiences about constructions fulfilling different classes, reducing trade barriers, and finally increasing the sound insulation of dwellings.

  16. Classification problem in CBIR

    Directory of Open Access Journals (Sweden)

    Tatiana Jaworska

    2013-04-01

    At present a great deal of research is being done in different aspects of Content-Based Image Retrieval (CBIR). Image classification is one of the most important tasks in image retrieval that must be dealt with. The primary issue we have addressed is: how can the fuzzy set theory be used to handle crisp image data. We propose fuzzy rule-based classification of image objects. To achieve this goal we have built fuzzy rule-based classifiers for crisp data. In this paper we present the results of fuzzy rule-based classification in our CBIR. Furthermore, these results are used to construct a search engine taking into account data mining.

  17. Cellular image classification

    CERN Document Server

    Xu, Xiang; Lin, Feng

    2017-01-01

    This book introduces new techniques for cellular image feature extraction, pattern recognition and classification. The authors use the antinuclear antibodies (ANAs) in patient serum as the subjects and the Indirect Immunofluorescence (IIF) technique as the imaging protocol to illustrate the applications of the described methods. Throughout the book, the authors provide evaluations for the proposed methods on two publicly available human epithelial (HEp-2) cell datasets: ICPR2012 dataset from the ICPR'12 HEp-2 cell classification contest and ICIP2013 training dataset from the ICIP'13 Competition on cells classification by fluorescent image analysis. First, the reading of imaging results is significantly influenced by one’s qualification and reading systems, causing high intra- and inter-laboratory variance. The authors present a low-order LP21 fiber mode for optical single cell manipulation and imaging staining patterns of HEp-2 cells. A focused four-lobed mode distribution is stable and effective in optical...

  18. The paradox of atheoretical classification

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2016-01-01

    A distinction can be made between “artificial classifications” and “natural classifications,” where artificial classifications may adequately serve some limited purposes, but natural classifications are overall most fruitful by allowing inference and thus many different purposes. There is strong support for the view that a natural classification should be based on a theory (and, of course, that the most fruitful theory provides the most fruitful classification). Nevertheless, atheoretical (or “descriptive”) classifications are often produced. Paradoxically, atheoretical classifications may be very successful. The best example of a successful “atheoretical” classification is probably the prestigious Diagnostic and Statistical Manual of Mental Disorders (DSM) since its third edition from 1980. Based on such successes one may ask: Should the claim that classifications ideally are natural...

  19. Information gathering for CLP classification.

    Science.gov (United States)

    Marcello, Ida; Giordano, Felice; Costamagna, Francesca Marina

    2011-01-01

    Regulation 1272/2008 includes provisions for two types of classification: harmonised classification and self-classification. The harmonised classification of substances is decided at Community level and a list of harmonised classifications is included in the Annex VI of the classification, labelling and packaging Regulation (CLP). If a chemical substance is not included in the harmonised classification list it must be self-classified, based on available information, according to the requirements of Annex I of the CLP Regulation. CLP appoints that the harmonised classification will be performed for carcinogenic, mutagenic or toxic to reproduction substances (CMR substances) and for respiratory sensitisers category 1 and for other hazard classes on a case-by-case basis. The first step of classification is the gathering of available and relevant information. This paper presents the procedure for gathering information and to obtain data. The data quality is also discussed.

  20. Information gathering for CLP classification

    Directory of Open Access Journals (Sweden)

    Ida Marcello

    2011-01-01

    Regulation 1272/2008 includes provisions for two types of classification: harmonised classification and self-classification. The harmonised classification of substances is decided at Community level and a list of harmonised classifications is included in the Annex VI of the classification, labelling and packaging Regulation (CLP). If a chemical substance is not included in the harmonised classification list it must be self-classified, based on available information, according to the requirements of Annex I of the CLP Regulation. CLP appoints that the harmonised classification will be performed for carcinogenic, mutagenic or toxic to reproduction substances (CMR substances) and for respiratory sensitisers category 1 and for other hazard classes on a case-by-case basis. The first step of classification is the gathering of available and relevant information. This paper presents the procedure for gathering information and to obtain data. The data quality is also discussed.

  1. Transporter taxonomy - a comparison of different transport protein classification schemes.

    Science.gov (United States)

    Viereck, Michael; Gaulton, Anna; Digles, Daniela; Ecker, Gerhard F

    2014-06-01

    Currently, there are more than 800 well characterized human membrane transport proteins (including channels and transporters) and there are estimates that about 10% (approx. 2000) of all human genes are related to transport. Membrane transport proteins are of interest as potential drug targets, for drug delivery, and as a cause of side effects and drug–drug interactions. In light of the development of Open PHACTS, which provides an open pharmacological space, we analyzed selected membrane transport protein classification schemes (Transporter Classification Database, ChEMBL, IUPHAR/BPS Guide to Pharmacology, and Gene Ontology) for their ability to serve as a basis for pharmacology driven protein classification. A comparison of these membrane transport protein classification schemes by using a set of clinically relevant transporters as use-case reveals the strengths and weaknesses of the different taxonomy approaches.

  2. Bosniak Classification system

    DEFF Research Database (Denmark)

    Graumann, Ole; Osther, Susanne Sloth; Karstoft, Jens;

    2014-01-01

    . Purpose: To investigate the inter- and intra-observer agreement among experienced uroradiologists when categorizing complex renal cysts according to the Bosniak classification. Material and Methods: The original categories of 100 cystic renal masses were chosen as “Gold Standard” (GS), established...... to the calculated weighted κ all readers performed “very good” for both inter-observer and intra-observer variation. Most variation was seen in cysts categorized as Bosniak II, IIF, and III. These results show that radiologists who evaluate complex renal cysts routinely may apply the Bosniak classification...
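
    The weighted κ mentioned in this abstract can be illustrated with a small sketch. The weighting scheme is not stated in the excerpt, so linear weights are assumed here; the function name `weighted_kappa` and the toy ratings are hypothetical and are not data from the study.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with linear weights (an assumption, see lead-in)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions of the two raters' category assignments.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    p_obs = p_exp = 0.0
    for i in range(k):
        for j in range(k):
            # Linear agreement weight: 1 on the diagonal, 0 for the most distant pair.
            w = 1.0 - abs(i - j) / (k - 1)
            p_obs += w * obs[i][j]
            p_exp += w * row[i] * col[j]
    return (p_obs - p_exp) / (1.0 - p_exp)


# Ordered Bosniak categories; the ratings below are made-up illustrations.
bosniak = ["I", "II", "IIF", "III", "IV"]
reader_1 = ["II", "IIF", "III", "IV", "II", "III"]
reader_2 = ["II", "II", "III", "IV", "IIF", "III"]
kappa_same = weighted_kappa(reader_1, reader_1, bosniak)
kappa_diff = weighted_kappa(reader_1, reader_2, bosniak)
print(kappa_same, kappa_diff)
```

    Because the weights decrease with the distance between ordered categories, a II versus IIF disagreement is penalised less than a II versus IV disagreement, which is why weighted κ suits ordinal scales such as Bosniak.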

  3. Acoustic classification of dwellings

    DEFF Research Database (Denmark)

    Berardi, Umberto; Rasmussen, Birgit

    2014-01-01

    insulation performance, national schemes for sound classification of dwellings have been developed in several European countries. These schemes define acoustic classes according to different levels of sound insulation. Due to the lack of coordination among countries, a significant diversity in terms...... of descriptors, number of classes, and class intervals occurred between national schemes. However, a proposal “acoustic classification scheme for dwellings” has been developed recently in the European COST Action TU0901 with 32 member countries. This proposal has been accepted as an ISO work item. This paper...

  4. Classification of iconic images

    OpenAIRE

    Zrianina, Mariia; Kopf, Stephan

    2016-01-01

    Iconic images represent an abstract topic and use a presentation that is intuitively understood within a certain cultural context. For example, the abstract topic “global warming” may be represented by a polar bear standing alone on an ice floe. Such images are widely used in media and their automatic classification can help to identify high-level semantic concepts. This paper presents a system for the classification of iconic images. It uses a variation of the Bag of Visual Words approach wi...

  5. Classification problem in CBIR

    OpenAIRE

    Tatiana Jaworska

    2013-01-01

    At present a great deal of research is being done in different aspects of Content-Based Image Retrieval (CBIR). Image classification is one of the most important tasks in image retrieval that must be dealt with. The primary issue we have addressed is: how can the fuzzy set theory be used to handle crisp image data. We propose fuzzy rule-based classification of image objects. To achieve this goal we have built fuzzy rule-based classifiers for crisp data. In this paper we present the results ...

  6. Latent classification models

    DEFF Research Database (Denmark)

    Langseth, Helge; Nielsen, Thomas Dyhre

    2005-01-01

    One of the simplest, and yet most consistently well-performing set of classifiers is the naive Bayes (NB) models. These models rely on two assumptions: (i) All the attributes used to describe an instance are conditionally independent given the class of that instance, and (ii) all attributes follow a specific...... parametric family of distributions. In this paper we propose a new set of models for classification in continuous domains, termed latent classification models. The latent classification model can roughly be seen as combining the NB model with a mixture of factor analyzers, thereby relaxing the assumptions...... classification model, and we demonstrate empirically that the accuracy of the proposed model is significantly higher than the accuracy of other probabilistic classifiers....
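
    The two assumptions named in this abstract can be made concrete with a minimal sketch of the baseline model only, not of the latent classification model the authors propose. It assumes Gaussian attributes as the parametric family; the function names, the variance floor and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors plus one mean and variance per attribute and class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # Small variance floor (an assumption) avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
            for c, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for instance x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        # Assumption (i): attributes are independent given the class, so
        # per-attribute log densities simply add up.
        # Assumption (ii): each attribute follows a Gaussian within a class.
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=log_posterior)


X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y = ["a", "a", "a", "b", "b", "b"]
nb_model = fit_gaussian_nb(X, y)
pred_1 = predict_gaussian_nb(nb_model, [1.1, 2.0])
pred_2 = predict_gaussian_nb(nb_model, [3.1, 0.5])
print(pred_1, pred_2)
```

    Relaxing either assumption, as the abstract describes, would mean replacing the per-attribute sum or the single Gaussian with a richer density model.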

  7. Minimum Error Entropy Classification

    CERN Document Server

    Marques de Sá, Joaquim P; Santos, Jorge M F; Alexandre, Luís A

    2013-01-01

    This book explains the minimum error entropy (MEE) concept applied to data classification machines. Theoretical results on the inner workings of the MEE concept, in its application to solving a variety of classification problems, are presented in the wider realm of risk functionals. Researchers and practitioners also find in the book a detailed presentation of practical data classifiers using MEE. These include multi‐layer perceptrons, recurrent neural networks, complex-valued neural networks, modular neural networks, and decision trees. A clustering algorithm using a MEE‐like concept is also presented. Examples, tests, evaluation experiments and comparison with similar machines using classic approaches, complement the descriptions.

  8. Constructing criticality by classification

    DEFF Research Database (Denmark)

    Machacek, Erika

    2017-01-01

    This paper explores the role of expertise, the nature of criticality, and their relationship to securitisation as mineral raw materials are classified. It works with the construction of risk along the liberal logic of security to explore how "key materials" are turned into "critical materials......, legitimizing a criticality discourse. Specifically, the paper introduces a typology delineating the inferences made by the experts from their produced recommendations in the classification of rare earth element criticality. The paper argues that the classification is a specific process of constructing risk...

  9. What is new in genetics and osteogenesis imperfecta classification?

    OpenAIRE

    Eugênia R. Valadares; Carneiro, Túlio B.; Santos, Paula M.; Oliveira, Ana Cristina; Zabel, Bernhard

    2014-01-01

    OBJECTIVE: Literature review of new genes related to osteogenesis imperfecta (OI) and update of its classification. SOURCES: Literature review in the PubMed and OMIM databases, followed by selection of relevant references. SUMMARY OF THE FINDINGS: In 1979, Sillence et al. developed a classification of OI subtypes based on clinical features and disease severity: OI type I, mild, common, with blue sclera; OI type II, perinatal lethal form; OI type III, severe and progressively deformin...

  10. Shark Teeth Classification

    Science.gov (United States)

    Brown, Tom; Creel, Sally; Lee, Velda

    2009-01-01

    On a recent autumn afternoon at Harmony Leland Elementary in Mableton, Georgia, students in a fifth-grade science class investigated the essential process of classification--the act of putting things into groups according to some common characteristics or attributes. While they may have honed these skills earlier in the week by grouping their own…

  11. Sandwich classification theorem

    Directory of Open Access Journals (Sweden)

    Alexey Stepanov

    2015-09-01

    Full Text Available The present note arises from the author's talk at the conference "Ischia Group Theory 2014". For subgroups $F \le N$ of a group $G$ denote by $Lat(F,N)$ the set of all subgroups of $N$ containing $F$. Let $D$ be a subgroup of $G$. In this note we study the lattice $\mathcal{L} = Lat(D,G)$ and the lattice $\mathcal{L}'$ of subgroups of $G$ normalized by $D$. We say that $\mathcal{L}$ satisfies the sandwich classification theorem if $\mathcal{L}$ splits into a disjoint union of sandwiches $Lat(F, N_G(F))$ over all subgroups $F$ such that the normal closure of $D$ in $F$ coincides with $F$. Here $N_G(F)$ denotes the normalizer of $F$ in $G$. A similar notion of sandwich classification is introduced for the lattice $\mathcal{L}'$. If $D$ is perfect, i.e. coincides with its commutator subgroup, then it turns out that the sandwich classification theorems for $\mathcal{L}$ and $\mathcal{L}'$ are equivalent. We also show how to find the basic subgroup $F$ of sandwiches for $\mathcal{L}'$ and review sandwich classification theorems in algebraic groups over rings.

  12. Dynamic Latent Classification Model

    DEFF Research Database (Denmark)

    Zhong, Shengtong; Martínez, Ana M.; Nielsen, Thomas Dyhre

    as possible. Motivated by this problem setting, we propose a generative model for dynamic classification in continuous domains. At each time point the model can be seen as combining a naive Bayes model with a mixture of factor analyzers (FA). The latent variables of the FA are used to capture the dynamics...... in the process as well as modeling dependences between attributes....

  13. An automated cirrus classification

    Science.gov (United States)

    Gryspeerdt, Edward; Quaas, Johannes; Sourdeval, Odran; Goren, Tom

    2017-04-01

    Cirrus clouds play an important role in determining the radiation budget of the earth, but our understanding of the lifecycle and controls on cirrus clouds remains incomplete. Cirrus clouds can have very different properties and development depending on their environment, particularly during their formation. However, the relevant factors often cannot be distinguished using commonly retrieved satellite data products (such as cloud optical depth). In particular, the initial cloud phase has been identified as an important factor in cloud development, but although back-trajectory based methods can provide information on the initial cloud phase, they are computationally expensive and depend on the cloud parametrisations used in re-analysis products. In this work, a classification system (Identification and Classification of Cirrus, IC-CIR) is introduced. Using re-analysis and satellite data, cirrus clouds are separated into four main types: frontal, convective, orographic and in-situ. The properties of these classes show that this classification is able to provide useful information on the properties and initial phase of cirrus clouds, information that could not be provided by instantaneous satellite retrieved cloud properties alone. This classification is designed to be easily implemented in global climate models, helping to improve future comparisons between observations and models and reducing the uncertainty in cirrus cloud properties, leading to improved cloud parametrisations.

  14. Classifications in popular music

    NARCIS (Netherlands)

    van Venrooij, A.; Schmutz, V.; Wright, J.D.

    2015-01-01

    The categorical system of popular music, such as genre categories, is a highly differentiated and dynamic classification system. In this article we present work that studies different aspects of these categorical systems in popular music. Following the work of Paul DiMaggio, we focus on four questio

  15. Nearest convex hull classification

    NARCIS (Netherlands)

    G.I. Nalbantov (Georgi); P.J.F. Groenen (Patrick); J.C. Bioch (Cor)

    2006-01-01

    Consider the classification task of assigning a test object to one of two or more possible groups, or classes. An intuitive way to proceed is to assign the object to that class, to which the distance is minimal. As a distance measure to a class, we propose here to use the distance to the

  16. Principles for ecological classification

    Science.gov (United States)

    Dennis H. Grossman; Patrick Bourgeron; Wolf-Dieter N. Busch; David T. Cleland; William Platts; G. Ray; C. Robins; Gary Roloff

    1999-01-01

    The principal purpose of any classification is to relate common properties among different entities to facilitate understanding of evolutionary and adaptive processes. In the context of this volume, it is to facilitate ecosystem stewardship, i.e., to help support ecosystem conservation and management objectives.

  17. Improving Student Question Classification

    Science.gov (United States)

    Heiner, Cecily; Zachary, Joseph L.

    2009-01-01

    Students in introductory programming classes often articulate their questions and information needs incompletely. Consequently, the automatic classification of student questions to provide automated tutorial responses is a challenging problem. This paper analyzes 411 questions from an introductory Java programming course by reducing the natural…

  18. Classification of waste packages

    Energy Technology Data Exchange (ETDEWEB)

    Mueller, H.P.; Sauer, M.; Rojahn, T. [Versuchsatomkraftwerk GmbH, Kahl am Main (Germany)

    2001-07-01

    A barrel gamma scanning unit has been in use at the VAK for the classification of radioactive waste materials since 1998. The unit provides the facility operator with the data required for classification of waste barrels. Once these data have been entered into the AVK data processing system, the radiological status of raw waste as well as pre-treated and processed waste can be tracked from the point of origin to the point at which the waste is delivered to a final storage. Since the barrel gamma scanning unit was commissioned in 1998, approximately 900 barrels have been measured and the relevant data required for classification collected and analyzed. Based on the positive results of experience in the use of the mobile barrel gamma scanning unit, the VAK now offers the classification of barrels as a service to external users. Depending upon waste quantity accumulation, this measurement unit offers facility operators a reliable and time-saving and cost-effective means of identifying and documenting the radioactivity inventory of barrels scheduled for final storage. (orig.)

  19. Event Classification using Concepts

    NARCIS (Netherlands)

    Boer, M.H.T. de; Schutte, K.; Kraaij, W.

    2013-01-01

    The semantic gap is one of the challenges in the GOOSE project. In this paper a Semantic Event Classification (SEC) system is proposed as an initial step in tackling the semantic gap challenge in the GOOSE project. This system uses semantic text analysis, multiple feature detectors using the BoW

  20. Munitions Classification Library

    Science.gov (United States)

    2016-04-04

    the MM and TEMTADS 2x2 systems, with dynamic data handling for these systems on the horizon. Using UX-Analyze, a data processor can apply physics ... classification libraries. 2.0 TECHNOLOGY Three different sensor systems were used during the initial phase of the data collection: a modified...

  1. Recurrent neural collective classification.

    Science.gov (United States)

    Monner, Derek D; Reggia, James A

    2013-12-01

    With the recent surge in availability of data sets containing not only individual attributes but also relationships, classification techniques that take advantage of predictive relationship information have gained in popularity. The most popular existing collective classification techniques have a number of limitations: some of them generate arbitrary and potentially lossy summaries of the relationship data, whereas others ignore directionality and strength of relationships. Popular existing techniques make use of only direct neighbor relationships when classifying a given entity, ignoring potentially useful information contained in expanded neighborhoods of radius greater than one. We present a new technique that we call recurrent neural collective classification (RNCC), which avoids arbitrary summarization, uses information about relationship directionality and strength, and through recursive encoding, learns to leverage larger relational neighborhoods around each entity. Experiments with synthetic data sets show that RNCC can make effective use of relationship data for both direct and expanded neighborhoods. Further experiments demonstrate that our technique outperforms previously published results of several collective classification methods on a number of real-world data sets.

  2. Event Classification using Concepts

    NARCIS (Netherlands)

    Boer, M.H.T. de; Schutte, K.; Kraaij, W.

    2013-01-01

    The semantic gap is one of the challenges in the GOOSE project. In this paper a Semantic Event Classification (SEC) system is proposed as an initial step in tackling the semantic gap challenge in the GOOSE project. This system uses semantic text analysis, multiple feature detectors using the BoW mod

  3. Homeobox genes and melatonin synthesis

    DEFF Research Database (Denmark)

    Rohde, Kristian; Møller, Morten; Rath, Martin Fredensborg

    2014-01-01

    ) transcription factor is believed to control pineal-specific Aanat expression. Based on recent advances in our understanding of Crx in the rodent pineal gland, we here suggest that homeobox genes play a role in adult pineal physiology both by ensuring pineal-specific Aanat expression and by facilitating cAMP...

  4. Efficient Fingercode Classification

    Science.gov (United States)

    Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang

    In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.

  5. Sequence Classification: 890678 [

    Lifescience Database Archive (English)

    Full Text Available DNA endonuclease, encoded by the mitochondrial group I intron of the 21S_rRNA gene; mediates gene conversion that propagates the int...ron into intron-less copies of the 21S_rRNA gene; Sceip || http://www.ncbi.nlm.nih.gov/protein/6226538 ...

  6. Definitions and classification of tic disorders. The Tourette Syndrome Classification Study Group.

    Science.gov (United States)

    1993-10-01

    Tics are brief movements (motor tics) or sounds (vocal tics) that occur intermittently and unpredictably out of a background of normal motor activity. Although tics can appear as the result of direct brain injury (so-called symptomatic, eg, from head trauma or encephalitis), they most commonly are idiopathic and are part of the spectrum of Gilles de la Tourette syndrome or other idiopathic tic disorders. To aid investigators searching for the gene(s) causing Tourette syndrome, criteria are proposed to classify the idiopathic tic disorders. Although some of these separate entities may ultimately be shown to be caused by the same gene, until that is established, it is considered best when searching for the Tourette's gene to have tic disorders classified into distinct, homogeneous entities. The proposed classification will likely change over time as better diagnostic techniques become available and can both expand and consolidate, particularly after the Tourette gene is located.

  7. Cuckoo search optimisation for feature selection in cancer classification: a new approach.

    Science.gov (United States)

    Gunavathi, C; Premalatha, K

    2015-01-01

    Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since the gene expression data has thousands of genes and a small number of samples, feature selection methods can be used for the selection of informative genes to improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. The CS is used to find the informative genes from the top-m ranked genes. The classification accuracy of k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is experimented and analysed with ten different cancer gene expression datasets. The results show that the CS gives 100% average accuracy for DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and it outperforms the existing techniques in DLBCL outcome and prostate datasets.
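
    Two building blocks named in this abstract, ranking genes by signal-to-noise ratio and scoring a gene subset by kNN accuracy, can be sketched as follows. This is not the cuckoo search itself and not the authors' code: the two-class SNR form |mu0 - mu1| / (sd0 + sd1), the leave-one-out evaluation, the function names and the toy data are all assumptions made for illustration.

```python
import math


def snr_scores(X, y):
    """Per-gene signal-to-noise ratio for a two-class problem (assumed form)."""
    classes = sorted(set(y))
    assert len(classes) == 2, "this sketch handles two classes only"
    scores = []
    for g in range(len(X[0])):
        stats = []
        for c in classes:
            vals = [row[g] for row, lab in zip(X, y) if lab == c]
            mu = sum(vals) / len(vals)
            sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
            stats.append((mu, sd))
        (m0, s0), (m1, s1) = stats
        # Tiny constant guards against a zero denominator.
        scores.append(abs(m0 - m1) / (s0 + s1 + 1e-12))
    return scores


def knn_loo_accuracy(X, y, genes, k=1):
    """Leave-one-out kNN accuracy using only the selected gene indices."""
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance to every other sample, restricted to `genes`.
        neighbours = sorted(
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes), y[j])
            for j in range(len(X)) if j != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


# Toy expression matrix: rows are samples, columns are genes (made-up values).
X = [[5.0, 1.0, 2.0], [5.2, 3.0, 2.1], [4.9, 2.0, 1.9],
     [1.0, 1.1, 2.0], [1.1, 2.9, 2.2], [0.9, 2.1, 1.8]]
y = [0, 0, 0, 1, 1, 1]
scores = snr_scores(X, y)
top_genes = sorted(range(len(scores)), key=lambda g: -scores[g])[:1]
fitness_top = knn_loo_accuracy(X, y, top_genes)
fitness_noise = knn_loo_accuracy(X, y, [1])
print(top_genes, fitness_top, fitness_noise)
```

    In a wrapper approach of the kind described, a search procedure would propose subsets of the top-m ranked genes and use a score like `knn_loo_accuracy` as the fitness to maximise.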

  8. Bosniak classification system

    DEFF Research Database (Denmark)

    Graumann, Ole; Osther, Susanne Sloth; Karstoft, Jens;

    2016-01-01

    at MR and CEUS imaging and those at CT. PURPOSE: To compare diagnostic accuracy of MR, CEUS, and CT when categorizing complex renal cystic masses according to the Bosniak classification. MATERIAL AND METHODS: From February 2011 to June 2012, 46 complex renal cysts were prospectively evaluated by three...... readers. Each mass was categorized according to the Bosniak classification and CT was chosen as gold standard. Kappa was calculated for diagnostic accuracy and data was compared with pathological results. RESULTS: CT images found 27 BII, six BIIF, seven BIII, and six BIV. Forty-three cysts could...... one category lower. Pathologic correlation in six lesions revealed four malignant and two benign lesions. CONCLUSION: CEUS and MR both up- and downgraded renal cysts compared to CT, and until these non-radiation modalities have been refined and adjusted, CT should remain the gold standard...

  9. BIRADS classification in mammography.

    Science.gov (United States)

    Balleyguier, Corinne; Ayadi, Salma; Van Nguyen, Kim; Vanel, Daniel; Dromain, Clarisse; Sigal, Robert

    2007-02-01

    The Breast Imaging Reporting and Data System (BIRADS) of the American College of Radiology (ACR) is today widely used in most of the countries where breast cancer screening is implemented. It is a tool defined to reduce variability between radiologists when creating the reports in mammography, ultrasonography or MRI. Some changes in the last version of the BIRADS™ have been included to reduce the inaccuracy of some categories, especially for category 4. The BIRADS™ includes a lexicon and descriptive diagrams of the anomalies, recommendations for the mammographic report as well as guidance and examples of mammographic cases. This review describes the mammographic items of the BIRADS classification with its more recent developments, while detailing the advantages and limits of this classification.

  10. [New classification of vasculitis].

    Science.gov (United States)

    Anić, Branimir

    2014-01-01

    Vasculitis syndrome comprises a heterogeneous group of inflammatory rheumatic diseases whose common feature is inflammation in the blood vessel wall. Establishing the diagnosis of vasculitis is one of the greatest challenges in medicine. Clinical presentation of vasculitis depends on the extent to which an organ system is affected, as well as on the total number of affected organs. The wide range of clinical presentations of vasculitis and the low incidence of the disease impede systematic clinical investigation of vasculitis. The needs of clinical routine and the need for conducting systematic clinical investigations require a clear distinction of individual clinical entities. Different classifications of vasculitis syndrome have been proposed: according to etiology, pathogenesis, histological finding in the affected vessels, and involvement of individual organs and organ systems. This paper presents and comments on the recent classifications and nomenclature of vasculitic entities proposed at the second conference in Chapel Hill.

  11. Hand eczema classification

    DEFF Research Database (Denmark)

    Diepgen, T L; Andersen, Klaus Ejner; Brandao, F M;

    2008-01-01

    Summary Background Hand eczema is a long-lasting disease with a high prevalence in the background population. The disease has severe, negative effects on quality of life and sometimes on social status. Epidemiological studies have identified risk factors for onset and prognosis, but treatment...... of the disease is rarely evidence based, and a classification system for different subdiagnoses of hand eczema is not agreed upon. Randomized controlled trials investigating the treatment of hand eczema are called for. For this, as well as for clinical purposes, a generally accepted classification system...... for hand eczema is needed. Objectives The present study attempts to characterize subdiagnoses of hand eczema with respect to basic demographics, medical history and morphology. Methods Clinical data from 416 patients with hand eczema from 10 European patch test clinics were assessed. Results...

  12. Multilingual documentation and classification.

    Science.gov (United States)

    Donnelly, Kevin

    2008-01-01

    Health care providers around the world have used classification systems for decades as a basis for documentation, communications, statistical reporting, reimbursement and research. In more recent years machine-readable medical terminologies have taken on greater importance with the adoption of electronic health records and the need for greater granularity of data in clinical systems. Use of a clinical terminology harmonised with classifications, implemented within a clinical information system, will enable the delivery of many patient health benefits including electronic clinical decision support, disease screening and enhanced patient safety. In order to be usable these systems must be translated into the language of use, without losing meaning. It is evident that today one system cannot meet all requirements which call for collaboration and harmonisation in order to achieve true interoperability on a multilingual basis.

  13. Sound classification of dwellings

    DEFF Research Database (Denmark)

    Rasmussen, Birgit

    2012-01-01

    dwellings, facade sound insulation and installation noise. The schemes have been developed, implemented and revised gradually since the early 1990s. However, due to lack of coordination between countries, there are significant discrepancies, and new standards and revisions continue to increase the diversity....... Descriptors, range of quality levels, number of quality classes, class intervals, denotations and descriptions vary across Europe. The diversity is an obstacle for exchange of experience about constructions fulfilling different classes, implying also trade barriers. Thus, a harmonized classification scheme...... is needed, and a European COST Action TU0901 "Integrating and Harmonizing Sound Insulation Aspects in Sustainable Urban Housing Constructions", has been established and runs 2009-2013, one of the main objectives being to prepare a proposal for a European sound classification scheme with a number of quality...

  14. Sparse discriminant analysis for breast cancer biomarker identification and classification

    Institute of Scientific and Technical Information of China (English)

    Yu Shi; Daoqing Dai; Chaochun Liu; Hong Yan

    2009-01-01

    Biomarker identification and cancer classification are two important procedures in microarray data analysis. We propose a novel unified method to carry out both tasks. We first preselect biomarker candidates by eliminating unrelated genes through the BSS/WSS ratio filter to reduce computational cost, and then use a sparse discriminant analysis method for simultaneous biomarker identification and cancer classification. Moreover, we give a mathematical justification about automatic biomarker identification. Experimental results show that the proposed method can identify key genes that have been verified in biochemical or biomedical research and classify the breast cancer type correctly.
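
    The BSS/WSS preselection step mentioned in this abstract can be sketched as below. The abstract does not give the formula, so the common definition is assumed: for each gene, the between-class sum of squares divided by the within-class sum of squares. The function name `bss_wss_ratio` and the toy matrix are hypothetical.

```python
def bss_wss_ratio(X, y):
    """Between-class over within-class sum of squares, one value per gene."""
    classes = sorted(set(y))
    ratios = []
    for g in range(len(X[0])):
        column = [row[g] for row in X]
        overall_mean = sum(column) / len(column)
        bss = wss = 0.0
        for c in classes:
            vals = [v for v, label in zip(column, y) if label == c]
            class_mean = sum(vals) / len(vals)
            # Each sample in class c contributes (class mean - overall mean)^2.
            bss += len(vals) * (class_mean - overall_mean) ** 2
            wss += sum((v - class_mean) ** 2 for v in vals)
        ratios.append(bss / wss if wss > 0 else float("inf"))
    return ratios


# Toy data: gene 0 separates the classes, gene 1 does not (made-up values).
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0],
     [3.0, 5.1], [3.2, 0.9], [2.8, 3.0]]
y = ["A", "A", "A", "B", "B", "B"]
ratios = bss_wss_ratio(X, y)
# Keep the genes with the largest ratios as biomarker candidates.
candidates = sorted(range(len(ratios)), key=lambda g: -ratios[g])[:1]
print(ratios, candidates)
```

    A filter like this is cheap because each gene is scored independently, which fits its stated purpose of cutting computational cost before the sparse discriminant analysis.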

  15. Classification and regression trees

    CERN Document Server

    Breiman, Leo; Olshen, Richard A; Stone, Charles J

    1984-01-01

    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  16. Robust Vertex Classification

    OpenAIRE

    Chen, Li; Shen, Cencheng; Vogelstein, Joshua; Priebe, Carey

    2013-01-01

    For random graphs distributed according to stochastic blockmodels, a special case of latent position graphs, adjacency spectral embedding followed by appropriate vertex classification is asymptotically Bayes optimal; but this approach requires knowledge of and critically depends on the model dimension. In this paper, we propose a sparse representation vertex classifier which does not require information about the model dimension. This classifier represents a test vertex as a sparse combinatio...

  17. Evolvement of Classification Society

    Institute of Scientific and Technical Information of China (English)

    Xu Hua

    2011-01-01

    Classification societies perhaps first emerged as an independent industry in response to the shared interests of shipowners, cargo owners and insurers. Today, as an indispensable link in the international maritime industry, the role of class has changed fundamentally. Starting from the demand of the insurers: seaborne trade, transport and insurance industries began to emerge successively in the 17th century. The massive risks and benefits brought by seaborne transport posed a difficult problem for insurers.

  18. Towards secondary fingerprint classification

    CSIR Research Space (South Africa)

    Msiza, IS

    2011-07-01

    Full Text Available . INTRODUCTION The classification of samples in an automated recognition system is primarily important because of the need to virtually divide the template database into smaller, manageable partitions. This virtual division is done before executing a... database search procedure, and it is done in order to avoid having to search the entire template database and, for this reason, minimize the database search time and improve the overall performance of an automated recognition system. Sample...

  19. Semiparametric Gaussian copula classification

    OpenAIRE

    Zhao, Yue; Wegkamp, Marten

    2014-01-01

    This paper studies the binary classification of two distributions with the same Gaussian copula in high dimensions. Under this semiparametric Gaussian copula setting, we derive an accurate semiparametric estimator of the log density ratio, which leads to our empirical decision rule and a bound on its associated excess risk. Our estimation procedure takes advantage of the potential sparsity as well as the low noise condition in the problem, which allows us to achieve faster convergence rate of...

  20. Classification of nanopolymers

    Energy Technology Data Exchange (ETDEWEB)

    Larena, A; Tur, A [Department of Chemical Industrial Engineering and Environment, Universidad Politecnica de Madrid, E.T.S. Ingenieros Industriales, C/ Jose Gutierrez Abascal, Madrid (Spain); Baranauskas, V [Faculdade de Engenharia Eletrica e Computacao, Departamento de Semicondutores, Instrumentos e Fotonica, Universidade Estadual de Campinas, UNICAMP, Av. Albert Einstein N.400, 13 083-852 Campinas SP Brasil (Brazil)], E-mail: alarena@etsii.upm.es

    2008-03-15

    Nanopolymers with different structures, shapes, and functional forms have recently been prepared using several techniques. Nanopolymers are the most promising basic building blocks for mounting complex and simple hierarchical nanosystems. The applications of nanopolymers are extremely broad and polymer-based nanotechnologies are fast emerging. We propose a nanopolymer classification scheme based on self-assembled structures, non self-assembled structures, and on the number of dimensions in the nanometer range (nD)

  1. Classification of myocardial infarction

    DEFF Research Database (Denmark)

    Saaby, Lotte; Poulsen, Tina Svenstrup; Hosbond, Susanne Elisabeth

    2013-01-01

    The classification of myocardial infarction into 5 types was introduced in 2007 as an important component of the universal definition. In contrast to the plaque rupture-related type 1 myocardial infarction, type 2 myocardial infarction is considered to be caused by an imbalance between demand...... and supply of oxygen in the myocardium. However, no specific criteria for type 2 myocardial infarction have been established....

  2. Decimal Classification Editions

    Directory of Open Access Journals (Sweden)

    Zenovia Niculescu

    2009-01-01

    Full Text Available The study approaches the evolution of Dewey Decimal Classification editions from the perspective of updating the terminology, reallocating and expanding the main and auxiliary structure of the Dewey indexing language. The comparative analysis of DDC editions emphasizes the efficiency of the Dewey scheme from the point of view of improving the informational offer, through basic index terms, revised and developed, as well as valuing the auxiliary notations.

  3. The paradox of atheoretical classification

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2016-01-01

    A distinction can be made between “artificial classifications” and “natural classifications,” where artificial classifications may adequately serve some limited purposes, but natural classifications are overall most fruitful by allowing inference and thus many different purposes. There is strong...... be very successful. The best example of a successful “atheoretical” classification is probably the prestigious Diagnostic and Statistical Manual of Mental Disorders (DSM) since its third edition from 1980. Based on such successes one may ask: Should the claim that classifications ideally are natural...

  4. Classification of Meteorological Drought

    Institute of Scientific and Technical Information of China (English)

    Zhang Qiang; Zou Xukai; Xiao Fengjin; Lu Houquan; Liu Haibo; Zhu Changhan; An Shunqing

    2011-01-01

    Background: The national standard Classification of Meteorological Drought (GB/T 20481-2006) was developed by the National Climate Center in cooperation with the Chinese Academy of Meteorological Sciences, the National Meteorological Centre and the Department of Forecasting and Disaster Mitigation under the China Meteorological Administration (CMA), and was formally released and implemented in November 2006. In 2008, this Standard won the second prize of the China Standard Innovation and Contribution Awards issued by SAC. Developed through independent innovation, it is the first national standard published to monitor meteorological drought disaster and the first standard in China and around the world specifying the classification of drought. Since its release in 2006, the national standard Classification of Meteorological Drought has been used by CMA as the operational index to monitor and assess drought, has gradually been adopted by provincial meteorological bureaus, and has been applied to the drought early warning release standard in the Methods of Release and Propagation of Meteorological Disaster Early Warning Signal.

  5. Neuromuscular disease classification system

    Science.gov (United States)

    Sáez, Aurora; Acha, Begoña; Montero-Sánchez, Adoración; Rivas, Eloy; Escudero, Luis M.; Serrano, Carmen

    2013-06-01

    Diagnosis of neuromuscular diseases is based on subjective visual assessment of biopsies from patients by the pathologist specialist. A system for objective analysis and classification of muscular dystrophies and neurogenic atrophies through muscle biopsy images of fluorescence microscopy is presented. The procedure starts with an accurate segmentation of the muscle fibers using mathematical morphology and a watershed transform. A feature extraction step is carried out in two parts: 24 features that pathologists take into account to diagnose the diseases and 58 structural features that the human eye cannot see, based on the assumption that the biopsy is considered as a graph, where the nodes are represented by each fiber, and two nodes are connected if two fibers are adjacent. A feature selection using sequential forward selection and sequential backward selection methods, a classification using a Fuzzy ARTMAP neural network, and a study of grading the severity are performed on these two sets of features. A database consisting of 91 images was used: 71 images for the training step and 20 as the test. A classification error of 0% was obtained. It is concluded that the addition of features undetectable by the human visual inspection improves the categorization of atrophic patterns.
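
    The feature-selection step mentioned in this record can be illustrated with a minimal Python sketch of sequential forward selection. The `score` callback is a hypothetical stand-in for whatever criterion the authors used (for instance, validation accuracy of the classifier); it, the toy feature names and the toy usefulness values are assumptions for illustration and are not taken from the record.

```python
def sequential_forward_selection(features, score, k):
    """Greedily pick up to k features, adding the best-scoring one each round.

    `score` is a hypothetical callback: it takes a list of feature names
    and returns a number (higher is better).
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Try each unused feature and keep the one that helps most.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy criterion (an assumption for illustration): each feature has a fixed
# usefulness, and "area" and "perimeter" are redundant with each other.
USEFULNESS = {"area": 3.0, "perimeter": 2.5, "neighbours": 2.0}


def toy_score(subset):
    total = sum(USEFULNESS[f] for f in subset)
    if "area" in subset and "perimeter" in subset:
        total -= 2.0  # penalty for the redundant pair
    return total


chosen = sequential_forward_selection(USEFULNESS, toy_score, k=2)
print(chosen)
```

    Sequential backward selection works the same way in reverse: start from the full set and repeatedly drop the feature whose removal hurts the score least.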

  6. Short Text Classification: A Survey

    Directory of Open Access Journals (Sweden)

    Ge Song

    2014-05-01

    Full Text Available With the recent explosive growth of e-commerce and online communication, a new genre of text, short text, has been extensively applied in many areas, and much research now focuses on short text mining. Classifying short text is challenging owing to its inherent characteristics, such as sparseness, large scale, immediacy, and non-standardization. Traditional methods struggle with short text classification mainly because the few words in a short text cannot represent the feature space or the relationship between words and documents. Several studies and reviews on text classification have appeared recently; however, only a few focus on short text classification. This paper discusses the characteristics of short text and the difficulty of short text classification. We then introduce existing popular work on short text classifiers and models, including short text classification using semantic analysis, semi-supervised short text classification, ensemble short text classification, and real-time classification. The evaluation of short text classification is also analyzed. Finally, we summarize existing classification technology and discuss prospective development trends in short text classification.

  7. Maximum mutual information regularized classification

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-07

    In this paper, a novel pattern classification approach is proposed by regularizing the classifier learning to maximize mutual information between the classification response and the true class label. We argue that, with the learned classifier, the uncertainty of the true class label of a data sample should be reduced by knowing its classification response as much as possible. The reduced uncertainty is measured by the mutual information between the classification response and the true class label. To this end, when learning a linear classifier, we propose to maximize the mutual information between classification responses and true class labels of training samples, besides minimizing the classification error and reducing the classifier complexity. An objective function is constructed by modeling mutual information with entropy estimation, and it is optimized by a gradient descent method in an iterative algorithm. Experiments on two real world pattern classification problems show the significant improvements achieved by maximum mutual information regularization.
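
    To make the quantity being maximized concrete, here is a minimal Python sketch of the empirical mutual information I(R; Y) = Σ p(r, y) log[p(r, y) / (p(r) p(y))] between classification responses and true labels. It is a simplification under stated assumptions: responses are treated as discrete values and probabilities are plain frequency counts, whereas the record describes modeling mutual information with entropy estimation inside a gradient-based learner. The function name is hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(responses, labels):
    """Plug-in estimate of I(R; Y) in nats from paired discrete samples."""
    if len(responses) != len(labels) or not responses:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(responses)
    joint = Counter(zip(responses, labels))   # counts of (r, y) pairs
    count_r = Counter(responses)              # marginal counts of responses
    count_y = Counter(labels)                 # marginal counts of labels
    mi = 0.0
    for (r, y), c in joint.items():
        # p(r, y) / (p(r) p(y)) simplifies to c * n / (count_r * count_y)
        mi += (c / n) * log(c * n / (count_r[r] * count_y[y]))
    return mi


labels = [0, 0, 1, 1]
informative = mutual_information([0, 0, 1, 1], labels)    # response determines label
uninformative = mutual_information([0, 1, 0, 1], labels)  # response independent of label
print(informative, uninformative)
```

    A response that fully determines a balanced binary label carries log 2 nats of information about it, while an independent response carries none; a regularizer of the kind described rewards classifiers of the first sort.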

  8. Classification of Bacteria and Archaea: past, present and future.

    Science.gov (United States)

    Schleifer, Karl Heinz

    2009-12-01

    The late 19th century was the beginning of bacterial taxonomy and bacteria were classified on the basis of phenotypic markers. The distinction of prokaryotes and eukaryotes was introduced in the 1960s. Numerical taxonomy improved phenotypic identification but provided little information on the phylogenetic relationships of prokaryotes. Later on, chemotaxonomic and genotypic methods were widely used for a more satisfactory classification. Archaea were first classified as a separate group of prokaryotes in 1977. The current classification of Bacteria and Archaea is based on an operational-based model, the so-called polyphasic approach, comprised of phenotypic, chemotaxonomic and genotypic data, as well as phylogenetic information. The provisional status Candidatus has been established for describing uncultured prokaryotic cells for which their phylogenetic relationship has been determined and their authenticity revealed by in situ probing. The ultimate goal is to achieve a theory-based classification system based on a phylogenetic/evolutionary concept. However, there are currently two contradictory opinions about the future classification of Bacteria and Archaea. A group of mostly molecular biologists posits that the yet-unclear effect of gene flow, in particular lateral gene transfer, makes the line of descent difficult, if not impossible, to describe. However, even in the face of genomic fluidity it seems that the typical geno- and phenotypic characteristics of a taxon are still maintained, and are sufficient for reliable classification and identification of Bacteria and Archaea. There are many well-defined genotypic clusters that are congruent with known species delineated by polyphasic approaches. Comparative sequence analysis of certain core genes, including rRNA genes, may be useful for the characterization of higher taxa, whereas various character genes may be suitable as phylogenetic markers for the delineation of lower taxa. Nevertheless, there may still be

  9. Sequence Classification: 890824 [

    Lifescience Database Archive (English)

    Full Text Available ion factor that is activated by a MAP kinase signaling cascade, activates genes involved in mating or pseudohyphal/invasive... growth pathways; cooperates with Tec1p transcription factor to regulate genes specific for invasive growth; Ste12p || http://www.ncbi.nlm.nih.gov/protein/6321876 ...

  10. Sequence Classification: 891115 [

    Lifescience Database Archive (English)

    Full Text Available involved in G1/S phase events such as bud site selection, bud emergence and cell cycle progression; similarity to Cup9p; Tos8p || http://www.ncbi.nlm.nih.gov/protein/6321342 ... ...omain-containing transcription factor; SBF regulated target gene that in turn regulates expression of genes

  11. Nonlinear programming for classification problems in machine learning

    Science.gov (United States)

    Astorino, Annabella; Fuduli, Antonio; Gaudioso, Manlio

    2016-10-01

    We survey some nonlinear models for classification problems arising in machine learning. In recent years this field has become increasingly relevant owing to many practical applications, such as text and web classification, object recognition in machine vision, gene expression profile analysis, DNA and protein analysis, medical diagnosis, customer profiling, etc. Classification deals with the separation of sets by means of appropriate separation surfaces, which are generally obtained by solving a numerical optimization model. While linear separability is the basis of the most popular approach to classification, the Support Vector Machine (SVM), the use of nonlinear separating surfaces has received some attention in recent years. The objective of this work is to recall some of these proposals, mainly in terms of the numerical optimization models. In particular we tackle the polyhedral, ellipsoidal, spherical and conical separation approaches and, for some of them, we also consider the semisupervised versions.

  12. [Classification of headache disorders].

    Science.gov (United States)

    Heinze, A; Heinze-Kuhn, K; Göbel, H

    2007-06-01

    In 2003 the International Headache Society (IHS) published the second edition of the International Classification of Headache Disorders. Diagnostic criteria for no less than 206 separate headache diagnoses are presented in the parts (I) primary headaches, (II) secondary headaches and (III) cranial neuralgia, central and primary facial pain. The headaches are classified according to the etiology in case of the secondary headaches and according to the phenomenology in case of the primary headaches. It is the task of the headache specialist to identify the correct headache diagnosis with the smallest effort possible. Both the differentiation between secondary and primary headaches and the differentiation between the various primary headaches are of equal importance.

  13. Latent classification models

    DEFF Research Database (Denmark)

    Langseth, Helge; Nielsen, Thomas Dyhre

    2005-01-01

    parametric family of distributions. In this paper we propose a new set of models for classification in continuous domains, termed latent classification models. The latent classification model can roughly be seen as combining the naive Bayes (NB) model with a mixture of factor analyzers, thereby relaxing the assumptions...... of the NB classifier. In the proposed model the continuous attributes are described by a mixture of multivariate Gaussians, where the conditional dependencies among the attributes are encoded using latent variables. We present algorithms for learning both the parameters and the structure of a latent...

  14. Classification in Medical Imaging

    DEFF Research Database (Denmark)

    Chen, Chen

    Classification is extensively used in the context of medical image analysis for the purpose of diagnosis or prognosis. In order to classify image content correctly, one needs to extract efficient features with discriminative properties and build classifiers based on these features. In addition...... to segment breast tissue and pectoral muscle area from the background in mammograms. The second focus is the choice of metric and its influence on the feasibility of a classifier, especially the k-nearest neighbors (k-NN) algorithm, with medical applications on breast cancer prediction and calcification...

  15. SPORT FOOD ADDITIVE CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    I. P. Prokopenko

    2015-01-01

    Full Text Available Correctly organized nutritive and pharmacological support is an important component of an athlete's preparation for competitions, maintenance of optimal shape, and fast recovery and rehabilitation after traumas and defatigation. Special products of enhanced biological value (BAS) for athletes' nutrition are used for this purpose. Easy-to-use energy sources are administered into the athlete's organism, along with yielded materials and biologically active substances which regulate and activate exchange reactions that proceed with difficulty during certain physical training. The article presents a classification of sport supplements which can be used before warm-up and training, after training and in competition breaks.

  16. Genome-based Taxonomic Classification of Bacteroidetes

    Directory of Open Access Journals (Sweden)

    Richard L. Hahnke

    2016-12-01

    Full Text Available The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  17. Genome-Based Taxonomic Classification of Bacteroidetes.

    Science.gov (United States)

    Hahnke, Richard L; Meier-Kolthoff, Jan P; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N; Woyke, Tanja; Kyrpides, Nikos C; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.
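
    The remark about G+C content calculated directly from genome sequences can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' pipeline: the function name is hypothetical, and the choice to ignore ambiguous bases such as N (rather than count them in the denominator) is an assumption.

```python
def gc_content(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored (an assumption of this sketch).
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


print(gc_content("ATGC"))       # half of the bases are G or C
print(gc_content("ggccNNat"))   # case-insensitive, N skipped
```

    For a genome split over several contigs, the counts would be summed over all contigs before dividing, so that long contigs weigh more than short ones.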

  18. Classification of smooth Fano polytopes

    DEFF Research Database (Denmark)

    Øbro, Mikkel

    A simplicial lattice polytope containing the origin in the interior is called a smooth Fano polytope, if the vertices of every facet form a basis of the lattice. The study of smooth Fano polytopes is motivated by their connection to toric varieties. The thesis concerns the classification of smooth Fano polytopes up to isomorphism. A smooth Fano -polytope can have at most vertices. In case of vertices an explicit classification is known. The thesis contains the classification in case of vertices. Classifications of smooth Fano -polytopes for fixed exist only for . In the thesis an algorithm for the classification of smooth Fano -polytopes for any given is presented. The algorithm has been implemented and used to obtain the complete classification for .
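
    The defining condition in this record (the vertices of every facet form a basis of the lattice) is equivalent, for the standard integer lattice, to each facet's vertex matrix having determinant +1 or -1. The Python sketch below checks only that condition. It assumes the facets are already given as lists of integer vertices and does not verify convexity or that the origin is interior; the function names and the two example polygons are illustrative choices, not taken from the thesis.

```python
def integer_det(matrix):
    """Exact determinant of a square integer matrix by cofactor expansion."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j.
        minor = [list(row[:j]) + list(row[j + 1:]) for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * integer_det(minor)
    return total


def facets_are_lattice_bases(facets):
    """True if every facet has d vertices in dimension d with determinant +-1."""
    for facet in facets:
        d = len(facet[0])
        if len(facet) != d:          # facet is not a simplex
            return False
        if abs(integer_det([list(v) for v in facet])) != 1:
            return False
    return True


# Triangle with vertices (1,0), (0,1), (-1,-1): every edge is a lattice basis.
P2_FACETS = [[(1, 0), (0, 1)], [(0, 1), (-1, -1)], [(-1, -1), (1, 0)]]
# Triangle with vertices (1,0), (0,1), (-1,-2): one edge has determinant 2.
BAD_FACETS = [[(1, 0), (0, 1)], [(0, 1), (-1, -2)], [(-1, -2), (1, 0)]]

print(facets_are_lattice_bases(P2_FACETS))
print(facets_are_lattice_bases(BAD_FACETS))
```

    Cofactor expansion is exact on integers, which matters here because the test is an equality; it is only practical for the small dimensions typical of such examples.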

  19. CRITERIA FOR AN UPDATED CLASSIFICATION OF HUMAN TRANSCRIPTION FACTOR DNA-BINDING DOMAINS

    NARCIS (Netherlands)

    Wingender, Edgar

    2013-01-01

    By binding to cis-regulatory elements in a sequence-specific manner, transcription factors regulate the activity of nearby genes. Here, we discuss the criteria for a comprehensive classification of human TFs based on their DNA-binding domains. In particular, classification of basic leucine zipper (b

  1. The future of general classification

    DEFF Research Database (Denmark)

    Mai, Jens Erik

    2013-01-01

    Discusses problems related to accessing multiple collections using a single retrieval language. Surveys the concepts of interoperability and switching language. Finds that mapping between more indexing languages always will be an approximation. Surveys the issues related to general classification...... and contrasts that to special classifications. Argues for the use of general classifications to provide access to collections nationally and internationally. © 2003 by The Haworth Press, Inc. All rights reserved....

  2. Combinatorial Approach of Associative Classification

    OpenAIRE

    P. R. Pal; R.C. Jain

    2010-01-01

    Association rule mining and classification are two important techniques of data mining in knowledge discovery process. Integration of these two has produced class association rule mining or associative classification techniques, which in many cases have shown better classification accuracy than conventional classifiers. Motivated by this study we have explored and applied the combinatorial mathematics in class association rule mining in this paper. Our algorithm is based on producing co...

  3. A Classification Leveraged Object Detector

    OpenAIRE

    Sun, Miao; Han, Tony X.; He, Zhihai

    2016-01-01

    Currently, the state-of-the-art image classification algorithms outperform the best available object detector by a big margin in terms of average precision. We, therefore, propose a simple yet principled approach that allows us to leverage object detection through image classification on supporting regions specified by a preliminary object detector. Using a simple bag-of-words model based image classification algorithm, we leveraged the performance of the deformable model objector from 35.9%...

  4. A new classification of glaucomas

    Directory of Open Access Journals (Sweden)

    Bordeianu CD

    2014-09-01

    Full Text Available Constantin-Dan Bordeianu, Private Practice, Ploiesti, Prahova, Romania. Purpose: To suggest a new glaucoma classification that is pathogenic, etiologic, and clinical. Methods: After discussing the logical pathway used in criteria selection, the paper presents the new classification and compares it with the classification currently in use, that is, the one issued by the European Glaucoma Society in 2008. Results: The paper proves that the new classification is clear (being based on a coherent and consistently followed set of criteria), is comprehensive (framing all forms of glaucoma), and helps in understanding the sickness (in that it uses a logical framing system). The great advantage is that it facilitates therapeutic decision making in that it offers direct therapeutic suggestions and avoids errors leading to disasters. Moreover, the scheme remains open to any new development. Conclusion: The suggested classification is a pathogenic, etiologic, and clinical classification that fulfills the conditions of an ideal classification. The suggested classification is the first classification in which the main criterion is consistently used for the first 5 to 7 crossings until its differentiation capabilities are exhausted. Then, secondary criteria (etiologic and clinical) pick up the relay until each form finds its logical place in the scheme. In order to avoid unclear aspects, the genetic criterion is no longer used, being replaced by age, one of the clinical criteria. The suggested classification brings only benefits to all categories of ophthalmologists: the beginners will have a tool to better understand the sickness and to ease their decision making, whereas the experienced doctors will have their practice simplified. For all doctors, errors leading to therapeutic disasters will be less likely to happen. Finally, researchers will have the object of their work gathered in the group of glaucoma with unknown or uncertain pathogenesis, whereas

  5. Open Source Fundamental Industry Classification

    OpenAIRE

    Kakushadze, Zura; Yu, Willie

    2017-01-01

    We provide complete source code for building a fundamental industry classification based on publicly available and freely downloadable data. We compare various fundamental industry classifications by running a horserace of short-horizon trading signals (alphas) utilizing open source heterotic risk models (https://ssrn.com/abstract=2600798) built using such industry classifications. Our source code includes various stand-alone and portable modules, e.g., for downloading/parsing web data, etc.

  6. PSC: protein surface classification.

    Science.gov (United States)

    Tseng, Yan Yuan; Li, Wen-Hsiung

    2012-07-01

    We recently proposed to classify proteins by their functional surfaces. Using the structural attributes of functional surfaces, we inferred the pairwise relationships of proteins and constructed an expandable database of protein surface classification (PSC). As the functional surface(s) of a protein is the local region where the protein performs its function, our classification may reflect the functional relationships among proteins. Currently, PSC contains a library of 1974 surface types that include 25,857 functional surfaces identified from 24,170 bound structures. The search tool in PSC empowers users to explore related surfaces that share similar local structures and core functions. Each functional surface is characterized by structural attributes, which are geometric, physicochemical or evolutionary features. The attributes have been normalized as descriptors and integrated to produce a profile for each functional surface in PSC. In addition, binding ligands are recorded for comparisons among homologs. PSC allows users to exploit related binding surfaces to reveal the changes in functionally important residues on homologs that have led to functional divergence during evolution. The substitutions at the key residues of a spatial pattern may determine the functional evolution of a protein. In PSC (http://pocket.uchicago.edu/psc/), a pool of changes in residues on similar functional surfaces is provided.

  7. Supply chain planning classification

    Science.gov (United States)

    Hvolby, Hans-Henrik; Trienekens, Jacques; Bonde, Hans

    2001-10-01

    Industry is experiencing a need to shift focus from internal production planning towards planning in the supply network. In this respect customer oriented thinking becomes almost a common good amongst companies in the supply network. An increase in the use of information technology is needed to enable companies to better tune their production planning with customers and suppliers. Information technology opportunities and supply chain planning systems help companies to monitor and control their supplier network. In spite of these developments, most links in today's supply chains make individual plans, because the real demand information is not available throughout the chain. The current systems and processes of the supply chains are not designed to meet the requirements now placed upon them. For long term relationships with suppliers and customers, an integrated decision-making process is needed in order to obtain a satisfactory result for all parties, especially when customized production and short lead-time are in focus. An effective value chain makes inventory available and visible among the value chain members, minimizes response time and optimizes total inventory value held throughout the chain. In this paper a supply chain planning classification grid is presented, based on current manufacturing classifications and supply chain planning initiatives.

  8. Holistic facial expression classification

    Science.gov (United States)

    Ghent, John; McDonald, J.

    2005-06-01

    This paper details a procedure for classifying facial expressions. This is a growing and relatively new type of problem within computer vision. One of the fundamental problems when classifying facial expressions in previous approaches is the lack of a consistent method of measuring expression. This paper solves this problem by the computation of the Facial Expression Shape Model (FESM). This statistical model of facial expression is based on an anatomical analysis of facial expression called the Facial Action Coding System (FACS). We use the term Action Unit (AU) to describe a movement of one or more muscles of the face, and all expressions can be described using the AUs described by FACS. The shape model is calculated by marking the face with 122 landmark points. We use Principal Component Analysis (PCA) to analyse how the landmark points move with respect to each other and to lower the dimensionality of the problem. Using the FESM in conjunction with Support Vector Machines (SVM) we classify facial expressions. SVMs are a powerful machine learning technique based on optimisation theory. This project is largely concerned with statistical models, machine learning techniques and psychological tools used in the classification of facial expression. This holistic approach to expression classification provides a means for a level of interaction with a computer that is a significant step forward in human-computer interaction.
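
    The PCA step on landmark coordinates can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' implementation: each shape is a flat vector of coordinates, only the first principal component is extracted (by power iteration on the sample covariance matrix rather than a full eigendecomposition), the function names are hypothetical, and the SVM stage is omitted.

```python
import random
from math import sqrt


def first_principal_component(shapes, iters=200, seed=0):
    """Return (mean, unit vector) for the direction of greatest variance."""
    n, d = len(shapes), len(shapes[0])
    mean = [sum(s[j] for s in shapes) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in shapes]
    # Sample covariance matrix of the centered coordinates.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]   # random start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:                            # no variance at all
            break
        v = [x / norm for x in w]
    return mean, v


def project(shape, mean, component):
    """Coordinate of a shape along the component (a 1-D reduced feature)."""
    return sum((x - m) * c for x, m, c in zip(shape, mean, component))


# Toy "shapes" with two coordinates that always move together.
shapes = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
mean, pc1 = first_principal_component(shapes)
feature = project((3.0, 3.0), mean, pc1)
print(mean, pc1, feature)
```

    With 122 two-dimensional landmarks each shape would be a 244-element vector, and the projections onto the leading components would serve as the low-dimensional inputs to a classifier. The sign of a principal component is arbitrary, so only its direction up to sign is meaningful.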

  9. CLASSIFICATION OF CRIMINAL GROUPS

    Directory of Open Access Journals (Sweden)

    Natalia Romanova

    2013-06-01

    Full Text Available New types of criminal groups are emerging in modern society. These types have their own special criminal subculture. The research objective is to develop new parameters for classifying modern criminal groups, create a new typology of criminal groups and identify some features of their subculture. The research methodology is based on the system approach, which includes analysis of documentary sources (materials of a criminal case), conversations with the members of the criminal group, testing of the members of the criminal group, and observation. As a result of the research, we have created a new classification of criminal groups. The first type is a group that is lawful in form and criminal in content (i.e., its target is criminal enrichment). The second type is a criminal organization which is run by so-called "white-collars" that "remain in the shadow". The third type is traditional criminal groups. The fourth type is the criminal group which openly demonstrates its criminal activity.

  10. Sequence Classification: 890624 [

    Lifescience Database Archive (English)

    Full Text Available lso participates in regulation of metabolically unrelated genes as well as maintenance of mating efficiency and sporulation; Dep1p || http://www.ncbi.nlm.nih.gov/protein/41629667 ...

  11. Sequence Classification: 889932 [

    Lifescience Database Archive (English)

    Full Text Available ctor, involved in the expression of genes during nutrient limitation; also involved in the negative regulation of DPP1 and PHR1; Gis1p || http://www.ncbi.nlm.nih.gov/protein/6320301 ...

  12. Sequence Classification: 890604 [

    Lifescience Database Archive (English)

    Full Text Available scriptional coactivator SKIP, can activate transcription of a reporter gene; interacts with splicing factors Prp22p and Prp46p; Prp45p || http://www.ncbi.nlm.nih.gov/protein/6319287 ...

  13. Sequence Classification: 523339 [

    Lifescience Database Archive (English)

    Full Text Available nse regulator in two-component regulatory system with PhoQ, transcribes genes expressed under low Mg+ concentration (OmpR family) || http://www.ncbi.nlm.nih.gov/protein/62179752 ...

  14. Sequence Classification: 891052 [

    Lifescience Database Archive (English)

    Full Text Available ...ding transcription factor; activates transcription of the metallothionein genes CUP1-1 and CUP1-2 in response to elevated copper concentrations; Cup2p || http://www.ncbi.nlm.nih.gov/protein/6321272 ...

  15. Homotopy Classification of Multiaxial Actions

    CERN Document Server

    Cappell, Sylvain; Yan, Min

    2011-01-01

    A U(n)-manifold is multiaxial if the isotropy groups are always conjugate to unitary subgroups. The classification and the concordance of such manifolds have been studied by Davis, Hsiang and Morgan under much more strict conditions. We show that in general, without much extra condition, the homotopy classification of multiaxial manifolds can be split into a direct sum of the classification of pairs of adjacent strata, which can be computed by the classical surgery theory. Moreover, we also compute the homotopy classification for the case of the standard representation sphere. We also present the result for the similar multiaxial Sp(n)-manifolds.

  16. Combined genetic and splicing analysis of BRCA1 c.[594-2A>C; 641A>G] highlights the relevance of naturally occurring in-frame transcripts for developing disease gene variant classification algorithms

    DEFF Research Database (Denmark)

    de la Hoya, Miguel; Soukarieh, Omar; López-perolio, Irene

    2016-01-01

    A recent analysis using family history weighting and co-observation classification modeling indicated that BRCA1 c.594-2A > C (IVS9-2A > C), previously described to cause exon 10 skipping (a truncating alteration), displays characteristics inconsistent with those of a high risk pathogenic BRCA1...... ,10 transcripts predicted to encode a BRCA1 protein with tumor suppression function. We confirm that BRCA1 c.[594-2A > C;641A > G] should not be considered a high-risk pathogenic variant. Importantly, results from our detailed mRNA analysis suggest that BRCA-associated cancer risk is likely not markedly increased for individuals who carry a truncating variant in BRCA1 exons 9 or 10, or any other BRCA1 allele that permits 20-30% of tumor suppressor function. More generally, our findings highlight the importance of assessing naturally occurring alternative splicing for clinical evaluation of variants in disease...

  17. Improving protein-protein interaction article classification using biological domain knowledge.

    Science.gov (United States)

    Chen, Yifei; Guo, Hongjian; Liu, Feng; Manderick, Bernard

    2015-01-01

    Interaction Article Classification (IAC) is a specific text classification application in the biological domain that identifies which articles describe Protein-Protein Interactions (PPIs), so that PPIs can be extracted from the biological literature more efficiently. However, the existing text representation and feature weighting schemes commonly used for text classification are not well suited for IAC. We capture and utilise biological domain knowledge, i.e. gene mentions (the protein or gene names appearing in the articles), to address the problem. We put forward a new gene mention order-based approach that highlights the important role of gene mentions in representing the texts. Furthermore, we incorporate the information concerning gene mentions into a novel feature weighting scheme called Gene Mention-based Term Frequency (GMTF). Our experiments show that, using the proposed representation and weighting schemes, our Interaction Article Classifier (IACer) performs better than other current leading systems.

  18. Transcriptome classification reveals molecular subtypes in psoriasis

    Directory of Open Access Journals (Sweden)

    Ainali Chrysanthi

    2012-09-01

    Full Text Available Abstract Background: Psoriasis is an immune-mediated disease characterised by chronically elevated pro-inflammatory cytokine levels, leading to aberrant keratinocyte proliferation and differentiation. Although certain clinical phenotypes, such as plaque psoriasis, are well defined, it is currently unclear whether there are molecular subtypes that might impact on prognosis or treatment outcomes. Results: We present a pipeline for patient stratification through a comprehensive analysis of gene expression in paired lesional and non-lesional psoriatic tissue samples, compared with controls, to establish differences in RNA expression patterns across all tissue types. Ensembles of decision tree predictors were employed to cluster psoriatic samples on the basis of gene expression patterns and reveal gene expression signatures that best discriminate molecular disease subtypes. This multi-stage procedure was applied to several published psoriasis studies and a comparison of gene expression patterns across datasets was performed. Conclusion: Overall, classification of psoriasis gene expression patterns revealed distinct molecular sub-groups within the clinical phenotype of plaque psoriasis. Enrichment for TGFb and ErbB signaling pathways, noted in one of the two psoriasis subgroups, suggested that this group may be more amenable to therapies targeting these pathways. Our study highlights the potential biological relevance of using ensemble decision tree predictors to determine molecular disease subtypes, in what may initially appear to be a homogeneous clinical group. The R code used in this paper is available upon request.

  19. Nonlinear estimation and classification

    CERN Document Server

    Hansen, Mark; Holmes, Christopher; Mallick, Bani; Yu, Bin

    2003-01-01

    Researchers in many disciplines face the formidable task of analyzing massive amounts of high-dimensional and highly-structured data. This is due in part to recent advances in data collection and computing technologies. As a result, fundamental statistical research is being undertaken in a variety of different fields. Driven by the complexity of these new problems, and fueled by the explosion of available computer power, highly adaptive, non-linear procedures are now essential components of modern "data analysis," a term that we liberally interpret to include speech and pattern recognition, classification, data compression and signal processing. The development of new, flexible methods combines advances from many sources, including approximation theory, numerical analysis, machine learning, signal processing and statistics. The proposed workshop intends to bring together eminent experts from these fields in order to exchange ideas and forge directions for the future.

  20. Transient Detection and Classification

    CERN Document Server

    Becker, Andrew C

    2008-01-01

    I provide an incomplete inventory of the astronomical variability that will be found by next-generation time-domain astronomical surveys. These phenomena span the distance range from near-Earth satellites to the farthest Gamma Ray Bursts. The surveys that detect these transients will issue alerts to the greater astronomical community; this decision process must be extremely robust to avoid a slew of "false" alerts, and to maintain the community's trust in the surveys. I review the functionality required of both the surveys and the telescope networks that will be following them up, and the role of VOEvents in this process. Finally, I offer some ideas about object and event classification, which will be explored more thoroughly by other articles in these proceedings.

  1. Spectral Classification Beyond M

    CERN Document Server

    Leggett, S K; Burgasser, A J; Jones, H R A; Marley, M S; Tsuji, T

    2004-01-01

    Significant populations of field L and T dwarfs are now known, and we anticipate the discovery of even cooler dwarfs by Spitzer and ground-based infrared surveys. However, as the number of known L and T dwarfs increases so does the range in their observational properties, and difficulties have arisen in interpreting the observations. Although modellers have made significant advances, the complexity of the very low temperature, high pressure, photospheres means that problems remain such as the treatment of grain condensation as well as incomplete and non-equilibrium molecular chemistry. Also, there are several parameters which control the observed spectral energy distribution - effective temperature, grain sedimentation efficiency, metallicity and gravity - and their effects are not well understood. In this paper, based on a splinter session, we discuss classification schemes for L and T dwarfs, their dependency on wavelength, and the effects of the parameters T_eff, f_sed, [m/H] and log g on optical and infra...

  2. Estuary Classification Revisited

    CERN Document Server

    Guha, Anirban

    2012-01-01

    The governing equations of a tidally averaged, width averaged, rectangular estuary have been investigated. It is theoretically shown that the dynamics of an estuary are entirely controlled by three parameters: (i) the Estuarine Froude number, (ii) the Tidal Froude number and (iii) the Estuarine Aspect ratio. The momentum, salinity and integral salt balance equations can be completely expressed in terms of these control variables. The estuary classification problem has also been reinvestigated. It is found that these three control variables can completely specify the estuary type. Comparison with real estuary data shows a very good match. Additionally, we show that the well accepted leading order estuarine integral salt balance equation is inconsistent with the leading order salinity equation in an order of magnitude sense.

  3. Automated protein subfamily identification and classification.

    Directory of Open Access Journals (Sweden)

    Duncan P Brown

    2007-08-01

    Full Text Available Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. Biologists wishing to...

  4. 15 CFR 2008.9 - Classification guides.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 3 2010-01-01 2010-01-01 false Classification guides. 2008.9 Section... REPRESENTATIVE Derivative Classification § 2008.9 Classification guides. Classification guides shall be issued by... direct derivative classification, shall identify the information to be protected in specific and...

  5. 32 CFR 2400.15 - Classification guides.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification guides. 2400.15 Section 2400.15... Derivative Classification § 2400.15 Classification guides. (a) OSTP shall issue and maintain classification guides to facilitate the proper and uniform derivative classification of information. These guides...

  6. 14 CFR 1203.412 - Classification guides.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Classification guides. 1203.412 Section... PROGRAM Guides for Original Classification § 1203.412 Classification guides. (a) General. A classification guide, based upon classification determinations made by appropriate program and...

  7. 7 CFR 27.34 - Classification procedure.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Classification procedure. 27.34 Section 27.34... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Classification and Micronaire Determinations § 27.34 Classification procedure. Classification shall proceed as rapidly as possible, but...

  8. 22 CFR 9.6 - Derivative classification.

    Science.gov (United States)

    2010-04-01

    ... CFR 2001.22. (c) Department of State Classification Guide. The Department of State Classification... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Derivative classification. 9.6 Section 9.6... classification. (a) Definition. Derivative classification is the incorporating, paraphrasing, restating...

  9. A Psychological Classification of Occupations.

    Science.gov (United States)

    Holland, John L.; And Others

    This occupational classification for practical and theoretical use in vocational guidance, occupational research, vocational education, and social science rests upon a theory of personality types and includes 431 common occupations which comprise about 95 percent of the United States labor force. Each of the classification's six main classes…

  10. Classification of Noncommutative Domain Algebras

    CERN Document Server

    Arias, Alvaro

    2012-01-01

    Noncommutative domain algebras are noncommutative analogues of the algebras of holomorphic functions on domains of $\mathbb{C}^n$ defined by holomorphic polynomials, and they generalize the noncommutative Hardy algebras. We present here a complete classification of these algebras based upon techniques inspired by multivariate complex analysis, and more specifically the classification of domains in hermitian spaces up to biholomorphic equivalence.

  11. Classification Accuracy Is Not Enough

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    A recent review of the research literature evaluating music genre recognition (MGR) systems over the past two decades shows that most works (81%) measure the capacity of a system to recognize genre by its classification accuracy. We show here, by implementing and testing three categorically diff...... classification accuracy obscures the aim of MGR: to select labels indistinguishable from those a person would choose....

  12. Classification of subtrochanteric femoral fractures.

    Science.gov (United States)

    Loizou, C L; McNamara, I; Ahmed, K; Pryor, G A; Parker, M J

    2010-07-01

    A review of the literature identified 15 different classification methods for subtrochanteric femoral fractures. Only eight of those classifications defined the area of bone, which constituted a subtrochanteric fracture. The actual length of femur defined as the subtrochanteric zone varied from 3 cm up to the level of the femoral isthmus. There was no agreement between the different classifications regarding the proximal and distal border or for those fractures, which traverse anatomical boundaries. In the various classifications, fractures were subdivided into 2-15 subgroups. The majority of the identified studies were unable to find the classifications useful in either determining treatment or predicting the outcome after treatment. We subdivided subtrochanteric fractures into three types based on the degree of fracture comminution. We examined the inter- and intra-observer agreement of our recommended classification. One orthopaedic consultant, one specialist hip fracture surgeon, two trainee registrar orthopaedic surgeons and one specialty trainee in orthopaedics, on two different occasions, 8 weeks apart, independently classified the radiographs of 20 patients with a subtrochanteric fracture. The mean kappa value for inter- and intra-observer variation was 0.71 and 0.79, respectively, with both showing substantial agreement and, therefore, this simpler classification is recommended. Based on the review of previous classification methods, we also recommend that the subtrochanteric zone be defined as the one in which the fracture line crossing the femur is predominantly within the area of bone extending 5 cm below the lower border of the lesser trochanter.
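    The agreement figures quoted above are kappa values. As a minimal sketch of how such a statistic is obtained for one pair of raters, the following computes Cohen's kappa, assuming that variant (the abstract does not say which kappa was used or how the mean was formed). The function name and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from each
    rater's label frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(ratings_a)
    # Fraction of items on which the two raters chose the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Chance agreement: product of the raters' marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical fracture types (1-3) assigned to ten radiographs by two raters.
rater_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
rater_2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2]
kappa = cohen_kappa(rater_1, rater_2)
```

    Intra-observer agreement can be sketched the same way by passing one rater's two reading sessions as the two rating lists.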

  13. Seismic texture classification. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Vinther, R.

    1997-12-31

    The seismic texture classification method is a seismic attribute that can both recognize the general reflectivity styles and locate variations from these. The seismic texture classification performs a statistical analysis of the seismic section (or volume) aiming at describing the reflectivity. Based on a set of reference reflectivities the seismic textures are classified. The result of the seismic texture classification is a display of seismic texture categories showing both the styles of reflectivity from the reference set and interpolations and extrapolations from these. The display is interpreted as statistical variations in the seismic data. The seismic texture classification is applied to seismic sections and volumes from the Danish North Sea representing both horizontal stratifications and salt diapirs. The attribute succeeded in recognizing both the general structure of successions and variations from these. Also, the seismic texture classification is not only able to display variations in prospective areas (1-7 sec. TWT) but can also be applied to deep seismic sections. The seismic texture classification is tested on a deep reflection seismic section (13-18 sec. TWT) from the Baltic Sea. Applied to this section the seismic texture classification succeeded in locating the Moho, which could not be located using conventional interpretation tools. The seismic texture classification is a seismic attribute which can display general reflectivity styles and deviations from these and enhance variations not found by conventional interpretation tools. (LN)

  14. Dewey Decimal Classification: A Quagmire.

    Science.gov (United States)

    Gamaluddin, Ahmad Fouad

    1980-01-01

    A survey of 660 Pennsylvania school librarians indicates that, though there is limited professional interest in the Library of Congress Classification system, Dewey Decimal Classification (DDC) appears to be firmly entrenched. This article also discusses the relative merits of DDC, the need for a uniform system, librarianship preparation, and…

  15. Classification of seizures and epilepsy.

    Science.gov (United States)

    Riviello, James J

    2003-07-01

    The management of seizures and epilepsy begins with forming a differential diagnosis, making the diagnosis, and then classifying seizure type and epileptic syndrome. Classification guides treatment, including ancillary testing, management, prognosis, and if needed, selection of the appropriate antiepileptic drug (AED). Many AEDs are available, and certain seizure types or epilepsy syndromes respond to specific AEDs. The identification of the genetics, molecular basis, and pathophysiologic mechanisms of epilepsy has resulted from classification of specific epileptic syndromes. The classification system used by the International League Against Epilepsy is periodically revised. The proposed revision changes the classification emphasis from the anatomic origin of seizures (focal vs generalized) to seizure semiology (ie, the signs or clinical manifestations). Modified systems have been developed for specific circumstances (eg, neonatal seizures, infantile seizures, status epilepticus, and epilepsy surgery). This article reviews seizure and epilepsy classification, emphasizing new data.

  16. Information Classification on University Websites

    DEFF Research Database (Denmark)

    Nawaz, Ather; Clemmensen, Torkil; Hertzum, Morten

    2011-01-01

    Websites are increasingly used as a medium for providing information to university students. The quality of a university website depends on how well the students’ information classification fits with the structure of the information on the website. This paper investigates the information...... classification of 14 Danish and 14 Pakistani students and compares it with the information classification of their university website. Brainstorming, card sorting, and task exploration activities were used to discover similarities and differences in the participating students’ classification of website...... information and their ability to navigate the websites. The results of the study indicate group differences in user classification and related task-performance differences. The main implications of the study are that (a) the edit distance appears a useful measure in cross-country HCI research and practice...

  18. Agriculture classification using POLSAR data

    DEFF Research Database (Denmark)

    Skriver, Henning; Dall, Jørgen; Ferro-Famil, Laurent

    2005-01-01

    in the crop canopy, particularly between the response of the canopy itself and soil response. It is expected that PolInSAR data will add to the classification potential of POLSAR data by their sensitivity to the vertical distribution of scatterers. Different approaches have been used to classify SAR data...... content of the SAR data they attempt to generate robust, widely applicable methods, which are nonetheless capable of taking local conditions into account. In this paper a classification approach is presented, that uses a knowledge-based approach, where the crops are first classified into broad classes, i...... of the classification process is not as well established as the first part, and both a supervised approach and a knowledge-based approach have been evaluated. Both POLSAR and PolInSAR data may be included in the classification scheme. The classification approach has been evaluated using data from the Danish EMISAR...

  19. [Classification of viruses by computer].

    Science.gov (United States)

    Ageeva, O N; Andzhaparidze, O G; Kibardin, V M; Nazarova, G M; Pleteneva, E A

    1982-01-01

    The study used a data set containing information on 83 viruses characterized by 41 markers. The suitability of one of the variants of cluster analysis for virus classification was demonstrated. It was established that certain stages of the automatic allotment of viruses into groups by the degree of similarity of their properties end in the formation of groups which consist of viruses sufficiently close to each other in their properties and are sufficiently isolated. Comparison of these groups with the classification proposed by the ICVT established their correspondence to individual families. Analysis of the obtained classification system permits sufficiently grounded conclusions to be drawn with regard to the classification position of certain viruses, the classification of which has not yet been completed by the ICVT.

  1. Information Classification on University Websites

    DEFF Research Database (Denmark)

    Nawaz, Ather; Clemmensen, Torkil; Hertzum, Morten

    2011-01-01

    Websites are increasingly used as a medium for providing information to university students. The quality of a university website depends on how well the students’ information classification fits with the structure of the information on the website. This paper investigates the information...... classification of 14 Danish and 14 Pakistani students and compares it with the information classification of their university website. Brainstorming, card sorting and task exploration activities were used to discover similarities and differences in the participating students’ classification of website...... information and their ability to navigate the websites. The results of the study indicated group differences in user classification and related task-performance differences. The main implications of the study were that (a) the edit distance appears a useful measure in cross-country HCI research and practice...
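    The abstract names the edit distance as a measure but does not say how it was applied to card sorts. As a simplified illustration only, the sketch below computes the classic Levenshtein edit distance between two ordered lists of category labels. The flat-list representation, the function name and the example labels are all assumptions made for illustration; a study comparing hierarchical card sorts may well use a tree-based variant instead.

```python
def edit_distance(source, target):
    """Levenshtein distance: the minimum number of insertions,
    deletions and substitutions turning `source` into `target`.

    Works on any two sequences (strings or lists of labels).
    """
    # prev[j] holds the distance between the processed prefix of
    # `source` and the first j items of `target`.
    prev = list(range(len(target) + 1))
    for i, src_item in enumerate(source, start=1):
        curr = [i]  # i deletions turn source[:i] into an empty target
        for j, tgt_item in enumerate(target, start=1):
            substitution_cost = 0 if src_item == tgt_item else 1
            curr.append(min(
                prev[j] + 1,                      # delete src_item
                curr[j - 1] + 1,                  # insert tgt_item
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical top-level groupings: one student's card sort vs. the website menu.
student_sort = ["admissions", "courses", "library", "contact"]
website_menu = ["courses", "admissions", "library", "news", "contact"]
distance = edit_distance(student_sort, website_menu)
```

    A smaller distance would indicate that the student's grouping is closer to the structure of the website.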

  2. Pathological Bases for a Robust Application of Cancer Molecular Classification

    Directory of Open Access Journals (Sweden)

    Salvador J. Diaz-Cano

    2015-04-01

    Full Text Available Any robust classification system depends on its purpose and must refer to accepted standards, its strength relying on predictive values and a careful consideration of known factors that can affect its reliability. In this context, a molecular classification of human cancer must refer to the current gold standard (histological classification) and try to improve it with key prognosticators for metastatic potential, staging and grading. Although organ-specific examples have been published based on proteomics, transcriptomics and genomics evaluations, the most popular approach uses gene expression analysis as a direct correlate of cellular differentiation, which represents the key feature of the histological classification. RNA is a labile molecule that varies significantly according to the preservation protocol; its transcription reflects the adaptation of the tumor cells to the microenvironment, it can be passed through mechanisms of intercellular transference of genetic information (exosomes), and it is exposed to epigenetic modifications. More robust classifications should be based on stable molecules, at the genetic level represented by DNA to improve reliability, and their analysis must deal with the concept of intratumoral heterogeneity, which is at the origin of tumor progression and is the byproduct of the selection process during the clonal expansion and progression of neoplasms. The simultaneous analysis of multiple DNA targets and next generation sequencing offer the best practical approach for an analytical genomic classification of tumors.

  3. Identification and classification of genes regulated by phosphatidylinositol 3-kinase- and TRKB-mediated signalling pathways during neuronal differentiation in two subtypes of the human neuroblastoma cell line SH-SY5Y

    Directory of Open Access Journals (Sweden)

    Sakaki Yoshiyuki

    2008-10-01

    Full Text Available Abstract Background: SH-SY5Y cells exhibit a neuronal phenotype when treated with all-trans retinoic acid (RA), but the molecular mechanism of activation in the signalling pathway mediated by phosphatidylinositol 3-kinase (PI3K) is unclear. To investigate this mechanism, we compared the gene expression profiles in SK-N-SH cells and two subtypes of SH-SY5Y cells (SH-SY5Y-A and SH-SY5Y-E), each of which shows a different phenotype during RA-mediated differentiation. Findings: SH-SY5Y-A cells differentiated in the presence of RA, whereas RA-treated SH-SY5Y-E cells required additional treatment with brain-derived neurotrophic factor (BDNF) for full differentiation. After exposing cells to a PI3K inhibitor, LY294002, we identified 386 genes and categorised these genes into two clusters dependent on the PI3K signalling pathway during RA-mediated differentiation in SH-SY5Y-A cells. Transcriptional regulation of the gene cluster, including 158 neural genes, was greatly reduced in SK-N-SH cells and partially impaired in SH-SY5Y-E cells, which is consistent with a defect in the neuronal phenotype of these cells. Additional stimulation with BDNF induced a set of neural genes that were down-regulated in RA-treated SH-SY5Y-E cells but were abundant in differentiated SH-SY5Y-A cells. Conclusion: We identified gene clusters controlled by PI3K- and TRKB-mediated signalling pathways during the differentiation of two subtypes of SH-SY5Y cells. The TRKB-mediated bypass pathway compensates for impaired neural function generated by defects in several signalling pathways, including PI3K in SH-SY5Y-E cells. Our expression profiling data will be useful for further elucidation of the signal transduction-transcriptional network involving PI3K or TRKB.

  4. Genetic classification and distinguishing of Staphylococcus species based on different partial gap, 16S rRNA, hsp60, rpoB, sodA, and tuf gene sequences.

    Science.gov (United States)

    Ghebremedhin, B; Layer, F; König, W; König, B

    2008-03-01

    The analysis of 16S rRNA gene sequences has been the technique generally used to study the evolution and taxonomy of staphylococci. However, the results of this method do not correspond to the results of polyphasic taxonomy, and the related species cannot always be distinguished from each other. Thus, new phylogenetic markers for Staphylococcus spp. are needed. We partially sequenced the gap gene (approximately 931 bp), which encodes the glyceraldehyde-3-phosphate dehydrogenase, for 27 Staphylococcus species. The partial sequences had 24.3 to 96% interspecies homology and were useful in the identification of staphylococcal species (F. Layer, B. Ghebremedhin, W. König, and B. König, J. Microbiol. Methods 70:542-549, 2007). The DNA sequence similarities of the partial staphylococcal gap sequences were found to be lower than those of 16S rRNA (approximately 97%), rpoB (approximately 86%), hsp60 (approximately 82%), and sodA (approximately 78%). Phylogenetically derived trees revealed four statistically supported groups: S. hyicus/S. intermedius, S. sciuri, S. haemolyticus/S. simulans, and S. aureus/epidermidis. The branching of S. auricularis, S. cohnii subsp. cohnii, and the heterogeneous S. saprophyticus group, comprising S. saprophyticus subsp. saprophyticus and S. equorum subsp. equorum, was not reliable. Thus, the phylogenetic analysis based on the gap gene sequences revealed similarities between the dendrograms based on other gene sequences (e.g., the S. hyicus/S. intermedius and S. sciuri groups) as well as differences, e.g., the grouping of S. arlettae and S. kloosii in the gap-based tree. From our results, we propose the partial sequencing of the gap gene as an alternative molecular tool for the taxonomical analysis of Staphylococcus species and for decreasing the possibility of misidentification.

  5. Sequence Classification: 894156 [

    Lifescience Database Archive (English)

    Full Text Available on extends the mean and maximum life span of cells; Lag2p || http://www.ncbi.nlm.nih.gov/protein/6324548 ... ...n involved in determination of longevity; LAG2 gene is preferentially expressed in young cells; overexpressi

  6. Sequence Classification: 893787 [

    Lifescience Database Archive (English)

    Full Text Available ociation with SAGA and for H2B deubiquitylation; Sgf11p || http://www.ncbi.nlm.nih.gov/protein/6325210 ... ...it of SAGA histone acetyltransferase complex, regulates transcription of a subset of SAGA-regulated genes, required for the Ubp8p ass

  7. Sequence Classification: 889230 [

    Lifescience Database Archive (English)

    Full Text Available , primarily involved in telomere length regulation; contributes to cell cycle checkpoint control in response to DNA damage; functiona...lly redundant with Mec1p; homolog of human ataxia telangiectasia (ATM) gene; Tel1p || http://www.ncbi.nlm.nih.gov/protein/6319383 ...

  8. Sequence Classification: 891969 [

    Lifescience Database Archive (English)

    Full Text Available tor involved in the repression of GAL genes in the absence of galactose; inhibits transcriptional activation by Gal4p; inhibition rel...ieved by Gal3p or Gal1p binding; Gal80p || http://www.ncbi.nlm.nih.gov/protein/6323590 ...

  9. Sequence Classification: 894820 [

    Lifescience Database Archive (English)

    Full Text Available ruption does not increase the rate of spontaneous mutagenesis; Ham1p || http://www.ncbi.nlm.nih.gov/protein/6322529 ... ...n of unknown function that is involved in DNA repair; mutant is sensitive to the base analog, 6-N-hydroxylaminopurine, while gene dis

  10. Sequence Classification: 524859 [

    Lifescience Database Archive (English)

    Full Text Available Non-TMB Non-TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|62181272|ref|YP_217689.1| H inversion...: regulation of flagellar gene expression by site-specific inversion of DNA || http://www.ncbi.nlm.nih.gov/protein/62181272 ...

  11. Sequence Classification: 772467 [

    Lifescience Database Archive (English)

    Full Text Available protein mediating cell communication, sex-determining gene, promotes female development family member, TRAnsformer : XX animals trans...formed into males TRA-2 (170.4 kD) (tra-2) || http://www.ncbi.nlm.nih.gov/protein/17536595 ...

  12. Sequence Classification: 893287 [

    Lifescience Database Archive (English)

    Full Text Available ction; gene expression increases in cultures shifted to a lower temperature; Lot5p || http://www.ncbi.nlm.nih.gov/protein/6322665 ... ...TMB Non-TMH Non-TMB TMB TMB TMB >gi|6322665|ref|NP_012738.1| Protein of unknown fun

  13. Sequence Classification: 889823 [

    Lifescience Database Archive (English)

    Full Text Available dback control mechanism; RPN4 is transcriptionally regulated by various stress responses; Rpn4p || http://www.ncbi.nlm.nih.gov/protein/6320184 ... ...ion factor that stimulates expression of proteasome genes; Rpn4p levels are in turn regulated by the 26S proteasome in a negative fee

  14. 75 FR 10529 - Mail Classification Change

    Science.gov (United States)

    2010-03-08

    ... Mail Classification Change AGENCY: Postal Regulatory Commission. ACTION: Notice. SUMMARY: The... Classification Schedule. The change affects a change in terminology. This notice addresses procedural steps....90 et seq. concerning a change in classification which reflects a change in terminology from...

  15. 75 FR 70754 - Postal Classification Changes

    Science.gov (United States)

    2010-11-18

    ... Postal Classification Changes AGENCY: Postal Regulatory Commission. ACTION: Notice. SUMMARY: The Commission is noticing a recently-filed Postal Service request announcing a classification change affecting... Notice with the Commission announcing a classification change established by the Governors...

  16. Automatic Classification of Marine Mammals with Speaker Classification Methods.

    Science.gov (United States)

    Kreimeyer, Roman; Ludwig, Stefan

    2016-01-01

    We present an automatic acoustic classifier for marine mammals based on human speaker classification methods as an element of a passive acoustic monitoring (PAM) tool. This work is part of the Protection of Marine Mammals (PoMM) project under the framework of the European Defense Agency (EDA) and joined by the Research Department for Underwater Acoustics and Geophysics (FWG), Bundeswehr Technical Centre (WTD 71) and Kiel University. The automatic classification should support sonar operators in the risk mitigation process before and during sonar exercises with a reliable automatic classification result.

  17. Biomarker Selection and Classification of “-Omics” Data Using a Two-Step Bayes Classification Framework

    Directory of Open Access Journals (Sweden)

    Anunchai Assawamakin

    2013-01-01

    Full Text Available Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly. Here, a novel two-step machine-learning framework is presented to address this need. First, a Naïve Bayes estimator is used to rank features from which the top-ranked will most likely contain the most informative features for prediction of the underlying biological classes. The top-ranked features are then used in a Hidden Naïve Bayes classifier to construct a classification prediction model from these filtered attributes. In order to obtain the minimum set of the most informative biomarkers, the bottom-ranked features are successively removed from the Naïve Bayes-filtered feature list one at a time, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The performance of the proposed two-step Bayes classification framework was tested on different types of -omics datasets including gene expression microarray, single nucleotide polymorphism microarray (SNP array), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed two-step Bayes classification framework matched and, in some cases, outperformed other classification methods in terms of prediction accuracy, minimum number of classification markers, and computational time.
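
    The two-step procedure described in this abstract (rank features, then prune the bottom-ranked ones while checking accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: a plain Gaussian naive Bayes model stands in for both the ranking estimator and the Hidden Naïve Bayes classifier (which the abstract names but does not specify), leave-one-out accuracy is used as the score, and the toy data and all function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gnb(X, y, cols):
    """Fit a Gaussian naive Bayes model using only the feature columns in cols."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        # Per-feature mean and (smoothed) variance for this class.
        stats = [(mean(r[c] for r in rows),
                  pvariance([r[c] for r in rows]) + 1e-6) for c in cols]
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model


def predict_gnb(model, x, cols):
    """Return the class with the highest log posterior for sample x."""
    def log_post(label):
        log_prior, stats = model[label]
        total = log_prior
        for c, (mu, var) in zip(cols, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x[c] - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_post)


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of the stand-in classifier on the given columns."""
    hits = 0
    for i in range(len(X)):
        model = fit_gnb(X[:i] + X[i + 1:], y[:i] + y[i + 1:], cols)
        hits += predict_gnb(model, X[i], cols) == y[i]
    return hits / len(X)


def two_step_select(X, y):
    """Step 1: rank features one at a time. Step 2: prune from the bottom."""
    ranked = sorted(range(len(X[0])),
                    key=lambda c: loo_accuracy(X, y, [c]), reverse=True)
    best_cols, best_acc = list(ranked), loo_accuracy(X, y, ranked)
    cols = list(ranked)
    while len(cols) > 1:
        cols = cols[:-1]                 # drop the current bottom-ranked feature
        acc = loo_accuracy(X, y, cols)
        if acc >= best_acc:              # on ties, prefer the smaller marker set
            best_cols, best_acc = list(cols), acc
    return best_cols, best_acc


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5.0, 3.0], [0.2, 1.0, 7.0], [0.0, 3.0, 2.0], [0.3, 4.0, 6.0],
     [1.1, 4.0, 3.5], [0.9, 2.0, 6.5], [1.0, 5.0, 2.5], [1.2, 1.0, 7.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, acc = two_step_select(X, y)
```

    On this toy data the sketch keeps only the informative feature. A real study would replace the stand-in model with the classifier of interest and evaluate on held-out data rather than on the samples used for ranking.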

  18. Identification of S5-n gene and subspecies classification of indica and japonica in rice male sterile line

    Institute of Scientific and Technical Information of China (English)

    夏士健; 张启军; 杨杰; 吕川根

    2011-01-01

    To identify the wide compatibility and the indica-japonica type of several rice male sterile lines, an InDel marker, S5136, was designed on the basis of the 136 bp deletion that the wide-compatibility gene S5-n carries on chromosome 6 relative to the indica and japonica alleles at the S5 locus, and was used to identify the S5-n gene in male sterile lines such as Peiai64S and Guangzhan63S. With 34 InDel markers closely related to the genetic differentiation of indica and japonica cultivated rice, the gel electrophoresis results of the PCR products of these lines were scored and their indica or japonica gene frequencies were calculated (InDel molecular index method). The results showed that: (1) the three-line cytoplasmic male sterile line YuetaiA possessed the wide-compatibility gene S5-n; (2) eleven two-line (dual-purpose genic) male sterile lines, including Peiai64S, N111S and C815S, possessed the S5-n gene; (3) Peiai64S, Guangzhan63S, Y58S and other male sterile lines widely used in current rice production had indica gene frequencies in the range 0.76-0.91, which suggests that indica-type male sterile lines carrying about 10%-25% japonica ancestry might be more favorable for exploiting inter-subspecific heterosis.
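
    The indica gene frequency mentioned in this abstract can be illustrated with a short calculation. This is a sketch under an assumed formula, since the abstract names the InDel index method without stating it: each marker is scored as homozygous indica ("II"), heterozygous ("IJ") or homozygous japonica ("JJ"), and the frequency is the share of indica alleles among all scored alleles. The genotype codes, the function name and the example counts are hypothetical.

```python
def indica_frequency(genotypes):
    """Share of indica alleles across scored InDel markers (assumed formula).

    Each homozygous indica marker contributes two indica alleles, each
    heterozygous marker one, and each homozygous japonica marker none.
    Markers with any other code (for example a failed PCR) are ignored.
    """
    scored = [g for g in genotypes if g in ("II", "IJ", "JJ")]
    if not scored:
        raise ValueError("no scorable markers")
    indica_alleles = sum(2 if g == "II" else 1 if g == "IJ" else 0 for g in scored)
    return indica_alleles / (2 * len(scored))


# Hypothetical scoring of 34 markers for one line: 28 indica, 2 heterozygous, 4 japonica.
example = ["II"] * 28 + ["IJ"] * 2 + ["JJ"] * 4
f_indica = indica_frequency(example)
f_japonica = 1 - f_indica
```

    With these made-up counts the indica frequency is 58/68, which falls inside the 0.76-0.91 range reported for the lines in the abstract.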

  19. Phylogenetic reconstruction of the wolf spiders (Araneae: Lycosidae) using sequences from the 12S rRNA, 28S rRNA, and NADH1 genes: implications for classification, biogeography, and the evolution of web building behavior.

    Science.gov (United States)

    Murphy, Nicholas P; Framenau, Volker W; Donnellan, Stephen C; Harvey, Mark S; Park, Yung-Chul; Austin, Andrew D

    2006-03-01

    Current knowledge of the evolutionary relationships amongst the wolf spiders (Araneae: Lycosidae) is based on assessment of morphological similarity or phylogenetic analysis of a small number of taxa. In order to enhance the current understanding of lycosid relationships, phylogenies of 70 lycosid species were reconstructed by parsimony and Bayesian methods using three molecular markers: the mitochondrial genes 12S rRNA and NADH1, and the nuclear gene 28S rRNA. The resultant trees from the mitochondrial markers were used to assess the current taxonomic status of the Lycosidae and to assess the evolutionary history of sheet-web construction in the group. The results suggest that a number of genera are not monophyletic, including Lycosa, Arctosa, Alopecosa, and Artoria. At the subfamilial level, the status of Pardosinae needs to be re-assessed, and the position of a number of genera within their respective subfamilies is in doubt (e.g., Hippasa and Arctosa in Lycosinae and Xerolycosa, Aulonia and Hygrolycosa in Venoniinae). In addition, a major clade of strictly Australasian taxa may require the creation of a new subfamily. The analysis of sheet-web building in Lycosidae revealed that the interpretation of this trait as an ancestral state relies on two factors: (1) an asymmetrical model favoring the loss of sheet-webs and (2) that the suspended silken tube of Pirata is directly descended from sheet-web building. Paralogous copies of the nuclear 28S rRNA gene were sequenced, confounding the interpretation of the phylogenetic analysis and suggesting that a cautionary approach should be taken to the further use of this gene for lycosid phylogenetic analysis.

  20. Radar clutter classification

    Science.gov (United States)

    Stehwien, Wolfgang

    1989-11-01

    The problem of classifying radar clutter as found on air traffic control radar systems is studied. An algorithm based on Bayes decision theory and the parametric maximum a posteriori probability classifier is developed to perform this classification automatically. This classifier employs a quadratic discriminant function and is optimum for feature vectors that are distributed according to the multivariate normal density. Separable clutter classes are most likely to arise from the analysis of the Doppler spectrum. Specifically, a feature set based on the complex reflection coefficients of the lattice prediction error filter is proposed. The classifier is tested using data recorded from L-band air traffic control radars. The Doppler spectra of these data are examined; the properties of the feature set computed using these data are studied in terms of both the marginal and multivariate statistics. Several strategies involving different numbers of features, class assignments, and data set pretesting according to Doppler frequency and signal to noise ratio were evaluated before settling on a workable algorithm. Final results are presented in terms of experimental misclassification rates and simulated and classified plane position indicator displays.
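
    The quadratic discriminant function mentioned in this abstract can be sketched for the two-feature case. The code below is an illustration only: it assumes each class is a bivariate normal density with known mean, covariance and prior, and the class names and parameter values are hypothetical rather than taken from the report, which uses lattice-filter reflection coefficients as features.

```python
import math


def quadratic_discriminant(x, mean, cov, prior):
    """g(x) = -1/2 (x-m)^T S^-1 (x-m) - 1/2 ln|S| + ln P for a 2-D feature vector.

    The term shared by all classes (-ln 2*pi) is omitted because it does not
    change which class attains the maximum.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * maha - 0.5 * math.log(det) + math.log(prior)


def classify(x, classes):
    """Maximum a posteriori decision: pick the class with the largest g(x)."""
    return max(classes, key=lambda name: quadratic_discriminant(x, *classes[name]))


# Hypothetical class models: (mean, covariance, prior probability).
classes = {
    "ground": ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)), 0.5),
    "weather": ((2.0, 2.0), ((1.0, 0.5), (0.5, 1.0)), 0.5),
}
near_origin = classify((0.1, 0.0), classes)
near_offset = classify((2.0, 1.9), classes)
```

    Because each class has its own covariance matrix, the decision boundary between the two classes is a quadratic curve rather than a straight line.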

  1. Aircraft Operations Classification System

    Science.gov (United States)

    Harlow, Charles; Zhu, Weihong

    2001-01-01

    Accurate data is important in the aviation planning process. In this project we consider systems for measuring aircraft activity at airports. This would include determining the type of aircraft such as jet, helicopter, single engine, and multiengine propeller. Some of the issues involved in deploying technologies for monitoring aircraft operations are cost, reliability, and accuracy. In addition, the system must be field portable and acceptable at airports. A comparison of technologies was conducted and it was decided that an aircraft monitoring system should be based upon acoustic technology. A multimedia relational database was established for the study. The information contained in the database consists of airport information, runway information, acoustic records, photographic records, a description of the event (takeoff, landing), aircraft type, and environmental information. We extracted features from the time signal and the frequency content of the signal. A multi-layer feed-forward neural network was chosen as the classifier. Training and testing results were obtained. We were able to obtain classification results of over 90 percent for training and testing for takeoff events.

  2. Molecular Classification of Medulloblastoma

    Science.gov (United States)

    KIJIMA, Noriyuki; KANEMURA, Yonehiro

    2016-01-01

    Medulloblastoma (MB) is one of the most frequent malignant brain tumors in children. The current standard treatment regimen consists of surgical resection, craniospinal irradiation, and adjuvant chemotherapy. Although these treatments have the potential to increase the survival of 70–80% of patients with MB, they are also associated with serious treatment-induced morbidity. The current risk stratification of MB is based on clinical factors, including age at presentation, metastatic status, and the presence of residual tumor following resection. In addition, recent genomic studies indicate that MB consists of at least four distinct molecular subgroups: WNT, sonic hedgehog (SHH), Group 3, and Group 4. WNT and SHH MBs are characterized by aberrations in the WNT and SHH signaling pathways, respectively. WNT MB has the best prognosis compared to the other MBs, while SHH MB has an intermediate prognosis. The underlying signaling pathways associated with Group 3 and 4 MBs have not been identified. Group 3 MB is frequently associated with metastasis, resulting in a poor prognosis, while Group 4 is sometimes associated with metastasis and has an intermediate prognosis. Group 4 is the most frequent MB and represents 35% of all MBs. These findings suggest that MB is a heterogeneous disease, and that MB subgroups have distinct molecular, demographic, and clinical characteristics. The molecular classification of MBs is redefining the risk stratification of patients with MB, and has the potential to identify new therapeutic strategies for the treatment of MB. PMID:27238212

  3. Classification of Rainbows

    Science.gov (United States)

    Adams, Peter; Ricard, Jean; Barckicke, Jean

    2016-04-01

    Rainbows are the most beautiful and most spectacular optical atmospheric phenomenon. Humphreys (1964) pointedly noted that "the "explanations" generally given of the rainbow [in textbooks] may well be said to explain beautifully that which does not occur, and to leave unexplained which does" . . . "The records of close observations of rainbows soon show that not even the colors are always the same". Textbooks stress that the main factor affecting the aspect of the rainbow is the radius of the water droplets. In his well-known textbook entitled "The Nature of Light & Colour in the Open Air", Minnaert (1954) gives the chief features of the rainbow depending on the diameter of the drops producing it. For this study, we have gathered hundreds of pictures of primary bows. We sort the pictures into classes. The classes are defined in such a way that rainbows belonging to the same class look similar. Our results are surprising and do not confirm Minnaert's classification. In practice, the size of the water droplets is only a minor factor controlling the overall aspect of the rainbow. The main factor appears to be the height of the sun above the horizon. At sunset, the width of the red band increases, while the width of the other bands of colours decreases. The orange, the violet, the blue and the green bands disappear completely in this order. At the end, the primary bow is mainly red and slightly yellow. Picture: taken from the CNRM in Toulouse after a summer storm (Jean Ricard).

  4. Classification of Building Object Types

    DEFF Research Database (Denmark)

    Jørgensen, Kaj Asbjørn

    2011-01-01

    Development of the existing classification systems has been a very difficult and time-consuming task, in which many considerations have been taken into account and many compromises have been made. The results reveal that, although the theoretical foundation was clarified, many deviations and shortcuts have been...... made. This is certainly the case in the Danish development. Based on the theories about these abstraction mechanisms, the basic principles for classification systems are presented and the observed misconceptions are analysed and explained. Furthermore, it is argued that the purpose of classification...

  5. Music classification with MPEG-7

    Science.gov (United States)

    Crysandt, Holger; Wellhausen, Jens

    2003-01-01

    Driven by the increasing amount of music available electronically, automatic classification systems for music are becoming both more necessary and more feasible. Currently most search engines for music are based on textual descriptions such as artist and/or title. This paper presents a system for automatic music description, classification and visualization for a set of songs. The system is designed to extract significant features of a piece of music in order to find songs of similar genre or with similar sound characteristics. The description is done with the help of MPEG-7 only. The classification and visualization are done with the self-organizing map algorithm.

  6. Is classification necessary after Google?

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2012-01-01

    and purposes. Evidence-based practice provides an example of the importance of classifying documents according to research methods. Originality/value – Solving both the practical (organisational) and the theoretical problems facing classification is necessary if the field is to survive both as a practice...... – At the practical level, there is a need to provide high-quality control mechanisms. At the theoretical level, there is a need to establish the basis of each decision, and to change the philosophy of classification from being based on “standardisation” to being based on classifications tailored to different domains...

  7. Sequence Classification: 885394 [

    Lifescience Database Archive (English)

    Full Text Available 703); The expression pattern of this gene is described in PMID:12000842; possible frameshift detected when compared...Non-TMB TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|23619146|ref|NP_705108.1| Slight di...fference exist when compared to the published sequence of EBL-1 from Dd2 strain of P. falciparum (PMID:10613

  8. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data

    Directory of Open Access Journals (Sweden)

    Chetty Madhu

    2006-06-01

    Full Text Available Abstract Background Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. Results We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. Conclusion For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.
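
    The idea of a parameter that trades relevance against redundancy can be illustrated with a small greedy selector. This is a sketch under stated assumptions, not the authors' implementation: the set score is taken to be relevance raised to the power alpha times antiredundancy raised to the power (1 - alpha), relevance is the mean of precomputed per-gene scores, antiredundancy is the mean of 1 - |correlation| over gene pairs, and the scores, correlation matrix and function names are all hypothetical.

```python
from itertools import combinations


def ddp_score(subset, relevance, corr, alpha):
    """Assumed set score: relevance**alpha * antiredundancy**(1 - alpha)."""
    v = sum(relevance[i] for i in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    # Antiredundancy is high when the selected genes are weakly correlated.
    u = sum(1 - abs(corr[i][j]) for i, j in pairs) / len(pairs) if pairs else 1.0
    return (v ** alpha) * (u ** (1 - alpha))


def greedy_select(relevance, corr, size, alpha):
    """Forward selection: start from the most relevant gene, then add the
    gene that maximizes the score of the enlarged set."""
    selected = [max(range(len(relevance)), key=lambda i: relevance[i])]
    while len(selected) < size:
        candidates = [i for i in range(len(relevance)) if i not in selected]
        best = max(candidates,
                   key=lambda i: ddp_score(selected + [i], relevance, corr, alpha))
        selected.append(best)
    return selected


# Hypothetical data: genes 0 and 1 are both relevant but nearly duplicate each
# other; gene 2 is less relevant but almost uncorrelated with either.
relevance = [0.90, 0.85, 0.50]
corr = [[1.00, 0.95, 0.10],
        [0.95, 1.00, 0.10],
        [0.10, 0.10, 1.00]]
relevance_only = greedy_select(relevance, corr, 2, 1.0)
redundancy_aware = greedy_select(relevance, corr, 2, 0.2)
```

    With alpha = 1 the selector looks at relevance alone and picks the two near-duplicate genes, while a small alpha favours the less correlated pair, which is the kind of trade-off the abstract attributes to the DDP parameter.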

  9. Genes and Gene Therapy

    Science.gov (United States)

    ... correctly, a child can have a genetic disorder. Gene therapy is an experimental technique that uses genes to ... or prevent disease. The most common form of gene therapy involves inserting a normal gene to replace an ...

  10. Biogeography based Satellite Image Classification

    CERN Document Server

    Panchal, V K; Kaur, Navdeep; Kundra, Harish

    2009-01-01

    Biogeography is the study of the geographical distribution of biological organisms. The mindset of the engineer is that we can learn from nature. Biogeography Based Optimization is a burgeoning nature-inspired technique to find the optimal solution of a problem. Satellite image classification is an important task because it is the only way we can know about the land cover map of inaccessible areas. Though satellite images have been classified in the past using various techniques, researchers are always seeking alternative strategies for satellite image classification so that they may be prepared to select the most appropriate technique for the feature extraction task at hand. This paper is focused on classification of the satellite image of a particular land cover using the theory of Biogeography Based Optimization. The original BBO algorithm does not have the built-in property of clustering which is required during image classification. Hence modifications have been proposed to the original algorithm and...

  11. VT Biodiversity Project - Bedrock Classification

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) This dataset is a five category, nine sub-category classification of the bedrock units appearing on the Centennial Geologic Map of Vermont. The...

  12. Hazard classification or risk assessment

    DEFF Research Database (Denmark)

    Hass, Ulla

    2013-01-01

    The EU classification of substances for e.g. reproductive toxicants is hazard based and does not address the risk such substances may pose through normal, or extreme, use. Such hazard classification complies with the consumer's right to know. It is also an incentive to careful use and storage...... and to substitute with less toxic compounds. Actually, if exposure is constant across product class, producers may make substitution decisions based on hazard. Hazard classification is also useful during major accidents where there is no time for risk assessment and the exposure is likely to be substantial enough...... be a poor substitute for a proper risk assessment as low potency substances can constitute a risk if the exposure is high enough and vice versa. Examples illustrating the strength and limitations of hazard classification, risk assessment and toxicological potency will be presented with focus on reproductive...

  13. Text Classification using Artificial Intelligence

    CERN Document Server

    Kamruzzaman, S M

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using an artificial intelligence technique that requires fewer documents for training. Instead of using words, word relations, i.e. association rules derived from these words, are used to derive the feature set from pre-classified text documents. The concept of the naïve Bayes classifier is then used on the derived features and finally only a single concept of genetic algorithm has been added for final classification. A syste...

  14. The last classification of vasculitis

    NARCIS (Netherlands)

    Kallenberg, Cees G. M.

    2008-01-01

    Systemic vasculitides are a group of diverse conditions characterized by inflammation of the blood vessels. To obtain homogeneity in clinical characteristics, prognosis, and response to treatment, patients with vasculitis should be classified into defined disease categories. Many classification

  15. Handwriting Classification in Forensic Science.

    Science.gov (United States)

    Ansell, Michael

    1979-01-01

    Considers systems for the classification of handwriting features, discusses computer storage of information about handwriting features, and summarizes recent studies that give an idea of the range of forensic handwriting research. (GT)

  16. [Definition and classification of epilepsy].

    Science.gov (United States)

    Jibiki, Itsuki

    2014-05-01

    The concept or definition of epilepsy is described as a chronic disease of the brain consisting of repeated EEG paroxysms and clinical seizures caused by excessive discharges of cerebral neurons, with reference to Gastaut's opinion and other statements. We also describe conditions to be excluded from epilepsy, such as isolated, occasional and subclinical seizures. Next, the new classifications of seizures and epilepsies are explained on the basis of the revised terminology and concepts for organization of seizures and epilepsies in the Report of the ILAE Commission on Classification and Terminology, 2005-2009, in comparison with the Classification of Epileptic Seizures of 1981 and the Classification of Epilepsies and Epileptic Syndromes of 1989.

  17. Text Classification using Data Mining

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of using words, word relations, i.e. association rules derived from these words, are used to derive the feature set from pre-classified text documents. The concept of the Naive Bayes classifier is then used on the derived features and finally only a single concept of Genetic Algorithm has been added for final classification. A system based on the...

  18. The classification on short message

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    This paper discusses the importance of the classification of short messages and details some of the key technologies involved. Through the implementation of a basic prototype, some basic models and technical references are provided.

  19. Classification and clinical assessment

    Directory of Open Access Journals (Sweden)

    F. Cantini

    2012-06-01

    Full Text Available There are at least nine classification criteria for psoriatic arthritis (PsA) that have been proposed and used in clinical studies. With the exception of the ESSG and Bennett rules, all of the other criteria sets have a good performance in identifying PsA patients. As the CASPAR criteria are based on a robust study methodology, they are considered the current reference standard. However, although there seems to be no doubt that they are very good at classifying PsA patients (very high specificity), they might not be sensitive enough to diagnose patients with unknown early PsA. The vast clinical heterogeneity of PsA makes its assessment very challenging. Peripheral joint involvement is measured by 78/76 joint counts, spine involvement by the instruments used for ankylosing spondylitis (AS), dactylitis by involved digit count or by the Leeds dactylitis index, enthesitis by the number of affected entheses (several indices available) and psoriasis by the Psoriasis Area and Severity Index (PASI). Peripheral joint damage can be assessed by a modified van der Heijde-Sharp scoring system and axial damage by the methods used for AS or by the Psoriatic Arthritis Spondylitis Radiology Index (PASRI). As in other arthritides, global evaluation of disease activity and severity by patient and physician and assessment of disability and quality of life are widely used. Finally, composite indices that capture several clinical manifestations of PsA have been proposed and a new instrument, the Psoriatic ARthritis Disease Activity Score (PASDAS), is currently being developed.

  20. Personality: Description, Classification and Evaluation

    OpenAIRE

    Ibrahim Taymur; M. Hakan TURKCAPAR

    2012-01-01

    Many descriptions and classifications of personality have been made throughout history to understand and acknowledge human beings. During the development of psychiatry, almost every school defined and assessed personality according to its own perspective. As the DSM (Diagnostic and Statistical Manual of Mental Disorders) and ICD (International Classification of Diseases) became available for common use, scientists conducted studies to set a common terminology for personality. Ca...

  1. Integrated classification of inflammatory myopathies.

    Science.gov (United States)

    Allenbach, Y; Benveniste, O; Goebel, H-H; Stenzel, W

    2017-02-01

    Inflammatory myopathies comprise a multitude of diverse diseases, most often occurring in complex clinical settings. To ensure accurate diagnosis, multidisciplinary expertise is required. Here, we propose a comprehensive myositis classification that incorporates clinical, morphological and molecular data as well as autoantibody profile. This review focuses on recent advances in myositis research, in particular, the correlation between autoantibodies and morphological or clinical phenotypes that can be used as the basis for an 'integrated' classification system.

  2. Product Work Classification and Coding

    Science.gov (United States)

    1986-06-01

    detail is much more useful in planning steel welding processes. In this regard remember that mild steel, HSLA steel, and high-yield steel (e.g. HY80 ...manufacturing facility. In Figure 2.3-2, a classification and coding system for steel parts is shown. This classification and coding system sorts steel parts...system would provide a shop which produced steel parts with a means of organizing parts. Rather than attempting to manage all of its parts as a single

  3. Application of machine learning on brain cancer multiclass classification

    Science.gov (United States)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a multiclass classification problem. One approach to solving this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning to microarray gene expression datasets mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to handle multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is applied to each subset. Then, the features selected from each subset are fed to separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
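
    The subset-wise elimination loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a plain linear SVM trained by full-batch subgradient descent stands in for the classifier (twin SVM is not implemented), the multiclass ranking is assumed to be the sum of squared weights over one-vs-rest binary problems, and all names (train_linear_svm, svm_rfe, multiple_svm_rfe) and the toy data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.
    Stand-in for a real SVM solver; no bias term, labels must be +1/-1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            # Only samples violating the margin contribute to the subgradient.
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, labels, features, n_keep):
    """Recursively drop the feature with the smallest summed squared weight
    over all one-vs-rest problems (assumed ranking rule) until n_keep remain."""
    active = list(features)
    classes = sorted(set(labels))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        score = [0.0] * len(active)
        for c in classes:
            y = [1 if lab == c else -1 for lab in labels]
            w = train_linear_svm(Xa, y)
            score = [s + wj * wj for s, wj in zip(score, w)]
        active.pop(score.index(min(score)))
    return active


def multiple_svm_rfe(X, labels, n_subsets, n_keep):
    """Split the feature indices into interleaved subsets and run SVM-RFE on each."""
    d = len(X[0])
    subsets = [list(range(k, d, n_subsets)) for k in range(n_subsets)]
    return [svm_rfe(X, labels, s, n_keep) for s in subsets]


# Toy data (hypothetical): features 0 and 1 separate the classes, 2 and 3 are noise.
X = [
    (2.0, 1.5, 0.1, -0.3),
    (1.8, 1.6, -0.2, 0.2),
    (2.2, 1.4, 0.3, 0.1),
    (-2.0, -1.5, 0.2, 0.3),
    (-1.9, -1.4, -0.1, -0.2),
    (-2.1, -1.6, -0.3, 0.1),
]
labels = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
selected = multiple_svm_rfe(X, labels, n_subsets=2, n_keep=1)
print(selected)
```

    In a full pipeline, each returned feature list would be passed to its own classifier and the classifier outputs combined; that combination step is not shown here.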

  4. Expression-based functional investigation of the organ-specific microRNAs in Arabidopsis.

    Directory of Open Access Journals (Sweden)

    Yijun Meng

    Full Text Available MicroRNAs (miRNAs) play a pivotal role in plant development. The expression patterns of the miRNA genes significantly influence their regulatory activities. By utilizing small RNA (sRNA) high-throughput sequencing (HTS) data, the miRNA expression patterns were investigated in four organs (flowers, leaves, roots and seedlings) of Arabidopsis. Based on a set of criteria, dozens of organ-specific miRNAs were discovered. A dominant portion of the organ-specific miRNAs identified from the ARGONAUTE 4-enriched sRNA HTS libraries were highly expressed in flowers. Additionally, the expression of the precursors of the organ-specific miRNAs was analyzed. A degradome sequencing data-based approach was employed to identify the targets of the organ-specific miRNAs. The miRNA-target interactions were used for network construction. Subnetwork analysis unraveled some novel regulatory cascades, such as the feedback regulation mediated by miR161, the potential self-regulation of the genes miR172, miR396, miR398 and miR860, and the miR863-guided cleavage of the SERRATE transcript. Our bioinformatics survey expanded the organ-specific miRNA-target list in Arabidopsis, and could deepen the biological view of miRNA expression and regulatory roles.

  5. 28 CFR 345.20 - Position classification.

    Science.gov (United States)

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Position classification. 345.20 Section... INDUSTRIES (FPI) INMATE WORK PROGRAMS Position Classification § 345.20 Position classification. (a) Inmate... the objectives and principles of pay classification as a part of the routine orientation of new...

  6. 32 CFR 2400.6 - Classification levels.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification levels. 2400.6 Section 2400.6... Original Classification § 2400.6 Classification levels. (a) National security information (hereinafter... three authorized classification levels, such as “Secret Sensitive” or “Agency Confidential.” The...

  7. 32 CFR 2001.15 - Classification guides.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification guides. 2001.15 Section 2001.15..., NATIONAL ARCHIVES AND RECORDS ADMINISTRATION CLASSIFIED NATIONAL SECURITY INFORMATION Classification § 2001.15 Classification guides. (a) Preparation of classification guides. Originators of...

  8. 7 CFR 28.911 - Review classification.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Review classification. 28.911 Section 28.911... REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Cotton Classification and Market News Service for Producers Classification § 28.911 Review classification. (a) A producer may request one...

  9. 32 CFR 2700.22 - Classification guides.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification guides. 2700.22 Section 2700.22... SECURITY INFORMATION REGULATIONS Derivative Classification § 2700.22 Classification guides. OMSN shall issue classification guides pursuant to section 2-2 of E.O. 12065. These guides, which shall be used...

  10. 37 CFR 2.85 - Classification schedules.

    Science.gov (United States)

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Classification schedules. 2..., DEPARTMENT OF COMMERCE RULES OF PRACTICE IN TRADEMARK CASES Classification § 2.85 Classification schedules. (a) International classification system. Section 6.1 of this chapter sets forth the...

  11. 7 CFR 51.2284 - Size classification.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Size classification. 51.2284 Section 51.2284... Size classification. The following classifications are provided to describe the size of any lot... shall conform to the requirements of the specified classification as defined below: (a) Halves....

  12. 22 CFR 9.8 - Classification challenges.

    Science.gov (United States)

    2010-04-01

    ... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Classification challenges. 9.8 Section 9.8 Foreign Relations DEPARTMENT OF STATE GENERAL SECURITY INFORMATION REGULATIONS § 9.8 Classification... classification status is improper are expected and encouraged to challenge the classification status of...

  13. 49 CFR 8.17 - Classification challenges.

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 1 2010-10-01 2010-10-01 false Classification challenges. 8.17 Section 8.17 Transportation Office of the Secretary of Transportation CLASSIFIED INFORMATION: CLASSIFICATION/DECLASSIFICATION/ACCESS Classification/Declassification of Information § 8.17 Classification challenges. (a)...

  14. 10 CFR 61.55 - Waste classification.

    Science.gov (United States)

    2010-01-01

    ... 10 Energy 2 2010-01-01 2010-01-01 false Waste classification. 61.55 Section 61.55 Energy NUCLEAR... Requirements for Land Disposal Facilities § 61.55 Waste classification. (a) Classification of waste for near surface disposal—(1) Considerations. Determination of the classification of radioactive waste involves...

  15. 45 CFR 601.5 - Derivative classification.

    Science.gov (United States)

    2010-10-01

    ... 45 Public Welfare 3 2010-10-01 2010-10-01 false Derivative classification. 601.5 Section 601.5... CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.5 Derivative classification. Distinct from “original” classification is the determination that information is in substance the same...

  16. Current terminology and diagnostic classification schemes.

    Science.gov (United States)

    Okeson, J P

    1997-01-01

    This article reviews the current terminology and classification schemes available for temporomandibular disorders. The origin of each term is presented, and the classification schemes that have been offered for temporomandibular disorders are briefly reviewed. Several important classifications are presented in more detail, with mention of advantages and disadvantages. Final recommendations are provided for future direction in the area of classification schemes.

  17. Classification of Dukes' B and C colorectal cancers using expression arrays

    DEFF Research Database (Denmark)

    Frederiksen, C.M.; Knudsen, Steen; Laurberg, S.;

    2003-01-01

    Purpose. Colorectal cancer is one of the most common malignancies. Substaging of the cancer is of importance not only to prognosis but also to treatment. Classification of substages based on DNA microarray technology is currently the most promising approach. We therefore investigated if gene...... expression microarrays could be used to classify colorectal tumors. Methods. We used the Affymetrix oligonucleotide arrays to analyze the expression of more than 5,000 genes in samples from the sigmoid and upper rectum of the left colon. Five samples were from normal mucosa and five samples from each......' A and D could not be classified correctly. A number of interesting gene clusters showed a discriminating difference between Dukes' B and C samples. These included mitochondrial genes, stromal remodeling genes, and genes related to cell adhesion. Conclusion. Molecular classification based on gene...

  18. HIV classification using coalescent theory

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Ming [Los Alamos National Laboratory; Leitner, Thomas K [Los Alamos National Laboratory; Korber, Bette T [Los Alamos National Laboratory

    2008-01-01

    Algorithms for subtype classification and breakpoint detection of HIV-1 sequences are based on a classification system of HIV-1. Hence, their quality depends strongly on this system. Because of the way the current HIV-1 nomenclature was created, it contains inconsistencies such as the following: the phylogenetic distance between subtypes B and D is remarkably small compared with other pairs of subtypes, and is in fact more like the distance between a pair of sub-subtypes (Robertson et al., 2000); subtypes E and I no longer exist, since they were discovered to be composed of recombinants (Robertson et al., 2000); it is currently being discussed whether, instead of CRF02 being a recombinant of subtypes A and G, subtype G should be designated as a circulating recombinant form (CRF) and CRF02 as a subtype (Abecasis et al., 2007); and there are 8 complete and over 400 partial HIV genomes in the LANL database which belong neither to a subtype nor to a CRF (denoted by U). Moreover, the current classification system is somewhat arbitrary, like all complex classification systems that were created manually. It is therefore desirable to deduce the classification system of HIV systematically by an algorithm. Of course, this problem is not restricted to HIV, but applies to all fast mutating and recombining viruses. Our work addresses the simpler subproblem of scoring classifications of given input sequences of some virus species (a classification denotes a partition of the input sequences into several subtypes and CRFs). To this end, we reconstruct ancestral recombination graphs (ARGs) of the input sequences under restrictions determined by the given classification. These restrictions are imposed in order to ensure that the reconstructed ARGs do not contradict the classification under consideration. Then, we find the ARG with maximal probability by means of Markov Chain Monte Carlo methods. The probability of the most probable ARG is interpreted as a score for the classification. To our

  19. NIM: a node influence based method for cancer classification.

    Science.gov (United States)

    Wang, Yiwen; Yao, Min; Yang, Jianhua

    2014-01-01

    The classification of different cancer types is of great significance in the medical field. However, the great majority of existing cancer classification methods are clinical-based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it has become possible to classify different kinds of cancers using DNA microarrays. Our main idea is to approach the problem of cancer classification using gene expression data from a graph-based view. Based on a new node influence model that we propose, this paper presents a novel high-accuracy method for cancer classification, which is composed of four parts: the first is to calculate the similarity matrix of all samples, the second is to compute the node influence of training samples, the third is to obtain the similarity between every test sample and each class using a weighted sum of node influence and the similarity matrix, and the last is to classify each test sample based on its similarity to every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. Experimental results showed that our node influence based method (NIM) is more efficient and robust than the support vector machine, K-nearest neighbor, C4.5, naive Bayes, and CART.
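
    The four steps listed in this abstract can be sketched as below. The abstract does not define the node influence model, so this sketch assumes one: the influence of a training sample is taken to be its mean similarity to the other training samples of its class. Cosine similarity is likewise an assumption, and the function names and toy expression profiles are hypothetical. It illustrates the overall flow only, not the published NIM algorithm.

```python
import math
from collections import defaultdict


def similarity(x, y):
    """Cosine similarity between two expression profiles (assumed measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def node_influence(sim, labels):
    """ASSUMPTION: influence of a training sample is its mean similarity
    to the other training samples that share its class label."""
    influence = []
    for i, yi in enumerate(labels):
        peers = [sim[i][j] for j, yj in enumerate(labels) if j != i and yj == yi]
        influence.append(sum(peers) / len(peers) if peers else 1.0)
    return influence


def classify(test_x, train_x, train_y):
    # Step 1: similarity matrix of the training samples.
    sim = [[similarity(a, b) for b in train_x] for a in train_x]
    # Step 2: node influence of each training sample.
    influence = node_influence(sim, train_y)
    # Step 3: influence-weighted similarity between the test sample and each class.
    weighted, total = defaultdict(float), defaultdict(float)
    for xi, yi, wi in zip(train_x, train_y, influence):
        weighted[yi] += wi * similarity(test_x, xi)
        total[yi] += wi
    # Step 4: assign the class with the highest normalised score.
    return max(weighted, key=lambda c: weighted[c] / total[c] if total[c] else 0.0)


# Toy expression profiles (hypothetical values, three genes per sample).
train_x = [(5.0, 1.0, 0.5), (4.5, 1.2, 0.4), (0.5, 4.0, 5.0), (0.4, 4.2, 4.8)]
train_y = ["tumor", "tumor", "normal", "normal"]
print(classify((4.8, 0.9, 0.6), train_x, train_y))
```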

  20. Current Trends in the Molecular Classification of Renal Neoplasms

    Directory of Open Access Journals (Sweden)

    Andrew N. Young

    2006-01-01

    Full Text Available Renal cell carcinoma (RCC) is the most common form of kidney cancer in adults. RCC is a significant challenge for pathologic diagnosis and clinical management. The primary approach to diagnosis is by light microscopy, using the World Health Organization (WHO) classification system, which defines histopathologic tumor subtypes with distinct clinical behavior and underlying genetic mutations. However, light microscopic diagnosis of RCC subtypes is often difficult due to variable histology. In addition, the clinical behavior of RCC is highly variable and therapeutic response rates are poor. Few clinical assays are available to predict outcome in RCC or correlate behavior with histology. Therefore, novel RCC classification systems based on gene expression should be useful for diagnosis, prognosis, and treatment. Recent microarray studies have shown that renal tumors are characterized by distinct gene expression profiles, which can be used to discover novel diagnostic and prognostic biomarkers. Here, we review clinical features of kidney cancer, the WHO classification system, and the growing role of molecular classification for diagnosis, prognosis, and therapy of this disease.

  1. NIM: A Node Influence Based Method for Cancer Classification

    Directory of Open Access Journals (Sweden)

    Yiwen Wang

    2014-01-01

    Full Text Available The classification of different cancer types is of great significance in the medical field. However, the great majority of existing cancer classification methods are clinical-based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it has become possible to classify different kinds of cancers using DNA microarrays. Our main idea is to approach the problem of cancer classification using gene expression data from a graph-based view. Based on a new node influence model that we propose, this paper presents a novel high-accuracy method for cancer classification, which is composed of four parts: the first is to calculate the similarity matrix of all samples, the second is to compute the node influence of training samples, the third is to obtain the similarity between every test sample and each class using a weighted sum of node influence and the similarity matrix, and the last is to classify each test sample based on its similarity to every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. Experimental results showed that our node influence based method (NIM) is more efficient and robust than the support vector machine, K-nearest neighbor, C4.5, naive Bayes, and CART.

  2. Gene Expression Analysis of an EGFR Indirectly Related Pathway Identified PTEN and MMP9 as Reliable Diagnostic Markers for Human Glial Tumor Specimens

    Directory of Open Access Journals (Sweden)

    Sergio Comincini

    2009-01-01

    Full Text Available In this study the mRNA levels of five EGFR indirectly related genes, EGFR, HB-EGF, ADAM17, PTEN, and MMP9, have been assessed by Real-time PCR in a panel of 37 glioblastoma multiforme specimens and in 5 normal brain samples; as a result, in glioblastoma, ADAM17 and PTEN expression was significantly lower than in normal brain samples, and, in particular, a statistically significant inverse correlation was found between PTEN and MMP9 mRNA levels. To verify if this correlation was conserved in gliomas, PTEN and MMP9 expression was further investigated in an additional panel of 16 anaplastic astrocytoma specimens and, in parallel, in different human normal and astrocytic tumor cell lines. In anaplastic astrocytomas PTEN expression was significantly higher than in glioblastoma multiforme, but no significant correlation was found between PTEN and MMP9 expression. PTEN and MMP9 mRNA levels were also employed to identify subgroups of specimens within the different glioma malignancy grades and to define a gene expression-based diagnostic classification scheme. In conclusion, this gene expression survey highlighted that the combined measurement of PTEN and MMP9 transcripts might represent a novel reliable tool for the differential diagnosis of high-grade gliomas, and it also suggested a functional link involving these genes in glial tumors.

  3. The Mechanistic Approach to Psychiatric Classification

    Directory of Open Access Journals (Sweden)

    Elisabetta Sirgiovanni

    2009-01-01

    Full Text Available A Kuhnian reformulation of the recent debate in psychiatric nosography suggested that the current psychiatric classification system (the DSM) is in crisis and that a sort of paradigm shift is awaited (Aragona, 2009). Among possible revolutionary alternatives, the proposed five-axes etiopathogenetic taxonomy (Charney et al., 2002) emphasizes the primacy of the genotype over the phenomenological level as the relevant basis for psychiatric nosography. Such a position is along the lines of the micro-reductionist perspective of E. Kandel (1998, 1999), which sees mental disorders as reducible to explanations at a fundamental epistemic level of genes and neurotransmitters. This form of micro-reductionism has been criticized as a form of genetic-molecular fundamentalism (e.g. Murphy, 2006) and a multi-level approach, in the form of the burgeoning Cognitive Neuropsychiatry, was proposed. This article focuses on multi-level mechanistic explanations, coming from Cognitive Science, as a possible alternative etiopathogenetic basis for psychiatric classification. The idea of a mechanistic approach to psychiatric taxonomy is here defended on the basis of a better conception of levels and causality. Nevertheless, some critical remarks on Mechanism as a general psychiatric view are also offered.

  4. Diagnosis and classification of juvenile idiopathic arthritis.

    Science.gov (United States)

    Eisenstein, Eli M; Berkun, Yackov

    2014-01-01

    In recent years, it has become increasingly clear that the term Juvenile Idiopathic Arthritis (JIA) comprises not one disease but several. Moreover, recent studies strongly suggest that some of these clinico-pathophysiologic entities appear to cross current diagnostic categories. The ultimate goal of the JIA classification is to facilitate development of better, more specific therapy for different forms of disease through improved understanding of pathophysiology. The past two decades have witnessed significant advances in treatment and improved outcomes for many children with chronic arthritis. However, understanding of the basic biologic processes underlying these diseases remains far from complete. As a result, even the best biologic agents of today represent "halfway technologies". Because they do not treat fundamental biologic processes, they are inherently expensive, need to be given for a long time in order to ameliorate the adverse effects of chronic inflammation, and do not cure the disease. Pediatric rheumatology is now entering an era in which diagnostic categories may need to change to keep up with discovery. A more precise, biologically based classification is likely to contribute to development of more specific and improved treatments for the various forms of childhood arthritis. In this review, we discuss how genetic, gene expression, and immunologic findings have begun to influence how these diseases are understood and classified.

  5. Multiclass gene selection using Pareto-fronts.

    Science.gov (United States)

    Rajapakse, Jagath C; Mundra, Piyushkumar A

    2013-01-01

    Filter methods are often used for selection of genes in multiclass sample classification by using microarray data. Such techniques usually tend to bias toward a few classes that are easily distinguishable from other classes due to imbalances of strong features and sample sizes of different classes. It could therefore lead to selection of redundant genes while missing the relevant genes, leading to poor classification of tissue samples. In this manuscript, we propose to decompose multiclass ranking statistics into class-specific statistics and then use Pareto-front analysis for selection of genes. This alleviates the bias induced by class intrinsic characteristics of dominating classes. The use of Pareto-front analysis is demonstrated on two filter criteria commonly used for gene selection: F-score and KW-score. A significant improvement in classification performance and reduction in redundancy among top-ranked genes were achieved in experiments with both synthetic and real-benchmark data sets.
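
    The Pareto-front selection idea in this abstract can be illustrated with a short sketch. It assumes that a class-specific statistic has already been computed for every gene (one value per class, higher meaning more discriminative) and that genes are chosen by peeling off successive non-dominated fronts; the peeling rule, the function names and the toy scores are all assumptions made for illustration, not details given in the abstract.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b for every class and strictly
    better for at least one class (higher scores are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Genes whose class-specific score vectors are not dominated by any other gene."""
    return [
        g for g, s in scores.items()
        if not any(dominates(t, s) for h, t in scores.items() if h != g)
    ]


def select_genes(scores: Dict[str, Sequence[float]], n_genes: int) -> List[str]:
    """Peel successive Pareto fronts until n_genes genes have been collected."""
    remaining = dict(scores)
    selected: List[str] = []
    while remaining and len(selected) < n_genes:
        front = pareto_front(remaining)
        selected.extend(front)
        for g in front:
            del remaining[g]
    return selected[:n_genes]


# Hypothetical class-specific statistics for three classes.
scores = {
    "geneA": (0.9, 0.1, 0.2),   # best for class 1
    "geneB": (0.2, 0.8, 0.1),   # best for class 2
    "geneC": (0.1, 0.2, 0.7),   # best for class 3
    "geneD": (0.8, 0.05, 0.1),  # dominated by geneA (redundant for class 1)
    "geneE": (0.1, 0.1, 0.1),   # weak for every class
}
print(pareto_front(scores))
print(select_genes(scores, 4))
```

    With a single aggregated ranking, geneD would likely outrank geneB and geneC because class 1 is easy to separate; the first front instead keeps one strong gene per class, which is the bias the abstract describes.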

  6. Isolated dentinogenesis imperfecta and dentin dysplasia: revision of the classification.

    Science.gov (United States)

    de La Dure-Molla, Muriel; Philippe Fournier, Benjamin; Berdal, Ariane

    2015-04-01

    Dentinogenesis imperfecta is an autosomal dominant disease characterized by severe hypomineralization of dentin and altered dentin structure. Dentin extracellular matrix is composed of 90% collagen type I and 10% non-collagenous proteins, among which dentin sialoprotein (DSP), dentin glycoprotein (DGP) and dentin phosphoprotein (DPP) are crucial in dentinogenesis. These proteins are encoded by a single gene, dentin sialophosphoprotein (DSPP), and undergo several post-translational modifications such as glycosylation and phosphorylation to contribute to and control mineralization. Human mutations of this DSPP gene are responsible for three isolated dentinal diseases classified by Shields in 1973: type II and III dentinogenesis imperfecta and type II dentin dysplasia. The Shields classification was based on clinical phenotypes observed in patients. Genetic results now show that these three diseases are variations in severity of the same pathology. This review therefore aims to revise the classification of the isolated forms of DI and to propose a new one that simplifies diagnosis for practitioners.

  7. Evolution and classification of the CRISPR-Cas systems

    Science.gov (United States)

    Makarova, Kira S.; Haft, Daniel H.; Barrangou, Rodolphe; Brouns, Stan J. J.; Charpentier, Emmanuelle; Horvath, Philippe; Moineau, Sylvain; Mojica, Francisco J. M.; Wolf, Yuri I.; Yakunin, Alexander F.; van der Oost, John; Koonin, Eugene V.

    2012-01-01

    The CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) modules are adaptive immunity systems that are present in many archaea and bacteria. These defence systems are encoded by operons that have an extraordinarily diverse architecture and a high rate of evolution for both the cas genes and the unique spacer content. Here, we provide an updated analysis of the evolutionary relationships between CRISPR–Cas systems and Cas proteins. Three major types of CRISPR–Cas system are delineated, with a further division into several subtypes and a few chimeric variants. Given the complexity of the genomic architectures and the extremely dynamic evolution of the CRISPR–Cas systems, a unified classification of these systems should be based on multiple criteria. Accordingly, we propose a 'polythetic' classification that integrates the phylogenies of the most common cas genes, the sequence and organization of the CRISPR repeats and the architecture of the CRISPR–cas loci. PMID:21552286

  8. Annotation and Classification of CRISPR-Cas Systems.

    Science.gov (United States)

    Makarova, Kira S; Koonin, Eugene V

    2015-01-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas (CRISPR-associated proteins) is a prokaryotic adaptive immune system that is represented in most archaea and many bacteria. Among the currently known prokaryotic defense systems, the CRISPR-Cas genomic loci show unprecedented complexity and diversity. Classification of CRISPR-Cas variants that would capture their evolutionary relationships to the maximum possible extent is essential for comparative genomic and functional characterization of this theoretically and practically important system of adaptive immunity. To this end, a multipronged approach has been developed that combines phylogenetic analysis of the conserved Cas proteins with comparison of gene repertoires and arrangements in CRISPR-Cas loci. This approach led to the current classification of CRISPR-Cas systems into three distinct types and ten subtypes for each of which signature genes have been identified. Comparative genomic analysis of the CRISPR-Cas systems in new archaeal and bacterial genomes performed over the 3 years elapsed since the development of this classification makes it clear that new types and subtypes of CRISPR-Cas need to be introduced. Moreover, this classification system captures only part of the complexity of CRISPR-Cas organization and evolution, due to the intrinsic modularity and evolutionary mobility of these immunity systems, resulting in numerous recombinant variants. Moreover, most of the cas genes evolve rapidly, complicating the family assignment for many Cas proteins and the use of family profiles for the recognition of CRISPR-Cas subtype signatures. Further progress in the comparative analysis of CRISPR-Cas systems requires integration of the most sensitive sequence comparison tools, protein structure comparison, and refined approaches for comparison of gene neighborhoods.

  9. Molecular Classification of indica-japonica Rice According to Wide-compatibility Gene S5 Sequence with Endonuclease CEL I

    Institute of Scientific and Technical Information of China (English)

    倪深; 肖文斐; 陈红旗; 王跃星; 朱旭东

    2012-01-01

    Cloning of the major locus S5, a wide-compatibility gene, has laid a molecular basis for the identification of wide-compatibility varieties (WCV). Based on the differences in the S5 locus sequence among indica, japonica and WCV, PCR was conducted, and the celery endonuclease CEL I, which cleaves DNA with high specificity at sites of base-substitution mismatch, was used to digest the PCR products so as to distinguish the various S5 DNA fragments. The results showed that the method is accurate for indica-japonica classification. At the same time, a new genotype at the S5 locus was found.

  10. Automatic Hierarchical Color Image Classification

    Directory of Open Access Journals (Sweden)

    Jing Huang

    2003-02-01

    Full Text Available Organizing images into semantic categories can be extremely useful for content-based image retrieval and image annotation. Grouping images into semantic classes is a difficult problem, however. Image classification attempts to solve this hard problem by using low-level image features. In this paper, we propose a method for hierarchical classification of images via supervised learning. This scheme relies on using a good low-level feature and subsequently performing feature-space reconfiguration using singular value decomposition to reduce noise and dimensionality. We use the training data to obtain a hierarchical classification tree that can be used to categorize new images. Our experimental results suggest that this scheme not only performs better than standard nearest-neighbor techniques, but also has both storage and computational advantages.

  11. Automated Periodontal Diseases Classification System

    Directory of Open Access Journals (Sweden)

    Aliaa A. A. Youssif

    2012-01-01

    Full Text Available This paper presents an efficient and innovative system for automated classification of periodontal diseases. The strength of our technique lies in the fact that it incorporates knowledge from the patients' clinical data, along with the features automatically extracted from the Haematoxylin and Eosin (H&E) stained microscopic images. Our system uses image processing techniques based on color deconvolution, morphological operations, and watershed transforms for epithelium & connective tissue segmentation, nuclear segmentation, and extraction of the microscopic immunohistochemical features for the nuclei, dilated blood vessels & collagen fibers. Also, Feedforward Backpropagation Artificial Neural Networks are used for the classification process. We report 100% classification accuracy in correctly identifying the different periodontal diseases observed in our 30-sample dataset.

  12. Classification of Medical Brain Images

    Institute of Scientific and Technical Information of China (English)

    Pan Haiwei(潘海为); Li Jianzhong; Zhang Wei

    2003-01-01

    Since brain tumors endanger people's quality of life and even their lives, the accuracy of classification becomes more important. Conventional classification techniques are designed for datasets of characters and numbers. It is difficult, however, to apply them to datasets that include both brain images and medical history (alphanumeric data), especially while guaranteeing accuracy. For these datasets, this paper incorporates medical domain knowledge to improve the traditional decision tree. The new classification algorithm, guided by medical knowledge, not only adds interaction with doctors but also enhances the quality of classification. The algorithm has been applied to real brain CT images, and a valuable rule has been obtained from the experiments. This paper shows that the algorithm works well for real CT data.

  13. Rock suitability classification RSC 2012

    Energy Technology Data Exchange (ETDEWEB)

    McEwen, T. (ed.) [McEwen Consulting, Leicester (United Kingdom)]; Kapyaho, A. [Geological Survey of Finland, Espoo (Finland)]; Hella, P. [Saanio and Riekkola, Helsinki (Finland)]; Aro, S.; Kosunen, P.; Mattila, J.; Pere, T.

    2012-12-15

    This report presents Posiva's Rock Suitability Classification (RSC) system, developed for locating suitable rock volumes for repository design and construction. The RSC system comprises both the revised rock suitability criteria and the procedure for the suitability classification during the construction of the repository. The aim of the classification is to avoid such features of the host rock that may be detrimental to the favourable conditions within the repository, either initially or in the long term. This report also discusses the implications of applying the RSC system for the fulfilment of the regulatory requirements concerning the host rock as a natural barrier and the site's overall suitability for hosting a final repository of spent nuclear fuel.

  14. Agriculture classification using POLSAR data

    DEFF Research Database (Denmark)

    Skriver, Henning; Dall, Jørgen; Ferro-Famil, Laurent

    2005-01-01

    , and a very important class of algorithms is the knowledge-based approaches. Here, generic characteristics of different cover types are derived by combining physical reasoning with the available empirical evidence. These are then used to define classification rules. Because of their emphasis on the physical...... of their components) show strongly preferred orientations, such as the stalks or ears of cereals. The importance of SAR polarimetry in crop classification arises principally because polarisation is sensitive to orientation. Hence it provides a means to distinguish crops with different canopy architectures. Detailed...... in the crop canopy, particularly between the response of the canopy itself and soil response. It is expected that PolInSAR data will add to the classification potential of POLSAR data by their sensitivity to the vertical distribution of scatterers. Different approaches have been used to classify SAR data...

  15. Modulation of gene expression made easy

    DEFF Research Database (Denmark)

    Solem, Christian; Jensen, Peter Ruhdal

    2002-01-01

    A new approach for modulating gene expression, based on randomization of promoter (spacer) sequences, was developed. The method was applied to chromosomal genes in Lactococcus lactis and shown to generate libraries of clones with broad ranges of expression levels of target genes. In one example...... beta-glucuronidase, resulting in an operon structure in which both genes are transcribed from a common promoter. We show that there is a linear correlation between the expressions of the two genes, which facilitates screening for mutants with suitable enzyme activities. In a second example, we show......, overexpression was achieved by introducing an additional gene copy into a phage attachment site on the chromosome. This resulted in a series of strains with phosphofructokinase activities from 1.4 to 11 times the wild-type activity level. In this example, the pfk gene was cloned upstream of a gusA gene encoding...

  16. Classification of Mycoplasma synoviae strains using single-strand conformation polymorphism and high-resolution melting-curve analysis of the vlhA gene single-copy region.

    Science.gov (United States)

    Jeffery, Nathan; Gasser, Robin B; Steer, Penelope A; Noormohammadi, Amir H

    2007-08-01

    Mycoplasma synoviae is an economically important pathogen of poultry worldwide, causing respiratory infection and synovitis in chickens and turkeys. Identification of M. synoviae isolates is of critical importance, particularly in countries in which poultry flocks are vaccinated with the live attenuated M. synoviae strain MS-H. Using oligonucleotide primers complementary to the single-copy conserved 5' end of the variable lipoprotein and haemagglutinin gene (vlhA), amplicons of approximately 400 bp were generated from 35 different M. synoviae strains/isolates from chickens and subjected to mutation scanning analysis. Analysis of the amplicons by single-strand conformation polymorphism (SSCP) revealed 10 distinct profiles (A-J). Sequencing of the amplicons representing these profiles revealed that each profile related to a unique sequence, some differing from each other by only one base-pair substitution. Comparative high-resolution melting (HRM) curve analysis of the amplicons using SYTO 9 green fluorescent dye also displayed profiles which were concordant with the same 10 SSCP profiles (A-J) and their sequences. For both mutation detection methods, the Australian M. synoviae strains represented one of the A, B, C or D profiles, while the USA strains represented one of the E, F, G, H, I or J profiles. The results presented in this study show that the PCR-based SSCP or HRM curve analyses of vlhA provide high-resolution mutation detection tools for the detection and identification of M. synoviae strains. In particular, the HRM curve analysis is a rapid and effective technique which can be performed in a single test tube in less than 2 h.

  17. SHIP CLASSIFICATION FROM MULTISPECTRAL VIDEOS

    Directory of Open Access Journals (Sweden)

    Frederique Robert-Inacio

    2012-05-01

    Full Text Available Surveillance of a seaport can be achieved by different means: radar, sonar, cameras, radio communications and so on. Such surveillance aims, on the one hand, to manage cargo and tanker traffic and, on the other hand, to prevent terrorist attacks in sensitive areas. This paper presents an application to video surveillance of a seaport entrance and, more particularly, the steps that make it possible to classify mobile shapes. The classification is based on a parameter measuring the degree of similarity between the shape under study and a set of reference shapes. The classification result describes the mobile object in terms of shape and speed.

  18. Facial aging: A clinical classification

    Directory of Open Access Journals (Sweden)

    Shiffman Melvin

    2007-01-01

    Full Text Available The purpose of this classification of facial aging is to have a simple clinical method to determine the severity of the aging process in the face. This allows a quick estimate as to the types of procedures that the patient would need to have the best results. Procedures that are presently used for facial rejuvenation include laser, chemical peels, suture lifts, fillers, modified facelift and full facelift. The physician is already using his best judgment to determine which procedure would be best for any particular patient. This classification may help to refine these decisions.

  19. Action information from classification learning.

    Science.gov (United States)

    Ross, Brian H; Wang, Ranxiao Frances; Kramer, Arthur F; Simons, Daniel J; Crowell, James A

    2007-06-01

    Much of our learning comes from interacting with objects. Two experiments investigated whether or not arbitrary actions used during category learning with objects might be incorporated into object representations and influence later recognition judgments. In a virtual-reality chamber, participants used distinct arm movements to make different classification responses. During a recognition test phase, these same objects required arm movements that were consistent or inconsistent with the classification movement. In both experiments, consistent movements were facilitated relative to inconsistent movements, suggesting that arbitrary action information is incorporated into the representations.

  20. Small-scale classification schemes

    DEFF Research Database (Denmark)

    Hertzum, Morten

    2004-01-01

    . While coordination mechanisms focus on how classification schemes enable cooperation among people pursuing a common goal, boundary objects embrace the implicit consequences of classification schemes in situations involving conflicting goals. Moreover, the requirements specification focused on functional...... requirements and provided little information about why these requirements were considered relevant. This stands in contrast to the discussions at the project meetings where the software engineers made frequent use of both abstract goal descriptions and concrete examples to make sense of the requirements....... This difference between the written requirements specification and the oral discussions at the meetings may help explain software engineers’ general preference for people, rather than documents, as their information sources....

  1. Proteomic classification of breast cancer.

    LENUS (Irish Health Repository)

    Kamel, Dalia

    2012-11-01

    Being a significant health problem that affects patients in various age groups, breast cancer has been extensively studied to date. Recently, molecular breast cancer classification has advanced significantly with the availability of genomic profiling technologies. Proteomic technologies have also advanced from traditional protein assays including enzyme-linked immunosorbent assay, immunoblotting and immunohistochemistry to more comprehensive approaches including mass spectrometry and reverse phase protein lysate arrays (RPPA). The purpose of this manuscript is to review the current protein markers that influence breast cancer prediction and prognosis and to focus on novel advances in proteomic classification of breast cancer.

  2. Classification of remotely sensed images

    CSIR Research Space (South Africa)

    Dudeni, N

    2008-10-01

    Full Text Available (s)) is the data vector for a pixel located at s; θ(s) is an unknown ground class to which pixel s belongs. The objective is to classify the pixel at location s into one of the k clusters...

  3. 78 FR 68983 - Cotton Futures Classification: Optional Classification Procedure

    Science.gov (United States)

    2013-11-18

    ... intended to have retroactive effect. There are no administrative procedures that must be exhausted prior to... classification services, $3.50 per bale, is less than one percent of the average value of a bale of cotton; (4) The fee for this service will not affect competition in the marketplace; (5) The...

  4. Classifications for cesarean section: a systematic review.

    Directory of Open Access Journals (Sweden)

    Maria Regina Torloni

    Full Text Available BACKGROUND: Rising cesarean section (CS) rates are a major public health concern and cause worldwide debates. To propose and implement effective measures to reduce or increase CS rates where necessary requires an appropriate classification. Despite several existing CS classifications, there has not yet been a systematic review of these. This study aimed to (1) identify the main CS classifications used worldwide and (2) analyze the advantages and deficiencies of each system. METHODS AND FINDINGS: Three electronic databases were searched for classifications published 1968-2008. Two reviewers independently assessed classifications using a form created based on items rated as important by international experts. Seven domains (ease, clarity, mutually exclusive categories, totally inclusive classification, prospective identification of categories, reproducibility, implementability) were assessed and graded. Classifications were tested in 12 hypothetical clinical case-scenarios. From a total of 2948 citations, 60 were selected for full-text evaluation and 27 classifications identified. Indications classifications present important limitations and their overall score ranged from 2-9 (maximum grade = 14). Degree of urgency classifications also had several drawbacks (overall scores 6-9). Women-based classifications performed best (scores 5-14). Other types of classifications require data not routinely collected and may not be relevant in all settings (scores 3-8). CONCLUSIONS: This review and critical appraisal of CS classifications is a methodologically sound contribution to establish the basis for the appropriate monitoring and rational use of CS. Results suggest that women-based classifications in general, and Robson's classification, in particular, would be in the best position to fulfill current international and local needs and that efforts to develop an internationally applicable CS classification would be most appropriately placed in building upon this

  5. Synthesis of Facial Image with Expression Based on Muscular Contraction Parameters Using Linear Muscle and Sphincter Muscle

    Science.gov (United States)

    Ahn, Seonju; Ozawa, Shinji

    We aim to synthesize an individual facial image with expression based on muscular contraction parameters. We have previously proposed a method of calculating the muscular contraction parameters from an arbitrary face image without learning for each individual. As a result, we could generate not only an individual's facial expression but also the facial expressions of various persons. In this paper, we propose a muscle-based facial model in which the facial muscles are defined as both linear muscles and a novel sphincter muscle. Additionally, we propose a method of synthesizing an individual facial image with expression based on muscular contraction parameters. First, the individual facial model with expression is generated by fitting to the arbitrary face image. Next, the muscular contraction parameters that correspond to the expression displacement of the input face image are calculated. Finally, the facial expression is synthesized from the vertex displacements of a neutral facial model based on the calculated muscular contraction parameters. Experimental results reveal that the novel sphincter muscle can synthesize facial expressions that correspond to the actual face image with arbitrary mouth or eye expressions.

  6. Detection of amniotic fluid ABH blood group substances and ABO blood type gene classification

    Institute of Scientific and Technical Information of China (English)

    陈江; 逯心敏; 郭渝; 胡伟

    2014-01-01

    Objective: To detect amniotic fluid ABH blood group substances and the ABO blood group genotype by polymerase chain reaction with sequence-specific primers (PCR-SSP), in order to improve prenatal diagnosis of the fetal ABO blood group. Methods: 53 pregnant women at a gestational age of 16-25 weeks were selected. Amniotic fluid was extracted to detect ABH blood group substances by the serological indirect agglutination reaction; the amniotic fluid cells were separated for DNA extraction, and the PCR-SSP technique was then used to analyze the ABO blood group genotypes. Results: 16 amniotic fluid specimens were of the non-secreting phenotype (30.2%) and 37 were of the secreting phenotype (69.8%); the ABO blood group genotype was detected in 48 amniotic fluid specimens by the PCR-SSP method. The ABO blood group of fetal amniotic fluid cells determined by gene identification was consistent with the results for secreting-type ABH blood group substances in amniotic fluid. Conclusion: The PCR-SSP technique can accurately detect the ABO blood group of fetal amniotic fluid cells.

  7. Supervised Ensemble Classification of Kepler Variable Stars

    CERN Document Server

    Bass, Gideon

    2016-01-01

    Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analyzing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications for each of the roughly 150,000 stars observed by Kepler are produced, separating the stars into one of 14 variable star classes.
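
    The abstract does not say how the individual classifiers are combined. As a purely illustrative sketch, assuming the simplest possible combination rule (a plurality vote over the labels proposed for one star), the idea of an ensemble could look like this. The function name and the class labels in the test are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine the labels proposed by several classifiers for one star.

    `predictions` is a list with one label per classifier. The label with
    the most votes wins; ties go to the tied label that appears first.
    This plurality rule is an assumption, not the paper's stated method.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk the predictions in order so that ties are broken deterministically.
    for label in predictions:
        if counts[label] == top:
            return label
```

    More elaborate ensembles weight each classifier by its validation accuracy or average class probabilities instead of counting hard votes.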

  8. Prediction of Breast Cancer using Rule Based Classification

    Directory of Open Access Journals (Sweden)

    Nagendra Kumar SINGH

    2015-12-01

    Full Text Available The current work proposes a model for prediction of breast cancer using the classification approach in data mining. The proposed model is based on various parameters, including symptoms of breast cancer, gene mutation and other risk factors causing breast cancer. Mutations have been predicted in breast cancer-causing genes by aligning normal and abnormal gene sequences; the class label of breast cancer (risky or safe) is then predicted on the basis of IF-THEN rules, using a Genetic Algorithm (GA). In this work, the GA uses variable gene encoding mechanisms for chromosome encoding and uniform population generation, and selects two chromosomes by the Roulette-Wheel selection technique for two-point crossover, which gives better solutions. The performance of the model is evaluated using the F score measure, the Matthews Correlation Coefficient (MCC) and the Receiver Operating Characteristic (ROC), obtained by plotting points (Sensitivity vs. 1 - Specificity).
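
    The three evaluation measures named in this abstract all follow from a binary confusion matrix. The sketch below shows the standard definitions, assuming "risky" is treated as the positive class and taking the F score to be F1 (the abstract does not specify the weighting). The function names are illustrative only.

```python
import math


def confusion_counts(y_true, y_pred, positive="risky"):
    """Return (tp, fp, tn, fn) for a binary labelling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient, in [-1, 1]; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_point(tp, fp, tn, fn):
    """One ROC point as (1 - specificity, sensitivity)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return false_positive_rate, sensitivity
```

    A full ROC curve is obtained by repeating `roc_point` over a range of decision thresholds; a rule-based classifier with a single hard decision yields one point.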

  9. Tumor-specific gene expression patterns with gene expression profiles

    Institute of Scientific and Technical Information of China (English)

    RUAN Xiaogang; LI Yingxin; LI Jiangeng; GONG Daoxiong; WANG Jinlian

    2006-01-01

    Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selection of tumor-specific genes and analysis of their differential expressions in tumor tissues. First, a variation of the Relief algorithm, "RFE_Relief algorithm" was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues and their counterparts. After tissue-specific genes were removed, cross validation experiments were employed to demonstrate the common deregulated expressions of the selected gene in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared in different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.

  10. CREST--classification resources for environmental sequence tags.

    Directory of Open Access Journals (Sweden)

    Anders Lanzén

    Full Text Available Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU) ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags), a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interfaced program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods with higher recall rate (sensitivity) as well as precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3) from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com.
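
    The lowest common ancestor idea mentioned above can be sketched in a few lines. This is a generic illustration, not CREST's implementation: it assumes each alignment hit carries a score and a root-first lineage, keeps the hits whose score lies within a hypothetical fraction `score_window` of the best score, and assigns the read to the deepest taxon shared by all kept hits. The rank similarity criteria described in the abstract are not modelled here.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-first prefix shared by all lineages."""
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so the prefix never overruns it.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def classify(hits, score_window=0.02):
    """Assign a read to the LCA of its near-best alignment hits.

    `hits` is a list of (score, lineage) pairs. `score_window` is an
    assumed, illustrative parameter: hits scoring at least
    best * (1 - score_window) are kept.
    """
    if not hits:
        return []
    best = max(score for score, _ in hits)
    kept = [lineage for score, lineage in hits
            if score >= best * (1 - score_window)]
    return lowest_common_ancestor(kept)
```

    With two near-equal hits in different classes of the same phylum, the read is assigned at phylum level rather than forced into either class, which is how this family of methods trades resolution for fewer false positives.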

  11. Genetic classification and molecular mechanisms of primary dystonia

    Institute of Scientific and Technical Information of China (English)

    Xueping Chen; Huifang Shang; Zuming Luo

    2008-01-01

    BACKGROUND: Primary dystonia is a heterogeneous disease, with a complex genetic basis. In previous studies, primary dystonia was classified according to age of onset, involved regions, and other clinical characteristics. With the development of molecular genetics, new virulence genes and sites have been discovered. Therefore, there is a gradual understanding of the various forms of dystonia, based on new viewpoints. There are 15 subtypes of dystonia, based on the molecular level, i.e., DYT1 to DYT15. OBJECTIVE: To analyze the genetic development of dystonia in detail, and to further investigate molecular mechanisms of dystonia. RETRIEVAL STRATEGY: A computer-based online search was conducted in PubMed for English language publications containing the keywords "dystonia and genetic" from January 1980 to March 2007. There were 105 articles in total. Inclusion criteria: ① the contents of the articles should closely address genetic classification and molecular mechanisms of primary dystonia; ② the articles published in recent years or in high-impact journals took preference. Exclusion criteria: duplicated articles. LITERATURE EVALUATION: The selected articles were on genetic classification and molecular genetics mechanism of primary dystonia. Of those, 27 were basic or clinical studies. DATA SYNTHESIS: ① Dystonia is a heterogeneous disease, with a complex genetic basis. According to the classification of the Human Genome Organization, there are 15 dystonia subtypes, based on genetics, i.e., DYT1-DYT15, including primary dystonia, dystonia plus syndrome, degeneration plus dystonia, and paroxysmal dyskinesia plus dystonia. ② To date, the chromosomes of 13 subtypes have been localized; however, DYT2 and DYT4 remain unclear. Six subtypes have been located within virulence genes. Specifically, torsinA gene expression results in the DYT1 genotype; autosomal dominant GTP cyclohydrolase I gene expression and recessive tyrosine hydroxylase expression result in the DYT5

  12. Classification using Bayesian neural nets

    NARCIS (Netherlands)

    J.C. Bioch (Cor); O. van der Meer; R. Potharst (Rob)

    1995-01-01

    Recently, Bayesian methods have been proposed for neural networks to solve regression and classification problems. These methods claim to overcome some difficulties encountered in the standard approach such as overfitting. However, an implementation of the full Bayesian approach to neura

  13. Crop Classification by Polarimetric SAR

    DEFF Research Database (Denmark)

    Skriver, Henning; Svendsen, Morten Thougaard; Nielsen, Flemming;

    1999-01-01

    Polarimetric SAR-data of agricultural fields have been acquired by the Danish polarimetric L- and C-band SAR (EMISAR) during a number of missions at the Danish agricultural test site Foulum during 1995. The data are used to study the classification potential of polarimetric SAR data using...

  14. Classification of Global Illumination Algorithms

    OpenAIRE

    Lesev, Hristo

    2010-01-01

    This article describes and classifies various approaches for solving the global illumination problem. The classification aims to show the similarities between different types of algorithms. We introduce the concept of Light Manager, as a central element and mediator between illumination algorithms in a heterogeneous environment of a graphical system. We present results and analysis of the implementation of the described ideas.

  15. Optimizing Classification in Intelligence Processing

    Science.gov (United States)

    2010-12-01

    ACC: Classification Accuracy; AUC: Area Under the ROC Curve; CI: Competitive Intelligence; COMINT: Communications Intelligence; DoD: Department of... indispensable tool to support a national leader’s decision making process, competitive intelligence (CI) has emerged in recent decades as an environment meant... effectiveness for the intelligence product in competitive intelligence environment: accuracy, objectivity, usability, relevance, readiness, and timeliness

  16. CLASSIFICATION OF LEARNING MANAGEMENT SYSTEMS

    Directory of Open Access Journals (Sweden)

    Yu. B. Popova

    2016-01-01

    Full Text Available The use of information technologies, and learning management systems in particular, expands the opportunities for teachers and students to reach their educational goals. Such systems deliver learning content, help organize and monitor training, collect progress statistics and take into account the individual characteristics of each user. There is currently a large number of both paid and free systems, hosted either on college servers or in the cloud, that differ in feature sets, licensing schemes and cost. This makes it difficult to choose the best system, a problem due in part to the lack of a comprehensive classification of such systems. An analysis of more than 30 of the most common automated learning management systems showed that such systems should be classified according to defined criteria under which systems of the same type can be compared. The classification criteria proposed by the author are: cost, functionality, modularity, meeting the customer’s requirements, content integration, physical location of the system, and adaptability of training. Considering learning management systems within these classifications, and taking into account current trends in their development, the main requirements for them can be identified: functionality, reliability, ease of use, low cost, support for the SCORM standard or Tin Can API, modularity and adaptability. In line with these requirements, the Software Department of FITR BNTU has, under the guidance of the author, been developing, using and continuously improving its own learning management system since 2009.

  17. Galaxy Classifications with Deep Learning

    Science.gov (United States)

    Lukic, Vesna; Brüggen, Marcus

    2017-06-01

    Machine learning techniques have proven to be increasingly useful in astronomical applications over the last few years, for example in object classification, estimating redshifts and data mining. One example of object classification is classifying galaxy morphology. This is a tedious task to do manually, especially as the datasets become larger with surveys that have a broader and deeper search-space. The Kaggle Galaxy Zoo competition presented the challenge of writing an algorithm to find the probability that a galaxy belongs in a particular class, based on SDSS optical spectroscopy data. The use of convolutional neural networks (convnets) proved to be a popular solution to the problem, as they have also produced unprecedented classification accuracies in other image databases such as the database of handwritten digits (MNIST) and large database of images (CIFAR). We experiment with the convnets that comprised the winning solution, but using broad classifications. The effect of changing the number of layers is explored, as well as using a different activation function, to help in developing an intuition of how the networks function and to see how they can be applied to radio galaxy images.

  18. Real time automatic scene classification

    NARCIS (Netherlands)

    Israël, Menno; Broek, van den Egon L.; Putten, van der Peter; Uyl, den Marten J.; Verbrugge, R.; Taatgen, N.; Schomaker, L.

    2004-01-01

    This work has been done as part of the EU VICAR (IST) project and the EU SCOFI project (IAP). The aim of the first project was to develop a real time video indexing classification annotation and retrieval system. For our systems, we have adapted the approach of Picard and Minka [3], who categorized

  19. Is classification necessary after Google?

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2012-01-01

    Purpose – The purpose of this paper is to examine challenges facing bibliographic classification at both the practical and theoretical levels. At the practical level, libraries are increasingly dispensing with classifying books. At the theoretical level, many researchers, managers, and users beli...

  20. A new classification of Chelicerata

    NARCIS (Netherlands)

    Hammen, van der L.

    1977-01-01

    Progress in classification of Chelicerata has been thwarted especially by two factors, viz., the concept of mites as one monophyletic group, and the opinion that this group consists of species with a highly modified plan of construction and without any trace of true segmentation. The only characters

  1. Contextualizing Object Detection and Classification.

    Science.gov (United States)

    Chen, Qiang; Song, Zheng; Dong, Jian; Huang, Zhongyang; Hua, Yang; Yan, Shuicheng

    2015-01-01

    We investigate how to iteratively and mutually boost object classification and detection performance by taking the outputs from one task as the context of the other one. While context models have been quite popular, previous works mainly concentrate on co-occurrence relationship within classes and few of them focus on contextualization from a top-down perspective, i.e. high-level task context. In this paper, our system adopts a new method for adaptive context modeling and iterative boosting. First, the contextualized support vector machine (Context-SVM) is proposed, where the context takes the role of dynamically adjusting the classification score based on the sample ambiguity, and thus the context-adaptive classifier is achieved. Then, an iterative training procedure is presented. In each step, Context-SVM, associated with the output context from one task (object classification or detection), is instantiated to boost the performance for the other task, whose augmented outputs are then further used to improve the former task by Context-SVM. The proposed solution is evaluated on the object classification and detection tasks of PASCAL Visual Object Classes Challenge (VOC) 2007, 2010 and SUN09 data sets, and achieves the state-of-the-art performance.

  2. Real time automatic scene classification

    NARCIS (Netherlands)

    Verbrugge, R.; Israël, Menno; Taatgen, N.; van den Broek, Egon; van der Putten, Peter; Schomaker, L.; den Uyl, Marten J.

    2004-01-01

    This work has been done as part of the EU VICAR (IST) project and the EU SCOFI project (IAP). The aim of the first project was to develop a real time video indexing classification annotation and retrieval system. For our systems, we have adapted the approach of Picard and Minka [3], who categorized

  3. Handbook for Preparing Job Classifications.

    Science.gov (United States)

    Thomas, John C.

    To assist local governments in their responsibility for eliminating and preventing discrimination in employment based on race, color, religion, sex, or national origin as specified by the Equal Employment Opportunity Act of 1972, the handbook provides guidelines for analyzing jobs and preparing job classifications (defining; listing duties…

  4. Functions in Biological Kind Classification

    Science.gov (United States)

    Lombrozo, Tania; Rehder, Bob

    2012-01-01

    Biological traits that serve functions, such as a zebra's coloration (for camouflage) or a kangaroo's tail (for balance), seem to have a special role in conceptual representations for biological kinds. In five experiments, we investigate whether and why functional features are privileged in biological kind classification. Experiment 1…

  6. The CHAT classification of stroke.

    Science.gov (United States)

    Bernstein, E F; Browse, N L

    1989-02-01

    Current terminology for clinical episodes relating to stroke is inconsistent and unclear, does not permit inclusion of data regarding the location and magnitude of extracranial and intracerebral arterial disease, does not coincide with existing classifications in Europe, and characterizes a hemispheric entity only, as opposed to a global description including prior symptoms in both hemispheres. A new classification system (CHAT) has been designed to deal with these problems, including the current clinical presentation, historical clinical episodes, the site and pathologic type of arterial disease, and information regarding abnormalities of the brain. Using this system, a retrospective review of 480 consecutive carotid endarterectomies is presented, demonstrating the advantages of the CHAT classification. Data include a significant difference in the probability of survival after carotid endarterectomy for asymptomatic stenosis in patients with prior symptoms on the opposite side, as well as a significant difference in the probability of stroke-free survival between patients with amaurosis fugax and those with prior carotid cortical symptoms (TIAs) as the presenting clinical condition. The CHAT classification is suggested as a significant advance in the reporting of all surgical cerebrovascular disease experience, and has particular implications for the current randomized trials between medical and surgical therapy for carotid artery disease.

  7. The classification of orofacial pains.

    Science.gov (United States)

    Okeson, Jeffrey P

    2008-05-01

    This article highlights the process of making the proper orofacial pain diagnosis. A classification is presented based on the clinical characteristics of the pain complaint and the structure from which it emanates. It is meant to serve as a road map for the clinician, helping him or her establish the correct diagnosis and thereby select the proper treatment.

  8. What is new in genetics and osteogenesis imperfecta classification?

    Directory of Open Access Journals (Sweden)

    Eugênia R. Valadares

    2014-12-01

    OBJECTIVE: Literature review of new genes related to osteogenesis imperfecta (OI) and update of its classification. SOURCES: Literature review in the PubMed and OMIM databases, followed by selection of relevant references. SUMMARY OF THE FINDINGS: In 1979, Sillence et al. developed a classification of OI subtypes based on clinical features and disease severity: OI type I, mild, common, with blue sclera; OI type II, perinatal lethal form; OI type III, severe and progressively deforming, with normal sclera; and OI type IV, moderate severity with normal sclera. Approximately 90% of individuals with OI are heterozygous for mutations in the COL1A1 and COL1A2 genes, with a dominant pattern of inheritance or sporadic mutations. After 2006, mutations were identified in the CRTAP, FKBP10, LEPRE1, PLOD2, PPIB, SERPINF1, SERPINH1, SP7, WNT1, BMP1, and TMEM38B genes, associated with recessive OI, and a mutation in the IFITM5 gene, associated with dominant OI. Mutations in PLS3 were recently identified in families with osteoporosis and fractures, with an X-linked inheritance pattern. In addition to the genetic complexity of the molecular basis of OI, extensive phenotypic variability resulting from individual loci has also been documented. CONCLUSIONS: Considering the discovery of new genes and limited genotype-phenotype correlation, the use of next-generation sequencing tools has become useful in molecular studies of OI cases. The recommendation of the Nosology Group of the International Society of Skeletal Dysplasias is to maintain the classification of Sillence as the prototypical form, universally accepted to classify the degree of severity in OI, while maintaining it free from direct molecular reference.

  9. Comparative genomic analysis of eutherian kallikrein genes

    Directory of Open Access Journals (Sweden)

    Marko Premzl

    2017-03-01

    The present study attempted to update and revise the eutherian kallikrein genes implicated in major physiological and pathological processes and in medical molecular diagnostics. Using the eutherian comparative genomic analysis protocol and freely available genomic sequence assemblies, tests of reliability of eutherian public genomic sequences were used to annotate the most comprehensive curated third-party gene data set of eutherian kallikrein genes, including 121 complete coding sequences among 335 potential coding sequences. The present analysis first described 13 major gene clusters of eutherian kallikrein genes and explained their differential gene expansion patterns. An updated classification and nomenclature of eutherian kallikrein genes was proposed as a new framework for future experiments.

  10. Functional Basis of Microorganism Classification.

    Directory of Open Access Journals (Sweden)

    Chengsheng Zhu

    2015-08-01

    Correctly identifying nearest "neighbors" of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less

  11. 6 CFR 7.30 - Classification challenges.

    Science.gov (United States)

    2010-01-01

    ... 6 CFR 7.31. ... 6 Domestic Security 1 2010-01-01 2010-01-01 false Classification challenges. 7.30 Section 7.30... INFORMATION Classified Information § 7.30 Classification challenges. (a) Authorized holders of...

  12. Classification of cyber attacks in South Africa

    CSIR Research Space (South Africa)

    Van Heerden, R

    2016-05-01

    This paper introduces a classification scheme for the visual classification of cyber attacks. Through the use of the scheme, the impacts of various cyber attacks throughout the history of South Africa are investigated and classified. The goal...

  13. Classification of eight dimensional perfect forms

    NARCIS (Netherlands)

    Dutour Sikiric, M.; Schuermann, A.; Vallentin, F.

    2007-01-01

    In this paper, we classify the perfect lattices in dimension 8. There are 10916 of them. Our classification heavily relies on exploiting symmetry in polyhedral computations. Here we describe algorithms making the classification possible.

  14. Flowers of annonaceae: Morphology, classification, and evolution

    NARCIS (Netherlands)

    Heusden, van E.C.H.

    1992-01-01

    The present paper describes the diversity in floral characters of Annonaceae and their distribution over the family, and discusses their value for classification and generic delimitation. Flower morphology has dominated historical classifications of this family since Hooker & Thomson (1855)

  15. Fast rule-based bioactivity prediction using associative classification mining

    Directory of Open Access Journals (Sweden)

    Yu Pulan

    2012-11-01

    Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community, but so far have not been applied widely in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are employed on three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (the human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high speed mining. Additionally, they provide comparable accuracy and efficiency to the commonly used Bayesian and support vector machines (SVM) methods, and produce highly interpretable models.
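
    The general idea of classifying with association rules can be shown with a small sketch. It is a deliberately simplified, CBA-style illustration and not the CPAR, CMAR or CBA implementations evaluated in the study: rules are mined by brute-force enumeration, ranked by confidence and then support, and the first matching rule decides the class. The function names, thresholds, feature names and toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, min_support=0.2, min_confidence=0.6, max_len=2):
    """Enumerate rules 'feature set -> label' from (features, label) pairs
    and keep those meeting the support and confidence thresholds."""
    n = len(records)
    itemset_counts = Counter()  # how often each feature set occurs
    rule_counts = Counter()     # how often it occurs with each label
    for items, label in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules = []
    for (combo, label), hits in rule_counts.items():
        support = hits / n
        confidence = hits / itemset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((confidence, support, combo, label))
    # Rank: highest confidence, then highest support, then shorter rules.
    rules.sort(key=lambda r: (-r[0], -r[1], len(r[2]), r[2], r[3]))
    return rules


def classify(rules, items, default=None):
    """Return the label of the first ranked rule whose features all occur."""
    for _confidence, _support, combo, label in rules:
        if set(combo) <= set(items):
            return label
    return default


# Hypothetical toy data: substructure flags and an activity label.
demo_records = [
    (frozenset({"nitro", "aromatic"}), "active"),
    (frozenset({"nitro"}), "active"),
    (frozenset({"aromatic", "hydroxyl"}), "inactive"),
    (frozenset({"hydroxyl"}), "inactive"),
    (frozenset({"nitro", "hydroxyl"}), "active"),
]
demo_rules = mine_rules(demo_records)
prediction = classify(demo_rules, {"nitro", "aromatic"})
```

    Each retained rule is a readable statement of the form "these features imply this label", which is one way to understand the interpretability the abstract attributes to such models.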

  16. 46 CFR 503.54 - Original classification.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 9 2010-10-01 2010-10-01 false Original classification. 503.54 Section 503.54 Shipping... Program § 503.54 Original classification. (a) No Commission Member or employee has the authority to... require classification, or receives any foreign government information as defined in section 1.1(d)...

  17. 14 CFR 298.3 - Classification.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 4 2010-01-01 2010-01-01 false Classification. 298.3 Section 298.3... REGULATIONS EXEMPTIONS FOR AIR TAXI AND COMMUTER AIR CARRIER OPERATIONS General § 298.3 Classification. (a) There is hereby established a classification of air carriers, designated as “air taxi operators,”...

  18. 10 CFR 1045.37 - Classification guides.

    Science.gov (United States)

    2010-01-01

    ... 10 Energy 4 2010-01-01 2010-01-01 false Classification guides. 1045.37 Section 1045.37 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Generation and Review of Documents Containing Restricted Data and Formerly Restricted Data § 1045.37 Classification...

  19. 32 CFR 1602.7 - Classification.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification. 1602.7 Section 1602.7 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.7 Classification. Classification is the exercise of the power to determine claims or questions with respect...

  20. 32 CFR 644.426 - Classification.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 4 2010-07-01 2010-07-01 true Classification. 644.426 Section 644.426 National... HANDBOOK Disposal Disposal of Fee-Owned Real Property and Easement Interests § 644.426 Classification... required by the special acts, classification will be coordinated with the interested Federal agency....

  1. 15 CFR 4a.4 - Classification authority.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classification authority. 4a.4 Section 4a.4 Commerce and Foreign Trade Office of the Secretary of Commerce CLASSIFICATION, DECLASSIFICATION, AND PUBLIC AVAILABILITY OF NATIONAL SECURITY INFORMATION § 4a.4 Classification authority. Authority...

  2. 10 CFR 1045.17 - Classification levels.

    Science.gov (United States)

    2010-01-01

    ... 10 Energy 4 2010-01-01 2010-01-01 false Classification levels. 1045.17 Section 1045.17 Energy DEPARTMENT OF ENERGY (GENERAL PROVISIONS) NUCLEAR CLASSIFICATION AND DECLASSIFICATION Identification of Restricted Data and Formerly Restricted Data Information § 1045.17 Classification levels. (a) Restricted...

  3. 17 CFR 200.505 - Original classification.

    Science.gov (United States)

    2010-04-01

    ... 17 Commodity and Securities Exchanges 2 2010-04-01 2010-04-01 false Original classification. 200...; CONDUCT AND ETHICS; AND INFORMATION AND REQUESTS Classification and Declassification of National Security Information and Material § 200.505 Original classification. (a) No Commission Member or employee has...

  4. 14 CFR 1203.701 - Classification.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Classification. 1203.701 Section 1203.701... Government Information § 1203.701 Classification. (a) Foreign government information that is classified by a foreign entity shall either retain its original classification designation or be marked with a...

  5. 6 CFR 7.26 - Derivative classification.

    Science.gov (United States)

    2010-01-01

    ... amended, 32 CFR 2001.22, and internal DHS guidance provided by the Chief Security Officer. ... 6 Domestic Security 1 2010-01-01 2010-01-01 false Derivative classification. 7.26 Section 7.26... INFORMATION Classified Information § 7.26 Derivative classification. (a) Derivative classification is...

  6. 17 CFR 200.506 - Derivative classification.

    Science.gov (United States)

    2010-04-01

    ... 17 Commodity and Securities Exchanges 2 2010-04-01 2010-04-01 false Derivative classification. 200...; CONDUCT AND ETHICS; AND INFORMATION AND REQUESTS Classification and Declassification of National Security Information and Material § 200.506 Derivative classification. Any document that includes...

  7. 7 CFR 51.1903 - Size classification.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Size classification. 51.1903 Section 51.1903... STANDARDS) United States Consumer Standards for Fresh Tomatoes Size and Maturity Classification § 51.1903 Size classification. The following terms may be used for describing the size of the tomatoes in any...

  8. 7 CFR 51.1402 - Size classification.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Size classification. 51.1402 Section 51.1402... STANDARDS) United States Standards for Grades of Pecans in the Shell 1 Size Classification § 51.1402 Size classification. Size of pecans may be specified in connection with the grade in accordance with one of...

  9. 12 CFR 560.160 - Asset classification.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 5 2010-01-01 2010-01-01 false Asset classification. 560.160 Section 560.160... Lending and Investment Provisions Applicable to all Savings Associations § 560.160 Asset classification... consistent with, or reconcilable to, the asset classification system used by OTS in its Thrift...

  10. 46 CFR 95.50-5 - Classification.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 4 2010-10-01 2010-10-01 false Classification. 95.50-5 Section 95.50-5 Shipping COAST... Details § 95.50-5 Classification. (a) Hand portable fire extinguishers and semiportable fire extinguishing... extinguishing systems are set forth in Table 95.50-5(c). Table 95.50-5(c) Classification Type Size Soda-acid...

  11. 7 CFR 1794.31 - Classification.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 12 2010-01-01 2010-01-01 false Classification. 1794.31 Section 1794.31 Agriculture... Classification. (a) Electric and telecommunications programs. RUS will normally determine the proper environmental classification of projects based on its evaluation of the project description set forth in...

  12. 15 CFR 4a.3 - Classification levels.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classification levels. 4a.3 Section 4a.3 Commerce and Foreign Trade Office of the Secretary of Commerce CLASSIFICATION, DECLASSIFICATION, AND PUBLIC AVAILABILITY OF NATIONAL SECURITY INFORMATION § 4a.3 Classification levels. Information...

  13. 32 CFR 1602.13 - Judgmental Classification.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Judgmental Classification. 1602.13 Section 1602.13 National Defense Other Regulations Relating to National Defense SELECTIVE SERVICE SYSTEM DEFINITIONS § 1602.13 Judgmental Classification. A classification action relating to a registrant's claim...

  14. 32 CFR 2400.34 - Classification.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification. 2400.34 Section 2400.34 National... Government Information § 2400.34 Classification. (a) Foreign government information classified by a foreign government or international organization of governments shall retain its original classification...

  15. 46 CFR Sec. 18 - Group classification.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 8 2010-10-01 2010-10-01 false Group classification. Sec. 18 Section 18 Shipping... Sec. 18 Group classification. In the preparation of specifications, Job Orders, Supplemental Job... inserted thereon: Number Classification 41 Maintenance Repairs (deck, engine and stewards...

  16. 46 CFR 76.50-5 - Classification.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 3 2010-10-01 2010-10-01 false Classification. 76.50-5 Section 76.50-5 Shipping COAST... Classification. (a) Hand portable fire extinguishers and semiportable fire extinguishing systems shall be... extinguishing systems are set forth in table 76.50-5(c). Table 76.50-5(c) Classification Type Size Soda acid...

  17. 5 CFR 2500.3 - Original classification.

    Science.gov (United States)

    2010-01-01

    ... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Original classification. 2500.3 Section... SECURITY REGULATION § 2500.3 Original classification. No one in the Office of Administration has been granted authority for original classification of information....

  18. 5 CFR 1312.7 - Derivative classification.

    Science.gov (United States)

    2010-01-01

    ... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Derivative classification. 1312.7 Section 1312.7 Administrative Personnel OFFICE OF MANAGEMENT AND BUDGET OMB DIRECTIVES CLASSIFICATION, DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification...

  19. 32 CFR 2001.22 - Derivative classification.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Derivative classification. 2001.22 Section 2001... Identification and Markings § 2001.22 Derivative classification. (a) General. Information classified derivatively on the basis of source documents or classification guides shall bear all markings prescribed...

  20. 12 CFR 403.4 - Derivative classification.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 4 2010-01-01 2010-01-01 false Derivative classification. 403.4 Section 403.4 Banks and Banking EXPORT-IMPORT BANK OF THE UNITED STATES CLASSIFICATION, DECLASSIFICATION, AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION § 403.4 Derivative classification. (a) Use of...